Signed-off-by: Maxime “pep” Buquet <pep@bouah.net>
2022-04-14 13:03:20 +02:00

title: An overview of my threat model
date: 2022-04-13T12:00:00+01:00
tags: XMPP, Threat Model, Security, Privacy

I was interested in knowing what kind of threat model people have when using XMPP, so I asked on the newly created XMPP-related community forum, which uses Lemmy, a decentralized alternative to Reddit built on ActivityPub. I had an idea of my own, but I didn't realize it was going to be this long an answer, so I decided to write it down here instead. I'll be posting the link there.

Building a threat model means identifying what and/or whom you are trying to protect yourself against. This allows you to take steps to ensure you are actually protected against the threats you have in mind. A threat model is meant to be refined and improved over time.

I have two main use-cases and I'll go through one of them. The other one is less involved, though definitely influenced by this one. This is surely incomplete, but it should still give a pretty good overview.

I started doing some activism in the past few years and I've had to adapt how I communicate. It seems not many people in these groups are aware of the amount of information that's recoverable by an attacker. I was surprised by how little security culture there was, even though I wasn't practicing much of it myself before (because I didn't think I needed it, really). As you may have guessed, this concerns a lot more than just instant messaging, but that is what this article focuses on.

The threat model

For this use-case, I want to make it hard for anybody to trace my actions back to my civil identity and those of my friends. While I know this is never going to be perfect, and the attacker here has way more resources than we have, we do what is possible to reduce the impact on us. I am also aware that many attacks are theoretical and may never be used in practice, but that doesn't mean we should ignore them either.

Online, I want to protect myself against passive state-level surveillance, but also against targeted surveillance to some extent. Offline, I need to protect the devices I use. In case they are seized by the police, I want to prevent them from getting too much information, so they have less material to charge us with. But if it gets to this, there's a good chance they will be able to associate my different identities.

Some may think that with this threat model in mind I wouldn't trust the server administrator, but this is a false dichotomy. What I don't want is my data falling into the hands of an intruder, such as the police taking over the server. For one, server admins are legally required to hand over encryption passphrases in many jurisdictions. Also, to err is human, and hacking into a server may not be so hard with the right amount of resources.

How does this work with XMPP?

First, and this is not specific to XMPP: we don't use our civil identities, we use pseudonyms. In these circles we mostly don't know each other's civil identities, and it's not useful anyway. It's the same online, for example in the free software community, where there's no reason why you'd need this information.

We use Tor, so the ISP and middle boxes don't know where we connect to, and the XMPP server doesn't know where we connect from.
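One reason the ISP doesn't learn the destination is that a client talking to Tor's local SOCKS5 proxy hands over the server's hostname instead of resolving it itself. Here is a minimal sketch of that one step, assuming a Tor daemon on its default SOCKS port (9050) on localhost. The function name and the `xmpp.example` hostname are made up for illustration, the initial method negotiation is left out, and real clients should rely on their built-in proxy support rather than on something like this.

```python
import struct

# Assumption: a local Tor daemon listening on its default SOCKS port.
TOR_SOCKS_ADDR = ("127.0.0.1", 9050)

def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request that carries the hostname itself.

    Address type 0x03 (domain name) lets the proxy do the name
    resolution, so the client does not perform a DNS lookup on its own.
    """
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1 to 255 bytes long")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then length
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    # The port is sent as two bytes in network (big-endian) order.
    return header + name + struct.pack("!H", port)

# Hypothetical server name, standard XMPP client port.
request = socks5_connect_request("xmpp.example", 5222)
```

The bytes in `request` are what would be written to a socket connected to `TOR_SOCKS_ADDR` once the SOCKS5 greeting has been exchanged.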

We create accounts on populated public XMPP servers, and connect to them using TLS -- which has been the default for a long time now -- and use member-only / private (non-public) rooms to talk together, with OMEMO. We don't know all of the people in the room but there is some kind of trust chain.

We're not verifying OMEMO fingerprints, as we may not know everybody in the room, and changing devices/OMEMO keys also hurts the user experience when combined with fingerprint verification.

On devices (PCs, smartphones), we use full-disk encryption where possible. As we generally use second-hand phones, the feature may not always be available. A fairly generic piece of advice I give is to set a passphrase on the OS and also to clear client logs regularly. This can be configured in Conversations on Android; I don't know about iOS clients.

The bottom line is: your smartphone is your weak point, even though most of us have one because it's convenient. It will most likely be the first thing to incriminate you, if it's not you or your friends doing so inadvertently.

What would I like to improve in XMPP?

There are so many details that I have no clue about that could be used against me to correlate my different identities.

I use multiple accounts on Conversations, as well as Dino on the desktop for this use-case. Randomizing connections to the various accounts could be one thing to improve.
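As a rough idea of what randomizing connections could look like, here is a small sketch in which each account gets its own random delay before connecting, so that the accounts don't all log in at the same moment or always in the same order. This is not something any of the clients mentioned here does as far as I know. The function name, the ten-minute window and the example JIDs are all assumptions, and this would only blur the most obvious timing pattern.

```python
import random

def connection_schedule(accounts, max_delay=600.0, rng=None):
    """Return (account, delay_seconds) pairs sorted by delay.

    Each account gets an independent random delay in [0, max_delay],
    which also makes the login order vary from one run to the next.
    """
    # SystemRandom draws from the OS entropy source rather than a
    # seeded generator that might be predictable.
    rng = rng or random.SystemRandom()
    schedule = [(account, rng.uniform(0.0, max_delay)) for account in accounts]
    return sorted(schedule, key=lambda pair: pair[1])

# Hypothetical accounts, for illustration only.
plan = connection_schedule(["alias1@a.example", "alias2@b.example", "alias3@c.example"])
```

A client would then wait for each delay in turn before opening the corresponding connection.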

I don't use Poezio for anything other than my civil identity, because Poezio isn't widely used. That may also be the case for Dino, though.

Currently in server logs, a few things can be used to identify a client, such as the resource string, which the client sets to something similar to clientname.randombits, or the disco#info, which lists the capabilities of a client. Both are actually stored on the server for possibly good reasons, but that's always more information to identify somebody.
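To make the capabilities concern a bit more concrete, here is a small sketch showing how an advertised feature list boils down to a stable fingerprint: the same client build yields the same value every time, whatever order the features are listed in. The hashing scheme is my own simplification and not what any server actually stores, and the identity strings and feature sets are made up for illustration.

```python
import hashlib

def feature_fingerprint(identity: str, features) -> str:
    """Hash a client's advertised identity and feature list.

    Sorting makes the result independent of the order in which the
    features were listed, so one client build maps to one value.
    """
    parts = [identity] + sorted(set(features))
    joined = "<".join(parts) + "<"
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

# Hypothetical clients: same identity category, different feature sets.
client_a = feature_fingerprint(
    "client/pc//ClientA",
    ["http://jabber.org/protocol/muc", "http://jabber.org/protocol/disco#info"],
)
client_b = feature_fingerprint(
    "client/pc//ClientB",
    ["http://jabber.org/protocol/disco#info", "urn:xmpp:carbons:2"],
)
```

Two accounts showing the same fingerprint doesn't prove they belong to the same person, but with a rarely used client it narrows things down quite a bit.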

I remember developers asking for the resource to be easily distinguishable for debugging purposes. Having something à la docker container names should be good enough for this (a list of adjectives and names combined into random <adjective>_<name>). I am not entirely sure what to do about disco#info being stored.
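A minimal sketch of such a name generator could look like this. The word lists are tiny placeholders I made up, and a real client would want much larger ones. The point is that the result is easy to tell apart in logs while saying nothing about which client produced it.

```python
import secrets

# Assumption: tiny illustrative word lists; a real client would ship
# much larger ones so that names do not repeat too often.
ADJECTIVES = ["brave", "calm", "eager", "gentle", "quiet", "witty"]
NAMES = ["heron", "lynx", "maple", "otter", "robin", "willow"]

def random_resource() -> str:
    """Return a resource string of the form <adjective>_<name>."""
    return f"{secrets.choice(ADJECTIVES)}_{secrets.choice(NAMES)}"

resource = random_resource()
```

With only six words per list there are just 36 combinations, so collisions between two sessions would be frequent; longer lists fix that.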

A good point for public servers is that they don't seem to store archives forever anymore (since GDPR? Or for disk-space concerns maybe). They will generally have 2 weeks / 1 month of (encrypted) activity which, I grant you, may be enough in some cases to incriminate someone, but it's probably better than logs that go back to -infinity.

The roster is also stored as plaintext on the server and can easily be taken by the police. An encrypted roster may not be as far off as we imagine. Similar efforts have been made in Dovecot to encrypt the user's mailbox with a user-provided passphrase. This wouldn't prevent servers from recreating the roster based on activity while the user is logged in, but that already requires more effort and many wouldn't bother, leaving this data unavailable as plaintext by default.

On the client, I would like more private defaults. Tor support is a MUST. Fortunately Conversations has it. It's possible to use Tor with Dino, but one has to know how to set it up on their system, there's no way to enforce using it, and nothing shows whether it's in use either. Same issue in Poezio.

Storing logs forever is also one thing that I find annoying. Retention can be configured in Conversations, but it's not limited by default. The setting is hidden under Expert Settings, where automatic message deletion defaults to Never.

Dino doesn't have any settings regarding logs. I'd have to clear them myself by going through the sqlite database (pretty technical already). Poezio has a use_log setting that stores every message (and presence depending on config), and it's also True by default.
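For the sqlite route, here is a rough sketch of what purging old rows could look like. I am not reproducing Dino's actual schema here: the `message` table and `time` column in the demo are placeholders, and the function assumes the time column holds Unix timestamps in seconds, so the real table and column names would have to be checked first (on a backup, with the client stopped).

```python
import sqlite3
import time

def purge_old_messages(conn, table, time_column, max_age_days):
    """Delete rows older than max_age_days and return how many were removed.

    Assumes time_column holds Unix timestamps in seconds.
    """
    for identifier in (table, time_column):
        # Identifiers cannot be bound as SQL parameters, so validate them.
        if not identifier.isidentifier():
            raise ValueError(f"unsafe identifier: {identifier!r}")
    cutoff = int(time.time()) - max_age_days * 86400
    with conn:  # commits on success, rolls back on error
        cursor = conn.execute(
            f'DELETE FROM "{table}" WHERE "{time_column}" < ?', (cutoff,)
        )
    return cursor.rowcount

# Demo on an in-memory database with a made-up schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (body TEXT, time INTEGER)")
now = int(time.time())
conn.executemany(
    "INSERT INTO message VALUES (?, ?)",
    [("old", now - 90 * 86400), ("recent", now - 86400)],
)
removed = purge_old_messages(conn, "message", "time", max_age_days=30)
```

Note that deleting rows doesn't necessarily scrub their content from the file on disk: SQLite may leave it in freed pages unless the database is vacuumed or secure deletion is enabled.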

Interactions with OMEMO between non-contacts are a mess. Some servers have the mod_block_strangers module deployed as an anti-spam measure: when a user from such a server joins a private room, non-contacts will be prevented from fetching their keys. Dino creates the OMEMO node as only accessible by contacts (to prevent deanonymization in some Prosody MUCs). And Conversations doesn't allow sending encrypted messages if it doesn't have the keys of all participants in a private room.

I am not even talking about OMEMO implementations (using OMEMO 0.3.0) which per the spec only encrypt the <body/> element in a message, leaking actual data depending on the feature used, or restricting the feature set greatly. This is fixed in the newer version of the spec but deployed nowhere at the moment.

I am also not talking about why XMPP and not, say, Signal or Telegram. I have already covered this in part in other articles, but it may warrant its own article at some point.

This article only scratches the surface. There are many more details that would need to be ironed out. And of course implementations need to make choices and can't answer every single use-case out there. I do wish privacy was more of a concern though.

Where has “Privacy by default” gone? Somebody bring it back please.