Making MLS more decentralized

It’s no secret that we at Phoenix R&D are big fans of the Messaging Layer Security (MLS) protocol, having helped bring it into existence. It’s a versatile group key agreement and messaging protocol that powers both asynchronous and real-time applications.

MLS is relatively easy to deploy in an environment where a server ensures MLS handshake messages are created and delivered in the right order. In the absence of such an arbiter, conflicts between clients are inevitable and must be resolved later. This can have implications for Forward Secrecy, a security guarantee that MLS aims to provide.

In this blog post, we look at why that’s the case and propose DMLS: a specification and implementation – based on work by Alwen et al. – to solve the problem.

How MLS avoids distributed systems challenges

At its core, MLS is a continuous group key agreement protocol. Group members periodically update their own key material to facilitate agreement on shared key material within the group. These updates are necessary to provide two core security guarantees: Forward Secrecy and Post-Compromise Security. Both protect against an adversary who compromises the key material of a group member: Forward Secrecy ensures the adversary cannot decrypt past messages, while Post-Compromise Security aims to protect messages sent after the compromise. In MLS, key updates happen as part of a handshake message called a commit. Each commit begins a new epoch for the group.
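
To make the role of deletion concrete, here’s a minimal sketch (our own illustration, not the real MLS key schedule, which derives many more secrets via the ciphersuite’s KDF) of an epoch secret being ratcheted forward by commits:

```rust
use sha2::{Digest, Sha256};

/// Hypothetical, simplified epoch state; the real MLS key schedule
/// (RFC 9420, Section 8) derives many more secrets per epoch.
struct EpochState {
    epoch: u64,
    secret: [u8; 32],
}

impl EpochState {
    /// Processing a commit ratchets the chain forward. Overwriting `secret`
    /// deletes the old epoch secret, which is what yields Forward Secrecy:
    /// compromising the new state reveals nothing about past epochs.
    fn process_commit(&mut self, commit_bytes: &[u8]) {
        let mut hasher = Sha256::new();
        hasher.update(self.secret);
        hasher.update(commit_bytes);
        self.secret = hasher.finalize().into();
        self.epoch += 1;
    }
}
```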

Unfortunately, commits aren’t commutative, so group members must agree on the order in which they’re applied to the group state. This is a fundamental challenge in distributed systems, which MLS sidesteps by requiring a Delivery Service (DS) that provides this ordering (but is otherwise untrusted). Most messaging systems involve a central server that can fulfill this role by distributing commits on a first-come, first-served basis and telling all senders of conflicting commits to retry in the next epoch.
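
In code, this arbitration amounts to little more than an epoch counter. The sketch below uses hypothetical types of our own; it’s not a real Delivery Service API:

```rust
/// First-come, first-served commit arbitration, as an otherwise untrusted
/// Delivery Service might implement it (hypothetical sketch).
struct DeliveryService {
    current_epoch: u64,
}

enum Verdict {
    /// Commit accepted; fan it out to all group members.
    Accept,
    /// Another commit won this epoch; rebuild against the new epoch and resend.
    RetryInEpoch(u64),
}

impl DeliveryService {
    fn submit_commit(&mut self, commit_epoch: u64) -> Verdict {
        if commit_epoch == self.current_epoch {
            self.current_epoch += 1;
            Verdict::Accept
        } else {
            Verdict::RetryInEpoch(self.current_epoch)
        }
    }
}
```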

The fact that commits are strictly ordered means that the history of a group state is simply a long chain of commits.

This requirement for a total order of commits is clearly problematic in systems that lack such a central entity.

A simple solution: allow group state forks

If there were no ordering of commits, clients could retain the old group state whenever they processed a commit. If two group members concurrently sent a commit in the same epoch, recipients would thus end up with two forks. Instead of a chain of commits, a group state’s history would turn into a directed acyclic graph (DAG), where nodes are epochs and edges are commits.

Epoch chain (left) and epoch DAG (right)
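
As a data structure, such a history is straightforward to represent. In the hedged sketch below (names are ours, not DMLS’s), each epoch records the parent it was created from, and an epoch is a fork point if more than one commit extends it:

```rust
use std::collections::HashMap;

/// Identifies an epoch, e.g. by a hash over the group state (illustrative).
type EpochId = [u8; 32];

/// Sketch of a group history as a DAG: nodes are epochs, edges are commits.
/// Since each commit has exactly one parent epoch, child -> parent suffices.
struct EpochDag {
    parent: HashMap<EpochId, EpochId>,
}

impl EpochDag {
    /// Record that a commit sent in `parent` produced `child`.
    fn add_commit(&mut self, parent: EpochId, child: EpochId) {
        self.parent.insert(child, parent);
    }

    /// An epoch is forked if more than one commit was applied to it.
    fn is_forked(&self, epoch: &EpochId) -> bool {
        self.parent.values().filter(|p| *p == epoch).count() > 1
    }
}
```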

This leaves us with two problems:

  1. Group members must manage and resolve forks.
  2. Group members lose Forward Secrecy, because they can’t safely delete secret key material immediately.

The first problem is highly application-specific, which is why this post focuses on the second problem.

Group forking and Forward Secrecy

Section 14 of RFC 9420 describes the need to sequence commits and also mandates that forked or previous group states be kept around for as little time as possible to preserve Forward Secrecy.

In most scenarios, Forward Secrecy is achieved by deleting key material as soon as it’s no longer needed, i.e. immediately after processing a commit. An adversary who compromises a group member afterwards has no access to the secrets of the old epoch and therefore can’t re-derive the secrets of subsequent epochs. The same principle applies to encryption secrets, which are deleted immediately after use.

However, if a group member wants to be able to process other commits for the same epoch later on, they must retain the epoch’s key material, thus weakening the Forward Secrecy of any child epoch.

A compromise in one epoch (left) can propagate down the epoch DAG (right)

Alwen, Mularczyk and Tselekounis [AMT] recognized this limitation and proposed a solution in their paper "Fork-Resilient Continuous Group Key Agreement". Their technique changes MLS to allow clients to puncture the retained shared key material when processing a commit. This puncturing prevents an adversary who obtains the key material from re-computing the key material of the new epoch, effectively preventing them from decrypting messages created in the new epoch. At the same time, the puncturing is specific to that particular commit. Other commits sent in the same epoch can still be processed and the key material can be punctured again to protect each new epoch created in this way.

Puncturing prevents the adversary from re-computing secrets of child epochs in the DAG

DMLS = MLS + Fork Resilience

Because AMT’s Fork Resilience approach is relatively simple and meaningfully improves Forward Secrecy in case of group state forks, we wrote a brief specification (in the shape of an IETF internet draft) and made the required changes to our implementation of MLS (OpenMLS). This involved three main steps:

  1. Implement the Puncturable Pseudorandom Function (PPRF) primitive required to derive key material in a Forward Secret way.
  2. Integrate the PPRF into OpenMLS’s key schedule.
  3. Adapt the OpenMLS storage provider to allow storing more than one epoch for the same group.

The way that OpenMLS is structured made step 2 simple enough. Steps 1 and 3 required slightly more thought.

Puncturable pseudorandom function (PPRF)

Just like a regular Pseudorandom Function (PRF), a Puncturable PRF (PPRF) allows the caller to obtain a random-looking output for any input. The difference is that each call punctures (partially deletes) the underlying key material, such that the PPRF can’t be evaluated on the same input again. MLS’s Secret Tree, for example, is such a PPRF. For DMLS, we had to implement a PPRF from scratch, because OpenMLS’s Secret Tree implementation isn’t optimized for the input size required by AMT’s Fork Resilience.

The PPRF construction in question is essentially a big binary tree with one leaf for each possible input. For Fork Resilience, we need a PPRF that takes 256-bit (32-byte) inputs, which requires a tree of depth 256. To compute an output, we simply walk down the tree from the root to the input’s leaf, going either left or right at each level depending on the corresponding bit of the input. At each node, we derive two secrets, one for each child of the node. The secret derived at the leaf is the output. The secrets on the direct path of that leaf are deleted, while the other secrets (those on the co-path) are stored to allow the computation of other outputs after puncturing. Deleting the secrets on the direct path ensures that the same output can’t be derived again, which is how we achieve Forward Secrecy.
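
The sketch below illustrates this GGM-style construction. It’s our own simplified rendition, not the OpenMLS code: it uses SHA-256 with a one-byte direction label where a real implementation would use the ciphersuite’s KDF with proper domain separation:

```rust
use sha2::{Digest, Sha256};
use std::collections::HashMap;

const DEPTH: usize = 256; // one tree level per input bit

/// Simplified GGM-style Puncturable PRF. Remaining node secrets are keyed
/// by their bit-path from the root (illustrative, not the OpenMLS code).
struct Pprf {
    nodes: HashMap<Vec<bool>, [u8; 32]>,
}

impl Pprf {
    fn new(root_secret: [u8; 32]) -> Self {
        let mut nodes = HashMap::new();
        nodes.insert(Vec::new(), root_secret); // the root has the empty path
        Self { nodes }
    }

    /// Evaluate on a 256-bit input and puncture: afterwards, the same input
    /// can never be evaluated again (returns None if already punctured).
    fn eval_and_puncture(&mut self, input: &[u8; 32]) -> Option<[u8; 32]> {
        let bits: Vec<bool> = (0..DEPTH)
            .map(|i| (input[i / 8] >> (7 - i % 8)) & 1 == 1)
            .collect();

        // Exactly one stored node is an ancestor of an unpunctured leaf.
        let start = (0..=DEPTH).find(|&d| self.nodes.contains_key(&bits[..d]))?;

        // Walk from that ancestor down to the leaf, deleting direct-path
        // secrets and keeping only the co-path siblings.
        let mut path = bits[..start].to_vec();
        let mut secret = self.nodes.remove(&path).expect("ancestor exists");
        for &bit in &bits[start..] {
            let (down, sibling) = derive_children(&secret, bit);
            path.push(!bit);
            self.nodes.insert(path.clone(), sibling); // co-path node survives
            *path.last_mut().unwrap() = bit;
            secret = down;
        }
        Some(secret) // direct-path secrets are gone: Forward Secrecy
    }
}

/// Derive (child-on-path, sibling) from a node secret. A stand-in for the
/// ciphersuite KDF used by a real implementation.
fn derive_children(secret: &[u8; 32], bit: bool) -> ([u8; 32], [u8; 32]) {
    let child = |b: bool| -> [u8; 32] {
        let mut hasher = Sha256::new();
        hasher.update(secret);
        hasher.update([b as u8]);
        hasher.finalize().into()
    };
    (child(bit), child(!bit))
}
```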

Evaluating the PPRF on one input requires us to store up to 256 entries of 32 bytes (plus a few bytes of node index each), yielding a little more than 8 KiB of data; the next evaluation adds at most 255 entries, and so on. In the world of public-key post-quantum cryptography, that’s not a lot, but it can still significantly increase the storage size of a group.

OpenMLS storage provider

Functions of OpenMLS that change the state of a group typically take an abstract storage provider as input, allowing them to store or delete various parts of the group state. The API of the storage provider was designed under the assumption that there would only ever be one epoch worth of state for a given group. Since this is no longer the case with DMLS, we had to make some adjustments.

To keep the changes to OpenMLS as small as possible, we decided to create a new DMLS storage provider. It can copy the state of a group at a given epoch and spawn a regular storage provider that operates on just that copy. This keeps OpenMLS’s interactions with the storage provider intact while still supporting forks. All that remains after processing a commit with OpenMLS is to overwrite the stored PPRF state of the past epoch with the punctured one, thus providing Forward Secrecy.
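
In simplified form, the wrapper might look as follows. The trait below is our stand-in for the OpenMLS storage provider abstraction (the real trait has many more methods), and all names are illustrative:

```rust
use std::collections::HashMap;

type EpochId = [u8; 32];

/// Stand-in for the OpenMLS storage provider abstraction (heavily simplified).
trait StorageProvider {
    fn store(&mut self, key: &[u8], value: &[u8]);
    fn load(&self, key: &[u8]) -> Option<Vec<u8>>;
}

/// A regular single-epoch provider backed by memory (illustrative).
#[derive(Clone, Default)]
struct InMemoryStorage {
    entries: HashMap<Vec<u8>, Vec<u8>>,
}

impl StorageProvider for InMemoryStorage {
    fn store(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }
    fn load(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }
}

/// Sketch of a DMLS storage provider: one single-epoch state per epoch.
struct DmlsStorage {
    epochs: HashMap<EpochId, InMemoryStorage>,
}

impl DmlsStorage {
    /// Copy the state at `parent` so OpenMLS can process a commit against it
    /// without disturbing other forks of the same epoch. After processing,
    /// the parent's stored PPRF state is overwritten with the punctured one.
    fn fork_epoch(&mut self, parent: &EpochId, child: EpochId) -> Option<&mut InMemoryStorage> {
        let copy = self.epochs.get(parent)?.clone();
        Some(self.epochs.entry(child).or_insert(copy))
    }
}
```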

Overhead

Besides the limited increase in implementation complexity, storage is the main downside of the DMLS approach. The amount of overhead mostly depends on the following factors (a rough model is sketched after the list):

  • retention period for old group states
  • group size
  • ciphersuite of the group (large keys in post-quantum secure ciphersuites can further increase the size of the stored state)
  • fork frequency (due to the puncturing overhead described above)
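
As a rough back-of-the-envelope model (our own, with made-up parameter names, not a measured figure), the retained state grows along these lines:

```rust
/// Very rough storage estimate for retained DMLS state (illustrative model).
fn estimated_storage_bytes(
    retained_epochs: usize,     // how many old epochs the retention policy keeps
    group_state_bytes: usize,   // per-epoch state; grows with group size and key sizes
    punctures_per_epoch: usize, // commits processed from each epoch (forks)
) -> usize {
    // Each puncturing stores at most 256 co-path entries of 32 bytes each.
    let pprf_bytes = punctures_per_epoch * 256 * 32;
    retained_epochs * (group_state_bytes + pprf_bytes)
}
```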

DMLS Groups

In summary, DMLS can help improve Forward Secrecy in systems where forks are inevitable. Despite its storage overhead, we believe it can act as a building block for decentralized systems that meaningfully improve security.

If you’re interested in experimenting with DMLS groups, check out our proof-of-concept implementation based on OpenMLS. If you’re interested in the theory behind Fork Resilience, you can find AMT’s paper here. Finally, if you have proposals on how to improve DMLS at the specification level, check out our draft and contact us directly or write to the MLS working group mailing list at the IETF.

Thanks

We would like to thank eQualitie, who made it possible for us to dedicate sufficient time to this project.


We'd love to hear from you! Whether you're interested in experimenting with DMLS, have questions about the implementation, or see potential applications we haven't considered – reach out at hello@phnx.im.