Bitwarden Research 3: Data Protection

As part of our research into Bitwarden server hashes, we wrote code to extract them from a database. In order to fully implement that in Python, we had to reimplement a subset of ASP.NET Core Data Protection. Standalone Python code is on GitHub.
Caution: This post is cryptography-heavy. Feel free to skip ahead to the final post.
Data Protection is an encryption scheme that can be used with a wide range of customization, from zero configuration to replacing most of the stack with custom code. Of course, most usages of it, including Bitwarden Server, go with the zero configuration option. In the standard usage, the scheme can be thought of as an opaque encryption/decryption oracle using a key that is stored on disk. Obviously, if the encrypted data is stored on the same disk as the key, then the scheme can be trivially broken by an attacker with disk access—which is our scenario. In spite of saying “trivially”, however, reimplementing the cryptographic code in Python was more difficult than expected. There are quite a few cryptographic operations and a bunch of complexity under the hood.
We’ll walk through the process roughly in the order we worked it out. First, we’ll look at how Bitwarden uses Data Protection. Then, we’ll introduce our own standalone debug C# implementation. After that, we’ll dive into the cryptography of Data Protection. Finally, we’ll talk about some of the interesting pieces of our code.
Bitwarden’s code
Here’s how the Bitwarden server uses Data Protection for Master Passwords:
Step 1: Add Data Protection to “services” (a C# thing) with the application name (explained below) “Bitwarden”. [code]
    public static void AddCustomDataProtectionServices(
        this IServiceCollection services, IWebHostEnvironment env, GlobalSettings globalSettings)
    {
        var builder = services.AddDataProtection().SetApplicationName("Bitwarden");
        if (env.IsDevelopment())
        {
            return;
        }
        if (globalSettings.SelfHosted && CoreHelpers.SettingHasValue(globalSettings.DataProtection.Directory))
        {
            builder.PersistKeysToFileSystem(new DirectoryInfo(globalSettings.DataProtection.Directory));
        }
Step 2: Initialize Protector instance with the purpose (explained below) “DatabaseFieldProtection”. [code]
        _dataProtector = dataProtectionProvider.CreateProtector(Constants.DatabaseFieldProtectorPurpose);
Step 3: Decrypt database fields with Unprotect(ciphertext). Protected database fields are actually stored as P|ciphertext, so remove that prefix first. [code]
        user.MasterPassword = _dataProtector.Unprotect(
                user.MasterPassword.Substring(Constants.DatabaseFieldProtectedPrefix.Length));
So what are application names and purposes? One goal of the Data Protection API is that it can be used with the same key material to encrypt and decrypt different silos of data in such a way that the Data Protection instance used for data in one silo can’t be used for data in another silo. In other words, you can initialize Data Protection with the same key but two different purposes, and the instance initialized with one purpose wouldn’t be able to decrypt data encrypted with the other instance. Purposes aren’t intended to be secret, so this doesn’t add any additional security if you only have one instance, but it is intended to prevent various attacks. This is one of the many ways in which Data Protection is implemented in a complex way to have a variety of kind-of-nice properties.
Summarizing Bitwarden’s implementation, it only does two custom things:
- Sets the application name to “Bitwarden”
- Sets the purpose to “DatabaseFieldProtection”
Debug implementation
In order for us to verify our assumptions and more easily debug our code, we wrote a simple standalone C# program that just protects and unprotects data. program.cs:
using Microsoft.AspNetCore.DataProtection;
using Microsoft.Extensions.DependencyInjection;
public class DataProtector
{
    public IDataProtector _protector;
    public DataProtector(IDataProtectionProvider provider, string purpose)
    {
        _protector = provider.CreateProtector(purpose);
    }
}
class Program
{
    static void Main(string[] args)
    {
        var purpose = "AVeryGoodPurpose";
        var applicationName = "exampleapplication";
        var serviceCollection = new ServiceCollection();
        serviceCollection.AddDataProtection().SetApplicationName(applicationName);
        var instance = ActivatorUtilities.CreateInstance<DataProtector>(serviceCollection.BuildServiceProvider(), purpose);
        var plaintext = "Hack the Gibson";
        var ciphertext = instance._protector.Protect(plaintext);
        Console.WriteLine($"'{plaintext}' encrypted to {ciphertext}");
        var roundtrip = instance._protector.Unprotect(ciphertext);
        Console.WriteLine($"and decrypted again to '{roundtrip}'");
    }
}
It also requires program.csproj:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <RootNamespace>csharp_dataprotection_example</RootNamespace>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="CommandLineParser" Version="2.9.1" />
    <PackageReference Include="Microsoft.AspNetCore.DataProtection" Version="8.0.6" />
    <PackageReference Include="Microsoft.Extensions.DependencyInjection" Version="8.0.0" />
  </ItemGroup>
</Project>
And usage looks like:
$ dotnet run
'Hack the Gibson' encrypted to CfDJ8DUjNeBN9mdMkzlc3iXuxRKuo5NaR-FzSN9CqpU78lAhGMBU_6HWtvMftapngynKpg4zZbSXSI4zm_Q7wKinx1Aa6Msq16EhfB7-sqJq8b2b7vLbZ1utFlfkCuqcYUD8ag
and decrypted again to 'Hack the Gibson'
But where is the encryption key stored? Data Protection offers multiple key storage implementations, but the default is PersistKeysToFileSystem, which is also what self-hosted Bitwarden servers use. On Linux, keys are XML files in ~/.aspnet/DataProtection-Keys/. In a Bitwarden server, the relative path is /bwdata/core/aspnet-dataprotection/key-<keyid>.xml, where <keyid> is a random GUID. If you protect a value when no valid key exists, Data Protection will silently create a new key for you.
The keys look like this:
<?xml version="1.0" encoding="utf-8"?>
<key id="e25c533d-64a7-4ae5-8225-edb0607d1006" version="1">
  <creationDate>2024-07-05T14:24:36.5826347Z</creationDate>
  <activationDate>2024-07-05T14:24:36.5738301Z</activationDate>
  <expirationDate>2024-10-03T14:24:36.5738301Z</expirationDate>
  <descriptor deserializerType="Microsoft.AspNetCore.DataProtection.AuthenticatedEncryption.ConfigurationModel.AuthenticatedEncryptorDescriptorDeserializer, Microsoft.AspNetCore.DataProtection, Version=8.0.0.0, Culture=neutral, PublicKeyToken=adb9793829ddae60">
    <descriptor>
      <encryption algorithm="AES_256_CBC" />
      <validation algorithm="HMACSHA256" />
      <masterKey p4:requiresEncryption="true" xmlns:p4="http://schemas.asp.net/2015/03/dataProtection">
        <value>wL95SczfGfgdODndLgWHQSTCpnrwyGRFbek2uyUE7mUYigOkC9O5zW1sfCW0JDxeJ/JynVTDoXEdBLp+wwdBXw==</value>
      </masterKey>
    </descriptor>
  </descriptor>
</key>
Note that each key also specifies its own encryption algorithm and validation (HMAC) algorithm.
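Reading such a key file from Python only needs the standard library. The following is a minimal sketch rather than the code from our module: the `load_key` helper and the shape of its return value are made up for illustration, and the XML is an abbreviated copy of the key above.

```python
import base64
import uuid
import xml.etree.ElementTree as ET

# Abbreviated copy of the key file shown above (the XML declaration, the
# date elements and the long deserializerType attribute are omitted).
KEY_XML = """
<key id="e25c533d-64a7-4ae5-8225-edb0607d1006" version="1">
  <descriptor>
    <descriptor>
      <encryption algorithm="AES_256_CBC" />
      <validation algorithm="HMACSHA256" />
      <masterKey p4:requiresEncryption="true" xmlns:p4="http://schemas.asp.net/2015/03/dataProtection">
        <value>wL95SczfGfgdODndLgWHQSTCpnrwyGRFbek2uyUE7mUYigOkC9O5zW1sfCW0JDxeJ/JynVTDoXEdBLp+wwdBXw==</value>
      </masterKey>
    </descriptor>
  </descriptor>
</key>
"""


def load_key(xml_text):
    """Hypothetical helper: pull the fields needed for decryption out of a key file."""
    root = ET.fromstring(xml_text.strip())
    descriptor = root.find("descriptor/descriptor")
    return {
        "key_id": uuid.UUID(root.get("id")),
        "encryption": descriptor.find("encryption").get("algorithm"),
        "validation": descriptor.find("validation").get("algorithm"),
        # The master key is stored as standard (not URL-safe) base64.
        "master_key": base64.b64decode(descriptor.find("masterKey/value").text),
    }


key = load_key(KEY_XML)
print(key["key_id"], key["encryption"], key["validation"], len(key["master_key"]))
```

For the key above, the master key decodes to 64 bytes.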
Reimplementation
With the background and usage out of the way, let’s dive into re-implementing Unprotect in Python. There are two main resources: the source code and the documentation (alternative documentation host).
When I found the documentation with diagrams and hexdump examples, I thought “oh, this will be easy”. Unfortunately, even though the docs have plenty of detail in some parts, they hand-wave others. In addition, since I’m not an Enterprise Developer, the source code was nearly impenetrable to me. Running our example through a debugger and glancing at the memory dump made it much clearer.
With default configuration, a protected payload contains:
- [ 4 bytes ] magic header: “\x09\xf0\xc9\xf0”
- [ 16 bytes ] key id (little endian UUID representation)
- [ 16 bytes ] “key modifier” (random data)
- [ 16 bytes ] IV
- [ variable ] ciphertext
- [ 32 bytes ] HMAC digest
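The layout above can be sketched in a few lines of standard-library Python. This is an illustration, not the code from our module: `split_payload` is a hypothetical helper, and the fixed sizes assume the default AES-256-CBC plus HMACSHA256 configuration. The sample string is the output of our debug program from earlier.

```python
import base64
import uuid

MAGIC_HEADER = b"\x09\xf0\xc9\xf0"


def split_payload(protected, mac_size=32, block_size=16):
    """Hypothetical helper: split a protected string into its fields.

    Sizes assume the default AES-256-CBC + HMACSHA256 configuration.
    """
    # The protected string is unpadded base64url, so restore the padding first.
    raw = base64.urlsafe_b64decode(protected + "=" * (-len(protected) % 4))
    if raw[:4] != MAGIC_HEADER:
        raise ValueError("not a Data Protection payload")
    return {
        "key_id": uuid.UUID(bytes_le=raw[4:20]),  # little endian UUID layout
        "key_modifier": raw[20:36],
        "iv": raw[36:36 + block_size],
        "ciphertext": raw[36 + block_size:-mac_size],
        "mac": raw[-mac_size:],
    }


sample = (
    "CfDJ8DUjNeBN9mdMkzlc3iXuxRKuo5NaR-FzSN9CqpU78lAhGMBU_6HWtvMftapngynKpg4z"
    "ZbSXSI4zm_Q7wKinx1Aa6Msq16EhfB7-sqJq8b2b7vLbZ1utFlfkCuqcYUD8ag"
)
fields = split_payload(sample)
print(fields["key_id"], len(fields["ciphertext"]))
```

The `CfDJ8` prefix that every protected string starts with is just the base64url encoding of the magic header. For a Bitwarden database field, the leading `P|` would have to be stripped before calling a helper like this.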
This might seem straightforward enough, but it’s not! Before you can decrypt the ciphertext, you have to calculate a derived key. The derived keys for the encryption and the final HMAC are the first and second halves of the output of NIST SP800-108 KDF (called KBKDFHMAC in Python’s Cryptography) with a bunch of very specific parameters:
- key derivation key: the main data protection key
- PRF: HMACSHA512
- “label”: additional authenticated data (AAD)… which requires another calculation, explained after.
- “context”: the key modifier appended to a “context header” … which requires yet another calculation, explained after.
- output size: the encryption algorithm’s key size plus the HMAC algorithm’s digest size. With AES-256 and HMACSHA256, it would be 32 + 32 = 64 bytes
- The Python library required some other parameters as well, but I was able to copy the values from the example in the documentation and it worked.
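To make the parameter list more concrete, here is a sketch of the same KDF using only `hmac` and `hashlib`. It is not the code from our module, which uses KBKDFHMAC. It assumes counter mode with 32-bit big-endian counter and length fields (the `rlen=4` and `llen=4` values from the cryptography documentation example), and `derive_subkeys` is a hypothetical helper name.

```python
import hashlib
import hmac


def sp800_108_ctr_hmac_sha512(key, label, context, length):
    """SP800-108 counter-mode KDF with HMAC-SHA512 as the PRF.

    Assumes 32-bit big-endian counter and output-length fields.
    """
    # Fixed input: label || 0x00 || context || [output length in bits]
    fixed = label + b"\x00" + context + (length * 8).to_bytes(4, "big")
    output = b""
    counter = 1
    while len(output) < length:
        prf = hmac.new(key, counter.to_bytes(4, "big") + fixed, hashlib.sha512)
        output += prf.digest()
        counter += 1
    return output[:length]


def derive_subkeys(master_key, aad, context_header, key_modifier,
                   enc_key_size=32, mac_key_size=32):
    """Hypothetical helper: split the KDF output into encryption and HMAC keys."""
    okm = sp800_108_ctr_hmac_sha512(
        master_key,
        aad,                            # "label"
        context_header + key_modifier,  # "context"
        enc_key_size + mac_key_size,
    )
    return okm[:enc_key_size], okm[enc_key_size:]
```

With the default algorithms, the 64 bytes of output fit in a single HMAC-SHA512 block, so the loop runs exactly once.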
Let’s talk about the context header next, because it was easy, even though it had a bunch of parts. First we’ll define two constant temporary keys, and then we’ll define the actual context header.
The temporary keys are once again an encryption key and an HMAC key split out of the output of a NIST SP800-108 KDF, except this time every input (key, label, and context) is empty.
The context header consists of:
- [ 2 bytes ] two null bytes
- [ 4 bytes ] encryption algorithm key size, big endian
- [ 4 bytes ] encryption algorithm block size, big endian
- [ 4 bytes ] HMAC key size, big endian
- [ 4 bytes ] HMAC digest size, big endian
- [ variable ] one block of output from the encryption algorithm, using the temporary encryption key, a block of null bytes as the IV, and an empty string padded with PKCS7 to the block size as the plaintext
- [ variable ] HMAC digest of an empty string using the temporary HMAC key
The final part is the AAD. This was not sufficiently explained in the documentation, and the majority of development time was spent on trying to get this right. The AAD header is:
- [ 4 bytes ] “\x09\xf0\xc9\xf0”
- [ 16 bytes ] key id (little endian UUID representation)
- [ 4 bytes ] the number of items in the “purpose chain”, big endian. In our case it will always be 2
Then for each item in the “purpose chain”:
- [ variable ] length of the item in bytes, as a .NET 7-bit encoded integer (a single byte for items shorter than 128 bytes)
- [ variable ] the item, UTF-8 encoded
The way that Bitwarden and our example code use Data Protection, the “purpose chain” is [application name, purpose]. I really wish this had been in the documentation.
Now that we have all the components needed to derive the encryption and HMAC keys, we can decrypt the payload’s ciphertext using the derived encryption key and the payload’s IV, and unpad the result with PKCS7. We can also verify the HMAC digest by computing the HMAC, with the derived HMAC key, over the IV followed by the ciphertext. And that’s it! Simple, right?
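The two steps on either side of the AES-CBC decryption only need the standard library, so they make for a compact sketch. The AES step itself needs a third-party library such as cryptography and is left out here. The function names are hypothetical and HMACSHA256 is assumed, as in the default configuration.

```python
import hashlib
import hmac


def verify_mac(mac_key, iv, ciphertext, mac):
    """Check the payload's HMAC-SHA256, computed over IV || ciphertext."""
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, mac)


def pkcs7_unpad(padded, block_size=16):
    """Strip PKCS7 padding from a decrypted CBC plaintext."""
    if not padded or len(padded) % block_size:
        raise ValueError("invalid padded length")
    pad = padded[-1]
    if not 1 <= pad <= block_size or padded[-pad:] != bytes([pad]) * pad:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-pad]
```

Checking the HMAC before decrypting means a payload that was tampered with, or derived with the wrong key, purpose, or application name, is rejected before any plaintext is produced.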
Conclusion
The full module is available at https://github.com/ivision-research/pydataprotection.
Our Python implementation has a number of limitations: it only supports CBC mode, not GCM; it only implements decryption, not encryption; it hasn’t been put through a rigorous test suite; and there is limited error checking (and that’s being generous). This is very much research code. We are releasing it under an MIT license, and we encourage community members to fork it if they would like to polish and use it.
As a side-note, when most people say “encrypted at rest”, they mean a system like this, where the encrypted data is stored alongside the decryption key. These days, I view “encrypted at rest” in marketing material in the same vein as “military grade encryption”: a buzzword that is correlated with poor security practices.
Our final post is next, talking about lessons learned. Or start the series over from our intro post.
