The Risks of Predictable Identifiers

When building web applications, developers often need to generate identifiers for resources such as user accounts, documents, or objects. These identifiers are typically exposed to the client and used to interact with the server. However, using predictable identifiers can introduce several security risks:

  1. Information Leakage: Predictable identifiers, such as sequential numbers or timestamps, can leak sensitive information about the system. For example, an attacker can infer the number of resources created or the rate at which they are being generated. This information can be used to gather insights about the system’s usage and potentially aid in further attacks.

  2. Insecure Direct Object References (IDOR): Once enumerated, predictable identifiers make any IDOR vulnerabilities in the application easy to exploit. If the application relies solely on the obscurity of its identifiers for access control, without proper authorization checks, an attacker can use the discovered identifiers to access sensitive resources. This can result in data breaches or unauthorized modifications.
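
To make the information-leakage risk concrete, here is a minimal sketch of how an outsider might estimate growth from two sequential IDs. The function name, the dates, and the pairing of IDs with dates are all invented for illustration; only the arithmetic matters.

```python
from datetime import date

def estimate_daily_signups(first_obs, second_obs):
    """Estimate signups per day from two (date, sequential_id) observations."""
    (day_a, id_a), (day_b, id_b) = first_obs, second_obs
    elapsed_days = (day_b - day_a).days
    if elapsed_days <= 0:
        raise ValueError("second observation must be on a later day")
    return (id_b - id_a) / elapsed_days

# Hypothetical observations: an attacker registers two accounts a month apart
# and simply reads the ID assigned to each one.
rate = estimate_daily_signups(
    (date(2024, 1, 1), 1234),
    (date(2024, 1, 31), 6789),
)
print(f"Roughly {rate:.0f} new accounts per day")
```

Nothing here requires a vulnerability; the identifiers themselves carry the signal.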

A Practical Example

Imagine you run a dating site called “LoveMatch”. LoveMatch allows users to create profiles, browse other users’ profiles, and express interest in potential matches. The site generates sequential user IDs for each registered user and includes these IDs in its logs for tracking and analytics purposes. One day, two users, Alice (User ID: 2345) and Bob (User ID: 6789), both express interest in the same person, Charlie (User ID: 1234). The logs capture these interactions, with entries like “User 2345 liked User 1234” and “User 6789 liked User 1234”. Unfortunately, due to a security breach, an attacker gains unauthorized access to LoveMatch’s logs. By analyzing the logs, the attacker can deduce that both Alice and Bob are interested in Charlie. This information leakage can lead to embarrassment and potential conflicts between Alice and Bob, as they may feel their privacy has been violated. Moreover, if the attacker is particularly interested in Charlie, they could enumerate other users who have shown interest in Charlie by searching for similar log entries, further compromising the privacy of LoveMatch’s users.

This situation could get worse! What if there are also authorization weaknesses in the application that would allow an attacker to use these revealed identifiers to access other users’ profiles or perform actions on their behalf?

The LoveMatch team decides that, since they can’t change the user identifiers, the quickest fix is to encrypt the user IDs before they are sent to clients. They expect this change to prevent attackers from easily guessing or enumerating the identifiers and accessing other users’ profiles. However, as we will see, encrypting identifiers is not a silver bullet and can introduce additional security risks.

Why Encrypting Identifiers Won’t Solve the Problem

The LoveMatch engineering intern whips up a quick example that shows how to encrypt the user IDs before sending them to the client. They believe this will prevent attackers from easily guessing or enumerating the identifiers. Here is the output of the demo:

User: Alice
Original: 6789, Encrypted: Zml4ZWRpdjEyMzQ1Njc4Ofrf1MJnLhCCLc6HROSYAwU=
Original: 8966, Encrypted: Zml4ZWRpdjEyMzQ1Njc4OXDr17ote4kkvc5E5nQipXQ=
Original: 1234, Encrypted: Zml4ZWRpdjEyMzQ1Njc4OQjPtWd9n2GJ6qBuu7bpe7Y=
----
User: Bob
Original: 6789, Encrypted: Zml4ZWRpdjEyMzQ1Njc4Ofrf1MJnLhCCLc6HROSYAwU=
Original: 8966, Encrypted: Zml4ZWRpdjEyMzQ1Njc4OXDr17ote4kkvc5E5nQipXQ=
Original: 1234, Encrypted: Zml4ZWRpdjEyMzQ1Njc4OQjPtWd9n2GJ6qBuu7bpe7Y=

The intern thinks this is great and solves the problem. However, upon closer inspection, several issues become apparent:

  1. The encrypted identifiers are not random; they are deterministic based on the input identifier. An attacker may be able to build a lookup table of encrypted identifiers and their corresponding plaintext values.

  2. The first part of each encrypted value is the same, which in most cases indicates a fixed IV (Initialization Vector). With a fixed IV and a fixed key, identical plaintexts always encrypt to identical ciphertexts, which is exactly what makes the lookup table described above possible.

  3. The encrypted identifiers can still be used to reveal who has a shared interest in a particular user. An attacker can intercept and analyze the encrypted identifiers to infer relationships between users, even if they cannot directly decrypt the identifiers.
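
The third point can be sketched in a few lines. The three tokens below are copied from the demo output above; the log entries and the function name `group_by_target` are made up for illustration. The attacker never decrypts anything and only compares strings.

```python
from collections import defaultdict

# Tokens taken verbatim from the demo output (plaintext IDs noted for the reader only).
TOKEN_A = 'Zml4ZWRpdjEyMzQ1Njc4Ofrf1MJnLhCCLc6HROSYAwU='  # shown for 6789
TOKEN_B = 'Zml4ZWRpdjEyMzQ1Njc4OXDr17ote4kkvc5E5nQipXQ='  # shown for 8966
TOKEN_C = 'Zml4ZWRpdjEyMzQ1Njc4OQjPtWd9n2GJ6qBuu7bpe7Y='  # shown for 1234

# Hypothetical intercepted "liker liked target" events, encrypted IDs only.
observed_likes = [
    (TOKEN_A, TOKEN_C),
    (TOKEN_B, TOKEN_C),
    (TOKEN_A, TOKEN_B),
]

def group_by_target(events):
    """Map each liked token to the set of tokens that liked it."""
    likers = defaultdict(set)
    for liker, target in events:
        likers[target].add(liker)
    return dict(likers)

# Any target with more than one liker reveals a shared interest.
shared_interest = {
    target: likers
    for target, likers in group_by_target(observed_likes).items()
    if len(likers) > 1
}
for target, likers in shared_interest.items():
    print(f'{len(likers)} users liked {target}')
```

Because each ID always maps to the same token, the token is just a longer name for the same user, and the relationships survive the encryption intact.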

A project I worked on years ago was a perfect demonstration of why encrypted identifiers can be insecure. A wireless lighting control system used encryption to hide the command identifiers sent to control the lights. The designers thought that if no one could understand the meaning of the commands, then the system would be secure. By carefully observing the encrypted commands, an adversary could build a dictionary of encrypted commands and replay them to control the lights. The adversary didn’t need to decrypt the commands; they just needed to replay them. Rumor was that this lighting system had been used in a building that experienced mysterious outages. Maybe I wasn’t the first to figure out this weakness.

Now, let’s take a look at the code that generated this problematic output:

import base64
from Crypto.Cipher import AES

SECRET_KEY = b'secretkey1234567' # bad
FIXED_IV = b'fixediv123456789' # even worse

def encrypt_identifier(identifier):
    key = SECRET_KEY
    iv = FIXED_IV
    cipher = AES.new(key, AES.MODE_CBC, iv)
    padded_identifier = identifier + ' ' * (16 - len(identifier) % 16)
    encrypted_identifier = cipher.encrypt(padded_identifier.encode())
    return base64.b64encode(iv + encrypted_identifier).decode()

def decrypt_identifier(encrypted_identifier):
    key = SECRET_KEY
    encrypted_data = base64.b64decode(encrypted_identifier)
    iv = encrypted_data[:16]
    cipher = AES.new(key, AES.MODE_CBC, iv)
    padded_identifier = cipher.decrypt(encrypted_data[16:])
    return padded_identifier.decode().rstrip()

for user in ['Alice', 'Bob']:
    print(f'User: {user}')
    for identifier in ['6789', '8966', '1234']:
        encrypted_identifier = encrypt_identifier(str(identifier))
        decrypted_identifier = decrypt_identifier(encrypted_identifier)
        assert identifier == decrypted_identifier
        print(f'Original: {identifier}, Encrypted: {encrypted_identifier}')
    print('----')

Now that the LoveMatch team has decided to encrypt the identifiers, they have even more security issues to worry about. Let’s break down the problems with this approach:

  1. Unauthenticated Encryption: The encryption scheme used in this example does not include any authentication mechanism. This means that an attacker can potentially manipulate the encrypted identifiers without detection. If the application relies on these identifiers for authorization or access control, an attacker could exploit this vulnerability to gain unauthorized access to resources.

  2. Exposure of Encryption Key and IV: In the given example, the encryption key and initialization vector (IV) are hardcoded in the application code. When we see this in a client’s codebase, it is often because the key and IV are being shared elsewhere. We have even seen developers make the questionable decision to include the key and IV in the application frontend source code. Once the key and IV are included in the frontend (e.g., in JavaScript), an attacker can easily extract them, rendering the encryption useless. It is crucial to keep encryption keys secure and to use a fresh random IV for each encryption operation.

  3. Encrypted Values Can Still Be Used for Exploitation: Even if the encryption scheme is properly implemented and the keys are kept secure, encrypted identifiers can still be used to exploit authorization issues. For example, if the server treats any identifier that decrypts successfully as sufficient proof of authorization, an attacker can intercept and reuse valid encrypted identifiers to access resources they should not have access to. Encrypting identifiers does not prevent this type of exploitation.
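
The first problem can be illustrated without knowing the key. In CBC mode the first plaintext block is the block-cipher decryption of the first ciphertext block XORed with the IV, and the demo's decrypt_identifier reads the IV from the token itself. An attacker who knows the plaintext behind one token (their own ID, say) can therefore rewrite the IV so the token decrypts to a different ID. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes both IDs are shorter than 16 characters so they fit in the first block.

```python
import base64

BLOCK = 16  # AES block size in bytes

def space_pad(identifier: str) -> bytes:
    """Mirror the space padding used by encrypt_identifier in the demo."""
    return (identifier + ' ' * (BLOCK - len(identifier) % BLOCK)).encode()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def forge_token(token: str, known_id: str, wanted_id: str) -> str:
    """Rewrite the IV so a token for known_id decrypts to wanted_id instead."""
    if len(known_id) >= BLOCK or len(wanted_id) >= BLOCK:
        raise ValueError('sketch only handles IDs that fit in one block')
    raw = base64.b64decode(token)
    iv, ciphertext = raw[:BLOCK], raw[BLOCK:]
    # Flipping a bit in the IV flips the same bit in the decrypted first block.
    delta = xor_bytes(space_pad(known_id), space_pad(wanted_id))
    forged_iv = xor_bytes(iv, delta)
    return base64.b64encode(forged_iv + ciphertext).decode()

# Token shown for 6789 in the demo output; the attacker wants 1234.
known_token = 'Zml4ZWRpdjEyMzQ1Njc4Ofrf1MJnLhCCLc6HROSYAwU='
forged = forge_token(known_token, '6789', '1234')
print(forged)
```

Assuming the sample token really was produced by the demo code, passing the forged value to decrypt_identifier should yield '1234', since nothing in the scheme detects the modified IV. An authenticated mode (for example AES-GCM) would reject the altered token instead.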

There are also some other issues in this simple example, such as not generating a high-entropy secret key, not using a proper padding scheme [1], and not handling errors in a way that would prevent information leakage. Nevertheless, this simple example shows the problem with using encryption to hide resource identifiers.
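
On the padding point, here is a small pure-Python sketch of PKCS7-style padding, written only to show the idea. The function names are illustrative; in practice a maintained helper is preferable (PyCryptodome, for instance, provides pad and unpad in Crypto.Util.Padding).

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes, each of value N, so the length is a multiple of block_size."""
    pad_len = block_size - len(data) % block_size
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Strip PKCS7 padding, rejecting anything malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError('invalid padded length')
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError('invalid padding')
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError('invalid padding')
    return padded[:-pad_len]

# A value ending in a space survives the round trip, which space padding
# followed by rstrip() cannot guarantee.
original = b'1234 '
assert pkcs7_unpad(pkcs7_pad(original)) == original
```

Unlike space padding, the last byte states exactly how much padding to remove, so the original value is recovered unambiguously. Note that padding errors must not be reported to the client in a distinguishable way when the ciphertext is unauthenticated.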

Recommendations for Secure Identifier Generation

Instead of relying on encryption to hide resource identifiers, use secure random values or universally unique identifiers (UUIDs [2]). If generated on a per-user basis, these random identifiers would prevent the cross-user data leakage discussed above. Here are some guidelines:

  1. Implement Proper Authorization Checks: First of all, it is crucial to implement robust authorization checks on the server side. Ensure that each request is properly authenticated and authorized based on the user’s privileges. Do not rely solely on the secrecy of identifiers for access control. Get this right and, other than potential information leakage, the risks of predictable identifiers are eliminated.

  2. Use Secure Random Values: Generate identifiers using a cryptographically secure random number generator. These values should be sufficiently long and unpredictable, making them infeasible to guess or enumerate. For example, you can use the secrets module in Python to generate secure random values.

  3. Utilize UUIDs: UUIDs are standardized identifiers that are globally unique and highly unlikely to collide. They consist of 128 bits and are commonly represented as strings. Prefer version 4 UUIDs, which are randomly generated; version 1 UUIDs embed a timestamp and a node identifier, which makes them predictable. Most programming languages provide built-in libraries for generating UUIDs (e.g., the uuid module in Python).
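
As a rough sketch of guidelines 2 and 3, the snippet below generates identifiers with Python's secrets and uuid modules. The AliasTable class is a hypothetical in-memory illustration of "per-user" identifiers, where each viewer gets a different random alias for the same profile; a real system would persist this mapping in its database.

```python
import secrets
import uuid

def new_token_id(nbytes: int = 16) -> str:
    """Return a URL-safe identifier built from nbytes of CSPRNG output."""
    return secrets.token_urlsafe(nbytes)

def new_uuid_id() -> str:
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())

class AliasTable:
    """Hypothetical store giving each viewer its own alias for each profile."""

    def __init__(self):
        self._alias_by_pair = {}
        self._pair_by_alias = {}

    def alias_for(self, viewer_id, target_id):
        """Return the viewer's alias for target_id, creating it on first use."""
        pair = (viewer_id, target_id)
        if pair not in self._alias_by_pair:
            alias = new_token_id()
            self._alias_by_pair[pair] = alias
            self._pair_by_alias[alias] = pair
        return self._alias_by_pair[pair]

    def resolve(self, viewer_id, alias):
        """Return the real target ID, or None if the alias is not this viewer's."""
        pair = self._pair_by_alias.get(alias)
        if pair is None or pair[0] != viewer_id:
            return None
        return pair[1]

aliases = AliasTable()
alice_view = aliases.alias_for('alice', 'charlie')
bob_view = aliases.alias_for('bob', 'charlie')
print(alice_view != bob_view)  # different strings for the same profile
```

With per-viewer aliases, two intercepted requests for the same profile no longer share a value, so the correlation described earlier has nothing to match on.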

If LoveMatch implemented these recommendations, they would have a simpler and less risky solution that still protects their end users. They could also avoid the pitfalls a naive encryption scheme can introduce. Combining these techniques with proper authorization checks will enhance the overall security of the application.
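
For completeness, here is a minimal sketch of the server-side authorization check from guideline 1. The data model, the Forbidden exception, and the function name are invented for illustration; the point is only that access is decided from the authenticated requester, never from knowledge of an identifier.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    profile_id: str
    owner_id: str
    private_notes: str

class Forbidden(Exception):
    """Raised when the requester may not access the resource."""

# Hypothetical in-memory data store keyed by profile ID.
PROFILES = {
    'p-charlie': Profile('p-charlie', 'charlie', 'likes hiking'),
    'p-alice': Profile('p-alice', 'alice', 'likes jazz'),
}

def get_private_notes(requester_id: str, profile_id: str) -> str:
    """Return private notes only when the authenticated requester owns the profile."""
    profile = PROFILES.get(profile_id)
    # Same error for "missing" and "not yours", so responses do not
    # confirm which identifiers exist.
    if profile is None or profile.owner_id != requester_id:
        raise Forbidden('not allowed')
    return profile.private_notes

print(get_private_notes('alice', 'p-alice'))
```

Here requester_id is assumed to come from the authenticated session, not from a client-supplied parameter. With a check like this in place, guessing or intercepting 'p-charlie' gains an attacker nothing.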

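  1. The padding scheme used in this example is ad hoc and ambiguous: stripping the spaces on decryption also removes any trailing whitespace that belonged to the original value. It is recommended to use a standard padding scheme such as PKCS7 to extend the plaintext to a multiple of the block size.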

  2. UUIDs are also referred to as globally unique identifiers (GUIDs).