How Does Certificate Transparency Work?
Certificate Transparency (CT) is one of a few newer parts of the TLS ecosystem that aren’t directly used to set up TLS sessions, and because it isn’t user-facing, it has gone somewhat under the radar. I found a number of high-level explainers and quite a lot of technical debates between implementers, but I felt that the middle ground was lacking. I wanted to answer the questions “how do browsers interact with this infrastructure, and how can analysts and researchers interact with it without using third-party services?”
Caveats for this post:
- This is a summary of my recent reading. I was not involved in designing CT or writing its specifications.
- CT has changed dramatically since its inception circa 2012. It is still changing as of 2025 and will likely continue to change.
- Given the points above, there may be inaccuracies.
Background
Before we can answer those questions, we need some background. One of the longstanding skeletons in the TLS closet has been the problem of misissued or compromised certificates. A website’s TLS certificate is supposed to be cryptographic proof that a certificate authority (CA) has vetted the website and approved the certificate, but what if an attacker finds an exploit in a CA? Or a CA is coerced into issuing a malicious certificate? Or the website owner simply leaks their private key? TLS has two main systems for clients to check the validity of certificates: certificate revocation lists (CRLs) and the Online Certificate Status Protocol (OCSP). CRLs are much older and fell out of favor, but they have recently made a comeback and are now displacing the upstart OCSP. Both are mechanisms for browsers to check with CAs whether a certificate has been revoked.
Certificate Transparency (CT) attempts to provide another layer of security on top of revocation for misissued certificates (but not compromised certificates). It comes at the problem from a completely different angle: instead of being a way for CAs to fix their own issues, CT is a way for other people to watch CAs and notice problems before they become widely exploited.
Let’s say that example.com usually requests TLS certificates from CA1. An attacker finds a vulnerability in CA2 and successfully obtains a certificate for example.com. The attacker launches a MITM attack until someone eventually notices the malicious certificate and reports it to CA2, which revokes it. Finally, browsers learn about the revocation via CRL/OCSP. The idea of CT is to give someone¹ a way to detect that CA2 issued a certificate for example.com and raise an early warning before that certificate has a chance to be used in the wild. Many other incorrect behaviors should also raise alerts, such as a CA issuing invalid certificates or failing to log certificates.
The way this works is that every CA must submit every certificate it issues to public logs for it to be trusted by major browsers, and most CAs prove this in the issued certificate itself by embedding signed certificate timestamps (SCTs). The CT logs can be used in a few different ways:
- Monitors can ingest CT logs and offer services such as alerting on a new certificate being issued for a domain you own.
- User agents (web browsers) can check whether an issued certificate is signed by a CT log (details in the next section). Some web browsers can even verify (“audit”) that the certificate was logged.
- Auditors can read the logs and check for errors or inconsistencies between them.
- And of course, anyone can download CT logs and use them for any other purpose.
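To make the monitor idea concrete, the core check might look like the following hypothetical matching rule (the function names and matching policy are assumptions, not any particular monitor’s behavior). It assumes the dNSName entries have already been extracted from each newly logged certificate’s Subject Alternative Name extension, which is not shown.

```python
def normalize(name: str) -> str:
    """Lowercase a DNS name and drop surrounding whitespace and any trailing dot."""
    return name.strip().rstrip(".").lower()


def watched_matches(dns_name: str, watched_domains: list[str]) -> list[str]:
    """Return the watched domains that a certificate name falls under.

    A name matches a watched domain if it equals it or is a subdomain of it;
    a wildcard such as *.example.com is compared as example.com.
    """
    name = normalize(dns_name)
    if name.startswith("*."):
        name = name[2:]
    return [
        domain
        for domain in watched_domains
        if name == normalize(domain) or name.endswith("." + normalize(domain))
    ]
```

A monitor would run every name from each new log entry through this check and alert on any non-empty result, for example when the issuing CA is not the one the domain owner normally uses.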
One example of CT leading to a problem being caught was Certinomis in 2018 issuing valid test certificates. Andrew Ayer, the founder of SSLMate, reports that he found that problem using his company’s log monitoring service, Cert Spotter. There were many issues with Certinomis in addition to the test certificates that CT caught, and it was eventually distrusted in 2019. The official(?) CT website also lists three older examples of CAs being distrusted after CT exposed problems.
Proofs of Concept
The preceding background information can be found in various explainers online, but it doesn’t answer the original questions. Since everything is public, I should be able to interact with CT logs directly, but what does that look like?
Verify SCT Signatures
First, let’s emulate what a user agent would do and verify that a certificate is signed by a CT log. There are toolkits that can do this, but for educational purposes, this section will be manual. The format is documented in RFC 6962 section 3.2.
With CT, when a CA issues a certificate, it first creates a precertificate and submits it to CT logs. The guideline is for CAs to pick at least two or three CT logs from different operators. Each CT log sends back an SCT, a signature over the precertificate that commits the log to actually logging the certificate. The CA then removes the critical “poison” extension that marks the precertificate (and keeps it from being usable as a real certificate), adds an extension containing the SCTs returned by the logs, signs the result to create the actual certificate, and sends that to the requester. In theory, user agents can verify these signatures and do… something… if they aren’t valid. Remember, SCT signatures come from CT logs and are completely unrelated to the CA’s signature over the certificate as a whole.
Excluding fixed fields, an SCT contains:
- CT log ID, which is the SHA-256 digest of the log’s public key
- timestamp
- signature
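On the wire, these fields use the TLS-style binary encoding from RFC 6962 (section 3.2 for a single SCT, section 3.3 for the list that wraps them). As a stdlib-only sketch, the following parses a SignedCertificateTimestampList, such as the payload of TLS extension 18 or the contents of a certificate’s SCT extension once its DER OCTET STRING wrapping has been removed (that unwrapping is not handled here). The class and function names are illustrative.

```python
import struct
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class SCT:
    version: int        # 0 means v1
    log_id: bytes       # SHA-256 of the log's public key
    timestamp_ms: int   # milliseconds since the Unix epoch
    extensions: bytes   # CtExtensions, usually empty
    hash_alg: int       # TLS 1.2 HashAlgorithm code, 4 = SHA-256
    sig_alg: int        # TLS 1.2 SignatureAlgorithm code, 3 = ECDSA, 1 = RSA
    signature: bytes

    @property
    def timestamp(self) -> datetime:
        return datetime.fromtimestamp(self.timestamp_ms / 1000, tz=timezone.utc)


def parse_sct(data: bytes) -> SCT:
    """Parse one serialized SignedCertificateTimestamp (RFC 6962, section 3.2)."""
    version = data[0]
    log_id = data[1:33]
    timestamp_ms, ext_len = struct.unpack(">QH", data[33:43])
    pos = 43 + ext_len
    extensions = data[43:pos]
    hash_alg, sig_alg, sig_len = struct.unpack(">BBH", data[pos:pos + 4])
    signature = data[pos + 4:pos + 4 + sig_len]
    if len(signature) != sig_len or pos + 4 + sig_len != len(data):
        raise ValueError("malformed SCT")
    return SCT(version, log_id, timestamp_ms, extensions, hash_alg, sig_alg, signature)


def parse_sct_list(data: bytes) -> list[SCT]:
    """Parse a 2-byte total length followed by 2-byte-length-prefixed SCTs."""
    (total,) = struct.unpack(">H", data[:2])
    if total != len(data) - 2:
        raise ValueError("SCT list length mismatch")
    scts, pos = [], 2
    while pos < len(data):
        (sct_len,) = struct.unpack(">H", data[pos:pos + 2])
        scts.append(parse_sct(data[pos + 2:pos + 2 + sct_len]))
        pos += 2 + sct_len
    return scts
```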
The signature for a precertificate is calculated against:
- timestamp
- SHA-256 hash of the public key of the certificate’s issuer
- to-be-signed (TBS) precertificate, i.e. the certificate’s TBS portion with the SCT list extension removed
I first tried doing all of this from a shell with the openssl CLI, but as far as I can tell, the CLI can’t strip the SCT list extension from a certificate, which is necessary to reconstruct the TBS precertificate. So Python it is, with the pyca/cryptography library. The full script is at ct.py, and we’ll build it up piece by piece as follows.
We start by importing a bunch of libraries and setting up a SHA-256 convenience function.
```python
import base64
import datetime
import socket
import ssl

import requests
from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.serialization import load_der_public_key, Encoding, PublicFormat

def sha256(data: bytes) -> bytes:
    """Return the SHA-256 digest of data."""
    h = hashes.Hash(hashes.SHA256())
    h.update(data)
    return h.finalize()
```
Next we retrieve the certificate chain, including both the leaf certificate and the issuer certificate, and extract the SCTs from the leaf certificate.
```python
DOMAIN = "research.ivision.com"

# SSLSocket.get_verified_chain() requires Python 3.13+ and returns DER-encoded certificates.
with socket.create_connection((DOMAIN, 443)) as tcp_sock:
    with ssl.create_default_context().wrap_socket(tcp_sock, server_hostname=DOMAIN) as tls_sock:
        chain = tls_sock.get_verified_chain()

# chain[0] is the leaf certificate; chain[1] is the certificate of its issuer.
cert = x509.load_der_x509_certificate(chain[0])
issuer_cert = x509.load_der_x509_certificate(chain[1])
scts = cert.extensions.get_extension_for_class(x509.PrecertificateSignedCertificateTimestamps).value
```
Printing the domain and the precertificate SCTs at time of writing:
```
X509v3 Subject Alternative Name:
    DNS:research.ivision.com
Signed Certificate Timestamp:
    Log ID: 7D:59:1E:12:E1:78:2A:7B:1C:61:67:7C:5E:FD:F8:D0:87:5C:14:A0:4E:95:9E:B9:03:2F:D9:0E:8C:2E:79:B8
    Timestamp: Tue Dec 24 21:33:40 2024
    Signature: ecdsa-with-sha256 30:45:02:21:00:B3:0F:9A:28:77:19:73:84:90:85:01:13:82:4C:F5:C9:1F:AB:B5:62:35:BD:1F:5B:49:0F:BB:7B:C7:07:0E:EA:02:20:65:FF:B2:0E:33:5F:F5:6F:64:85:6C:C4:F2:98:93:B1:E5:AC:19:B6:3E:B2:4F:2D:75:00:9C:14:C9:47:AD:76
Signed Certificate Timestamp:
    Log ID: 13:4A:DF:1A:B5:98:42:09:78:0C:6F:EF:4C:7A:91:A4:16:B7:23:49:CE:58:57:6A:DF:AE:DA:A7:C2:AB:E0:22
    Timestamp: Tue Dec 24 21:33:40 2024
    Signature: ecdsa-with-sha256 30:46:02:21:00:C7:0E:CC:BD:DB:D0:89:A0:1C:63:86:F5:A9:F4:1D:C6:E5:88:B4:49:EF:66:5F:C7:57:1C:15:BD:75:51:33:E7:02:21:00:B3:F8:AB:A4:7E:03:DF:32:C8:98:34:9D:65:3C:42:4F:D9:D7:F3:0A:BC:9A:0E:0A:66:E3:BD:9C:EA:2B:22:A8
```
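The log IDs above are printed as colon-separated hex, while Google’s log list stores them in base64. A small stdlib sketch (the helper names are illustrative) converts between the two forms; it confirms that the first SCT’s log ID, 7D:59:1E:…, is the same value as the log_id fVkeEu… listed for DigiCert Yeti2025.

```python
import base64


def colon_hex_to_b64(log_id_hex: str) -> str:
    """Convert an openssl-style "7D:59:..." log ID into the base64 form used by log_list.json."""
    return base64.b64encode(bytes.fromhex(log_id_hex.replace(":", ""))).decode()


def b64_to_colon_hex(log_id_b64: str) -> str:
    """Convert a base64 log ID back into colon-separated uppercase hex."""
    return ":".join(f"{b:02X}" for b in base64.b64decode(log_id_b64))


yeti_hex = "7D:59:1E:12:E1:78:2A:7B:1C:61:67:7C:5E:FD:F8:D0:87:5C:14:A0:4E:95:9E:B9:03:2F:D9:0E:8C:2E:79:B8"
print(colon_hex_to_b64(yeti_hex))
```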
Now we start interacting with the CT logs. We need to take the log IDs and look up the respective CT log URLs to query. This gstatic.com URL is, as far as I can tell, the canonical list of CT logs.
```python
LOGS = requests.get("https://www.gstatic.com/ct/log_list/v3/log_list.json").json()

def get_log(log_id: bytes) -> dict:
    """Find the log-list entry whose base64 log_id matches the SCT's raw log ID."""
    b64_log_id = base64.b64encode(log_id).decode()
    for operator in LOGS["operators"]:
        for log in operator["logs"]:
            if log["log_id"] == b64_log_id:
                return log
    raise ValueError("Log ID not found")
```
An operator section in the JSON file looks like this (excerpt):
```json
{
  "name": "DigiCert",
  "email": [
    "ctops@digicert.com"
  ],
  "logs": [
    {
      "description": "DigiCert Yeti2025 Log",
      "log_id": "fVkeEuF4KnscYWd8Xv340IdcFKBOlZ65Ay/ZDowuebg=",
      "key": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE35UAXhDBAfc34xB00f+yypDtMplfDDn+odETEazRs3OTIMITPEy1elKGhj3jlSR82JGYSDvw8N8h8bCBWlklQw==",
      "url": "https://yeti2025.ct.digicert.com/log/",
      "mmd": 86400,
      "state": {
        "usable": {
          "timestamp": "2022-11-01T18:54:00Z"
        }
      },
      "temporal_interval": {
        "start_inclusive": "2025-01-01T00:00:00Z",
        "end_exclusive": "2026-01-01T00:00:00Z"
      }
    },
```
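The temporal_interval field reflects temporal sharding: a log such as Yeti2025 only accepts certificates whose expiry (notAfter) falls in [start_inclusive, end_exclusive), and the state field records where the log is in its lifecycle. As a sketch, the following filters a parsed log list down to the logs that could accept a certificate with a given expiry, tagging each with its operator name. The helper names, and the choice to treat only “usable” logs as acceptable, are assumptions for illustration.

```python
from datetime import datetime, timezone


def parse_rfc3339(ts: str) -> datetime:
    # The log list uses UTC timestamps ending in "Z", which fromisoformat only accepts from Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def logs_for_expiry(log_list: dict, not_after: datetime) -> list[dict]:
    """Return usable logs whose temporal_interval covers a (timezone-aware) notAfter."""
    matches = []
    for operator in log_list["operators"]:
        for log in operator.get("logs", []):
            if "usable" not in log.get("state", {}):
                continue  # assumption: only logs in the "usable" state are acceptable targets
            interval = log.get("temporal_interval")
            if interval is not None:
                start = parse_rfc3339(interval["start_inclusive"])
                end = parse_rfc3339(interval["end_exclusive"])
                if not (start <= not_after < end):
                    continue
            matches.append({"operator": operator["name"], **log})
    return matches
```

Applied to the full LOGS list, this could also show whether the chosen logs span at least two operators, as the CA guideline above suggests.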
And finally, the hard part: reconstructing the data each log signed and verifying the signatures.
```python
for sct in scts:
    log = get_log(sct.log_id)
    print(f"verifying '{log['description']}' ... ", end="")
    # The log ID must be the SHA-256 of the log's public key and match the log list.
    assert (
        sha256(base64.b64decode(log["key"]))
        == sct.log_id
        == base64.b64decode(log["log_id"])
    )

    sct_data = b""
    # Version: v1
    sct_data += b"\x00"
    # SignatureType: certificate_timestamp
    sct_data += b"\x00"
    # uint64 timestamp: milliseconds since the Unix epoch
    timestamp_ms = round(sct.timestamp.replace(tzinfo=datetime.timezone.utc).timestamp() * 1000)
    sct_data += timestamp_ms.to_bytes(8, "big")
    # LogEntryType: precert_entry
    sct_data += b"\x00\x01"
    # PreCert.issuer_key_hash[32]: SHA-256 of the issuer's DER SubjectPublicKeyInfo
    sct_data += sha256(issuer_cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo))
    # PreCert.tbs_certificate: 3-byte length + TBS precertificate
    sct_data += len(cert.tbs_precertificate_bytes).to_bytes(3, "big")
    sct_data += cert.tbs_precertificate_bytes
    # CtExtensions: 2-byte length + extensions (assumed empty here)
    sct_data += b"\x00\x00"

    # Assumes an ECDSA log key, as for both logs in this example.
    pubkey = load_der_public_key(base64.b64decode(log["key"]))
    pubkey.verify(sct.signature, sct_data, ec.ECDSA(hashes.SHA256()))
    print("verified")
```
And of course the successful output:
```
verifying 'DigiCert Yeti2025 Log' ... verified
verifying 'Sectigo 'Mammoth2025h1'' ... verified
```
Verify Logged Precertificate
User agents can verify SCT signatures for every certificate they encounter, but what if a log misbehaves? Even if a CT log signs a certificate, a malfunction or compromise could prevent the log from actually including it. User agents could query the CT logs for every certificate, but that requires sending the log the hash of each certificate the user agent visits (unlike verifying the signature, which can happen entirely offline, assuming the CT log public keys are cached), so querying is both privacy-damaging and relatively resource-intensive. This seems to be a partially unsolved problem in general, but for now, Chrome offers an opt-in audit feature which queries logs for some certificates.
But what does querying actually look like? We hash the data from the SCT signature with a null byte prepended (the spec’s Merkle Tree Hash of a single leaf, i.e. the leaf hash), ask the log for an inclusion proof for that digest, and then fetch the entry at the index the log returns. The result is the full data that was hashed for the SCT signature, which includes the TBS precertificate. This works because, for v1, the log’s MerkleTreeLeaf structure is byte-for-byte identical to the SCT’s signed data: both start with a zero version byte followed by a zero type byte (timestamped_entry and certificate_timestamp, respectively), and the remaining fields match. Appending to the body of the loop above:
```python
    print(f"querying '{log['description']}' precert ... ", end="")
    tree_size = requests.get(log["url"] + "ct/v1/get-sth").json()["tree_size"]
    # Leaf hash: SHA-256(0x00 || MerkleTreeLeaf); the log maps it to a leaf index.
    ret = requests.get(log["url"] + "ct/v1/get-proof-by-hash", params={
        "hash": base64.b64encode(sha256(b"\x00" + sct_data)).decode(),
        "tree_size": tree_size,
    }).json()
    # Note: this trusts leaf_index and does not check ret["audit_path"] against the tree head.
    obj = requests.get(log["url"] + "ct/v1/get-entries", params={
        "start": ret["leaf_index"],
        "end": ret["leaf_index"],
    }).json()["entries"][0]
    assert base64.b64decode(obj["leaf_input"]) == sct_data
    print("verified")
```
Now the output looks like:
```
verifying 'DigiCert Yeti2025 Log' ... verified
querying 'DigiCert Yeti2025 Log' precert ... verified
verifying 'Sectigo 'Mammoth2025h1'' ... verified
querying 'Sectigo 'Mammoth2025h1'' precert ... verified
```
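The query above trusts the leaf_index the log reports and ignores the audit_path that get-proof-by-hash also returns. A stricter client would check that path against the root hash of a signed tree head. The following stdlib sketch implements the RFC 6962 Merkle Tree Hash, builds audit paths for a toy in-memory tree, and verifies them with the iterative algorithm from RFC 9162 section 2.1.3.2 (CT version 2.0, which keeps the same tree hashing). The function names are illustrative, and the toy tree stands in for a real log.

```python
import hashlib


def leaf_hash(leaf: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256(0x00 || leaf)."""
    return hashlib.sha256(b"\x00" + leaf).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256(0x01 || left || right)."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n > 1)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def mth(leaves: list[bytes]) -> bytes:
    """Merkle Tree Hash of a list of leaf inputs (RFC 6962, section 2.1)."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    if len(leaves) == 1:
        return leaf_hash(leaves[0])
    k = _split(len(leaves))
    return node_hash(mth(leaves[:k]), mth(leaves[k:]))


def audit_path(m: int, leaves: list[bytes]) -> list[bytes]:
    """Audit path for leaf m (RFC 6962, section 2.1.1), as a log would return it."""
    if len(leaves) <= 1:
        return []
    k = _split(len(leaves))
    if m < k:
        return audit_path(m, leaves[:k]) + [mth(leaves[k:])]
    return audit_path(m - k, leaves[k:]) + [mth(leaves[:k])]


def verify_inclusion(hash_: bytes, leaf_index: int, tree_size: int,
                     path: list[bytes], root_hash: bytes) -> bool:
    """Check an inclusion proof with the iterative algorithm of RFC 9162, section 2.1.3.2."""
    if leaf_index >= tree_size:
        return False
    fn, sn, r = leaf_index, tree_size - 1, hash_
    for p in path:
        if sn == 0:
            return False
        if fn & 1 or fn == sn:
            r = node_hash(p, r)
            while fn and not fn & 1:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root_hash
```

Against a real log, hash_ would be the leaf hash computed in the loop, leaf_index and the base64-decoded audit_path would come from get-proof-by-hash, and root_hash would be the base64-decoded sha256_root_hash from a get-sth response with the same tree_size. That tree head’s own signature should also be checked, which this sketch does not do.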
But What About End-Entity Certificates?
Eagle-eyed readers might have noticed that so far, we’ve only verified signatures and log inclusion for precertificates, not end-entity certificates, the actual certificates that user agents verify. Let’s map out the CA process again:
- CA signs precertificate, sends it to CT logs
- CT logs sign the precertificate (and presumably log it), send SCTs to CA
- CA creates final end-entity certificate including the precertificate SCTs, sends it to CT logs
- CT logs sign the end-entity certificate (and presumably log it), send SCTs to CA
The CA can’t add this second round of SCTs to the certificate because that would break the CA signature.
Without the corresponding SCT, it is impossible (short of a costly brute-force search) for user agents to verify that the end-entity certificate was included, because they need the exact timestamp the log assigned to it. RFC 6962 specifies ways to transmit SCTs for end-entity certificates as part of a TLS handshake: SCTs can be requested and sent either via TLS extension 5 (status_request, i.e. a stapled OCSP response) or via extension 18 (signed_certificate_timestamp), an SCT-specific extension.
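To see why the exact timestamp matters, here is a stdlib sketch of the log leaf for an end-entity certificate (LogEntryType x509_entry, whose entry is the whole DER certificate, per the MerkleTreeLeaf definition in RFC 6962 section 3.4) and of its leaf hash. The function names are illustrative, and the certificate bytes in the check below are a stand-in, not a real certificate.

```python
import hashlib
import struct


def x509_leaf_input(timestamp_ms: int, cert_der: bytes, extensions: bytes = b"") -> bytes:
    """MerkleTreeLeaf bytes for an x509_entry."""
    return (
        b"\x00"                                             # Version: v1
        + b"\x00"                                           # MerkleLeafType: timestamped_entry
        + struct.pack(">QH", timestamp_ms, 0)               # timestamp, LogEntryType: x509_entry (0)
        + len(cert_der).to_bytes(3, "big") + cert_der       # ASN.1Cert: 3-byte length + DER certificate
        + struct.pack(">H", len(extensions)) + extensions   # CtExtensions
    )


def leaf_hash(leaf_input: bytes) -> bytes:
    """Merkle leaf hash SHA-256(0x00 || leaf_input), the value get-proof-by-hash looks up."""
    return hashlib.sha256(b"\x00" + leaf_input).digest()


stand_in_cert = b"\x30\x03\x02\x01\x01"  # placeholder bytes, not a real certificate
h1 = leaf_hash(x509_leaf_input(1_700_000_000_000, stand_in_cert))
h2 = leaf_hash(x509_leaf_input(1_700_000_000_001, stand_in_cert))
print(h1 == h2)  # a 1 ms difference yields an unrelated hash
```

Because shifting the timestamp by a single millisecond changes the hash completely, a client that only knew the timestamp to within a day would face 24 × 60 × 60 × 1000 = 86,400,000 candidate leaf hashes to query.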
Unfortunately, I wasn’t able to find a real-world HTTPS server that actually served the end certificate SCTs. Even google.com only included precertificate SCTs in its TLS handshake. The library tooling for these TLS extensions is mostly nonexistent, so it is possible that I’m not performing the right magic spell.
Does this matter? What is the point of CT logging of end certificates anyway, instead of just logging precertificates? I’m not sure.
Retrieving All Certificates
In theory we should be able to just dump all certificates with a large range:
```python
all_certificates = requests.get(log["url"] + "ct/v1/get-entries", params={
    "start": 0,
    "end": tree_size - 1,
}).json()
```
But in practice (and in theory too, since the spec allows it), log operators limit the number of entries returned per request. From some experimentation and napkin math, downloading a full CT log with serial requests would take on the order of a year, depending on the operator and year.
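Since a log may return fewer entries than requested, a bulk download has to advance by however many entries actually came back rather than by the requested batch size. A stdlib sketch of that loop follows; http_fetcher and iter_entries are hypothetical helper names, and the default batch size of 1000 is an arbitrary assumption. The fetch function is passed in so the paging logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# A fetch function takes an inclusive (start, end) range and returns the entries the log sent back.
FetchFn = Callable[[int, int], list]


def http_fetcher(log_url: str, timeout: float = 30.0) -> FetchFn:
    """Build a FetchFn for an RFC 6962 log; log_url is the log list "url" value, ending in "/"."""
    def fetch(start: int, end: int) -> list:
        query = urllib.parse.urlencode({"start": start, "end": end})
        with urllib.request.urlopen(f"{log_url}ct/v1/get-entries?{query}", timeout=timeout) as resp:
            return json.load(resp)["entries"]
    return fetch


def iter_entries(fetch: FetchFn, start: int, end: int, batch: int = 1000) -> Iterator[tuple[int, dict]]:
    """Yield (index, entry) for every entry in [start, end], inclusive.

    Logs may return fewer entries than requested, so advance by what actually came back.
    """
    index = start
    while index <= end:
        entries = fetch(index, min(index + batch - 1, end))
        if not entries:
            raise RuntimeError(f"log returned no entries at index {index}")
        for entry in entries[: end - index + 1]:  # ignore anything past the requested range
            yield index, entry
            index += 1
```

Usage against a real log would be something like iterating over iter_entries(http_fetcher(log["url"]), 0, tree_size - 1). This is still serial, so it does nothing about the time estimate above.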
Tiles
Although the preceding sections describe how CT logs work currently, this is planned to change. Operators have discovered that the web API specified in RFC 6962 requires significant server resources. There is a new paradigm using “tiles” that has a specification at https://c2sp.org/static-ct-api. None of the current trusted CT logs are running a tile-based server yet, though. A quick way to check is to issue a GET request for <URL>/checkpoint. This will 404 with RFC 6962 servers but return some text from a tile server:
```
$ curl https://twig.ct.letsencrypt.org/2025h1b/checkpoint
twig.ct.letsencrypt.org/2025h1b
1762066645
dpEFMSdKvx5sYcDOASnTxfqePkqVBqdiPoJAaKIzCJQ=

— twig.ct.letsencrypt.org/2025h1b yi6aRH6NVTyT0tcfBKeWJMZ5NUh8+viO2zjVzlYpuo4=
— grease.invalid y6ferc5jFYDFeKiGBb2S/zUAXGAZcxosFaFE6J/USXkK2Na+ZXdTL+AsxdShJGVyxBQ7
— twig.ct.letsencrypt.org/2025h1b /fAPWQLhECByhyq1vybOsG/0bvosqcwKJgluzh2gkN8tghUBQXK4y4BRqzu4t6iICe9jI2GsQ2nKtHcSLyy6Ol2xAwk=
— twig.ct.letsencrypt.org/2025h1b 28HaugAAAZUVr8LrBAMARjBEAiBKK5OwdulsEZo0TKuA17eCSP6CDRY+6tITssuJFCzyMwIgAnrhMBE7CTXVgaYRH/lZAaHxUIS/DYus/SXY8k6+FZg=
```
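As a small illustration of that output, not a tile client: per the C2SP signed-note and tlog-checkpoint formats, the body is the log’s origin line, the tree size, and the base64 root hash (optionally followed by extension lines), then a blank line and one signature line per signer, each made of an em dash, the signer name, and a base64 blob. The sketch below only parses that structure; it does not verify any signatures, which would require the log’s keys and the key-ID rules from those specs. The names are illustrative.

```python
import base64
from dataclasses import dataclass

SIGNATURE_PREFIX = "\u2014 "  # signature lines start with U+2014 (em dash) and a space


@dataclass
class Checkpoint:
    origin: str
    tree_size: int
    root_hash: bytes
    extensions: list[str]
    signatures: list[tuple[str, bytes]]  # (signer name, decoded blob), unverified


def parse_checkpoint(text: str) -> Checkpoint:
    body, sep, sig_block = text.partition("\n\n")
    if not sep:
        raise ValueError("missing blank line between note body and signatures")
    lines = body.split("\n")
    if len(lines) < 3:
        raise ValueError("checkpoint needs origin, tree size and root hash lines")
    signatures = []
    for line in sig_block.splitlines():
        if not line.startswith(SIGNATURE_PREFIX):
            raise ValueError(f"unexpected signature line: {line!r}")
        name, _, sig_b64 = line[len(SIGNATURE_PREFIX):].rpartition(" ")
        signatures.append((name, base64.b64decode(sig_b64)))
    return Checkpoint(
        origin=lines[0],
        tree_size=int(lines[1]),
        root_hash=base64.b64decode(lines[2]),
        extensions=lines[3:],
        signatures=signatures,
    )
```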
Because the specification isn’t finalized, and trusted logs aren’t using it yet, further investigation is out of scope for this post.
Conclusion
When researching CT and trying to verify the real-world usage, I kept coming across things that made me go :thinking_face:. Here are some examples:
- The most recent example I found of CT logs catching improper behavior was 6 years ago at the time of writing.
- The canonical CT log listing is a gstatic.com URL.
- It isn’t very clear how verifying certificate inclusion in CT logs is supposed to happen at scale, since having user agents query the logs is a privacy issue, the same reason that OCSP is being phased out.
- Pretty much every client or library not specifically designed for CT has missing or partial implementations of at least one part of RFC 6962.
- Inclusion of end-entity certificates in CT logs seems to be impossible to verify in practice.
None of these are deal-breakers, and CT still has obvious utility. More eyes on it and a larger community caring could lead to major improvements.
¹ “Someone”: who? That’s a good question.