Spoofing certificates with MD5 collisions


I attended a presentation at the Crypto and Privacy Village by Tomer Peled and Yoni Rozenshein from Akamai. They reverse engineered a Windows update to crypt32.dll to find out what was behind CVE-2022-34689. A truncated MD5 digest was used as an index into a hash table that caches whether a certificate has been validated successfully. When an entry was found in that cache, only the MD5 was compared. By using MD5 collisions, they found that crypt32.dll would accept a malicious certificate after an honest certificate had been validated.

This talk summary is part of my DEF CON 31 series. The talks this year have sufficient depth to be shared independently and are separated for easier consumption.

This next talk commenced at the Crypto and Privacy village. I went in knowing that MD5 has not been used in certificates for a long time now and for a good reason.

Excited to speak for the first time at DEF CON, where @TomerPeled92 and I will present our research together at @CryptoVillage. 😎 "Spoofing certificates with MD5 collisions - party like it's 2008!" is a talk about our analysis of the NSA/GCHQ-reported CryptoAPI CVE-2022-34689.
Spoofing certificates with MD5 collisions was originally done in 2008 (@realhashbreaker, @alexsotirov et. al.), so what's it doing in a 2022 CVE? It turned out MD5 comparisons appeared in an unusual context, which allowed collision attacks to make a retro comeback. 💿
Getting ready with @TomerPeled92 for our talk this afternoon, 5pm at @CryptoVillage @defcon. Pictured: seats for you to fill!

The presentation

The National Security Agency (NSA) reported a Windows CryptoAPI spoofing vulnerability, which was published as CVE-2022-34689. The speaker mentioned that the NSA and GCHQ (the United Kingdom equivalent) release very interesting CVEs with few public details. To find out what this one was, the researchers had to reverse engineer the changes to crypt32.dll. Thankfully, a binary comparison with the prior release made the fix easy to identify.

Windows uses crypt32.dll to validate certificates for applications, drivers, and websites. Inside is a cache so that when a certificate is validated a second time, it can skip the expensive cryptographic operations like RSA signature verification. This cache used a hash table to store the verification results. In essence the hash table would be internally keyed by the MD5 hash of the certificate bytes and the value found would be an object with the certificate and its validation status.

How the certificate validation works. MD5 is called on the certificate, truncated, and then modulo'd to go into a cache. The next time crypt32 sees the same MD5, it will be immediately verified and trusted.
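The cache behaviour described above can be modelled in a few lines. This is only a toy sketch, not the crypt32.dll implementation: the class name `NaiveCertCache`, the bucket count, the 4-byte truncation, and the injectable `fingerprint` parameter are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

NUM_BUCKETS = 256  # hypothetical table size, chosen for illustration


def md5_digest(data: bytes) -> bytes:
    return hashlib.md5(data).digest()


class NaiveCertCache:
    """Toy validation cache that trusts a matching MD5 digest alone."""

    def __init__(self, fingerprint: Callable[[bytes], bytes] = md5_digest):
        self.fingerprint = fingerprint
        self.buckets: Dict[int, List[Tuple[bytes, bytes, bool]]] = {}

    def _bucket(self, digest: bytes) -> int:
        # Truncate the digest (assumed: 4 bytes), then reduce modulo the table size.
        return int.from_bytes(digest[:4], "little") % NUM_BUCKETS

    def validate(self, cert: bytes, expensive_verify: Callable[[bytes], bool]) -> bool:
        digest = self.fingerprint(cert)
        entries = self.buckets.setdefault(self._bucket(digest), [])
        for cached_digest, _cached_cert, status in entries:
            if cached_digest == digest:
                # Flaw being illustrated: the stored certificate bytes
                # are never compared with the incoming certificate.
                return status
        status = expensive_verify(cert)  # e.g. RSA signature verification
        entries.append((digest, cert, status))
        return status
```

Calling `validate` twice with the same bytes runs `expensive_verify` only once. Passing a deliberately weak `fingerprint` (a stand-in for a real MD5 collision) shows the problem: a second, different certificate inherits the cached verdict of the first.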

When the MD5 hash is the same and the certificate is not, we have a hash collision that an attacker can exploit. In short, crypt32.dll could be tricked into validating and caching a legitimate certificate (called cache poisoning) and then later judging a malicious certificate with the same MD5 hash to be valid as well. This enables a man-in-the-middle (MITM) attack against applications that retry TLS connections and use crypt32.dll for certificate verification.

High level attack flow. The attacker serves a malicious certificate with the same MD5 thumbprint as the one in the CryptoAPI's cache. The vulnerable application compares the identical thumbprints. The victim trusts the attacker.

The researchers demonstrated this successfully using an old version of Chrome from 2015. Since then, Google Chrome has switched to another cryptographic backend, BoringSSL. Their MITM proxy would first send the target's certificate, which is public, to the client and poison the crypt32.dll cache. Then the request would fail and the client would try again. This time, the client would receive the malicious certificate, and crypt32.dll would see that a certificate with the same MD5 hash was valid, without comparing the certificate saved in the cache with the input certificate. Finally, the TLS handshake would succeed and the client would be served a malicious page while the security symbol showed that everything checked out.
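The missing step, comparing the cached certificate with the incoming one, can be sketched as follows. This is not Microsoft's patch. The function name `cached_status` and the dictionary layout are hypothetical, and the point is only that a digest match by itself should count as a cache miss.

```python
import hashlib
from typing import Dict, Optional, Tuple


def cached_status(cache: Dict[bytes, Tuple[bytes, bool]], cert: bytes) -> Optional[bool]:
    """Return the cached verdict only when the stored certificate is byte-identical."""
    entry = cache.get(hashlib.md5(cert).digest())
    if entry is None:
        return None  # never seen: the caller must run full verification
    stored_cert, status = entry
    if stored_cert != cert:
        return None  # same MD5, different bytes: treat as a miss
    return status
```

With this check in place, a colliding certificate falls through to full verification, where its signature would fail.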

MD5 collisions can be found in two ways. The first is to have a common prefix and two different suffixes. This kind can be found in seconds on a modern computer. The second is to have two different prefixes and two different suffixes (a chosen-prefix collision). That kind takes much longer to find, though it is not expensive for a threat actor to do. Once a collision is found, appending the same suffix to both inputs leaves their digests colliding.

The second approach is the only option for this use case. The malicious certificate must be a valid ASN.1 certificate encoded with DER. Existing MD5 collision techniques work by mangling a binary block of data. Given the difficulty of finding one collision, it would be too expensive to find acceptable collisions if the block of data were constrained to fit a certain pattern, protocol, or encoding. The researchers needed somewhere to insert arbitrary binary data after their public key so that the certificate would still be functionally accepted while reaching the same intermediate MD5 state as the honest certificate. Then, by appending the rest of the certificate, it appears with the same final MD5 hash to crypt32.dll.
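The layout of different prefixes, then collision blocks, then a shared suffix can be illustrated with a toy hash. The sketch below is not an MD5 collision attack: it chains a compression function with a deliberately tiny 16-bit state so that a brute-force search finishes instantly. The names `compress`, `absorb`, `toy_hash`, and `chosen_prefix_collision` and the block and state sizes are all made up for this example.

```python
import hashlib
from itertools import count

BLOCK = 8   # toy block size in bytes (MD5 uses 64)
STATE = 2   # toy state size in bytes (MD5 uses 16)
IV = b"\x00" * STATE


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 16 bits of MD5 over state || block."""
    return hashlib.md5(state + block).digest()[:STATE]


def absorb(state: bytes, data: bytes) -> bytes:
    """Chain the compression function over whole blocks of data."""
    assert len(data) % BLOCK == 0
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state


def toy_hash(msg: bytes) -> bytes:
    """Iterated hash: pad, append the message length, then chain."""
    padded = msg + b"\x80" + b"\x00" * (-(len(msg) + 1) % BLOCK)
    padded += len(msg).to_bytes(BLOCK, "big")
    return absorb(IV, padded)


def chosen_prefix_collision(prefix_a: bytes, prefix_b: bytes):
    """Find one block per prefix so that both chains reach the same state."""
    state_a, state_b = absorb(IV, prefix_a), absorb(IV, prefix_b)
    seen = {}  # state reached from prefix_a -> block that produced it
    for n in count():
        block = n.to_bytes(BLOCK, "big")
        seen.setdefault(compress(state_a, block), block)
        hit = seen.get(compress(state_b, block))
        if hit is not None:
            return hit, block


prefix_a, prefix_b = b"honest!!", b"attacker"  # two different 8-byte prefixes
block_a, block_b = chosen_prefix_collision(prefix_a, prefix_b)
msg_a, msg_b = prefix_a + block_a, prefix_b + block_b
suffix = b"shared certificate tail"
print(toy_hash(msg_a + suffix) == toy_hash(msg_b + suffix))
```

Because both messages reach the same internal state after their collision blocks and have the same length, any identical suffix keeps the final digests equal. Against real MD5 the search is far more involved, but the structure of the two inputs is the same.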

The next challenge is where to put the arbitrary data after the public key.

MD5 collision prefixes are different, as the attacker's public key needs to be inside. Then there are collision blocks. Afterwards the same MD5 state is achieved and the Certificate signature is appended. At the end, both have the same state.

It so happens that there is a really convenient location. Section 6 of RFC 5912, New ASN.1 Modules for the Public Key Infrastructure Using X.509 (PKIX), specifies the ASN.1 structure: the pk-rsa object pairs the RSAPublicKey with a PARAMS value. This value must be null, and must be checked to be null.

RSAPublicKey ::= SEQUENCE {
   modulus         INTEGER, -- n
   publicExponent  INTEGER  -- e
}

rsaEncryption OBJECT IDENTIFIER ::= {
   iso(1) member-body(2) us(840) rsadsi(113549) pkcs(1) pkcs-1(1) 1 }

pk-rsa PUBLIC-KEY ::= {
   IDENTIFIER rsaEncryption
   KEY RSAPublicKey
   -- Private key format not in this module --
   CERT-KEY-USAGE {digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyCertSign, cRLSign}
}

What if a certificate had PARAMS set to an arbitrary bit or byte string? And what if the certificate validation code did not check that PARAMS is always null? Then we have a perfect place to insert arbitrary data to mangle the MD5 digest as we desire. In practice, cryptographic libraries may omit this check, on the assumption that a tampered certificate would fail CA signature verification and that trusted CAs would not sign a public key without proof of ownership of the domain.
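As a rough illustration of the check in question, here is a minimal sketch that inspects a DER-encoded AlgorithmIdentifier and accepts it only when the parameters are exactly the two-byte DER NULL. The helper names `read_tlv` and `rsa_params_are_null` are invented for this example, and the parser handles only what the sketch needs. It is not a hardened ASN.1 decoder.

```python
from typing import Tuple

RSA_ENCRYPTION_OID = bytes.fromhex("2a864886f70d010101")  # 1.2.840.113549.1.1.1
DER_NULL = b"\x05\x00"


def read_tlv(data: bytes, offset: int = 0) -> Tuple[int, bytes, int]:
    """Read one DER tag-length-value; return (tag, value, next_offset)."""
    if len(data) < offset + 2:
        raise ValueError("truncated DER element")
    tag = data[offset]
    length = data[offset + 1]
    offset += 2
    if length & 0x80:  # long form: low 7 bits give the number of length bytes
        num = length & 0x7F
        length = int.from_bytes(data[offset:offset + num], "big")
        offset += num
    end = offset + length
    if end > len(data):
        raise ValueError("truncated DER element")
    return tag, data[offset:end], end


def rsa_params_are_null(algorithm_identifier: bytes) -> bool:
    """True only for rsaEncryption with parameters encoded as DER NULL."""
    tag, body, end = read_tlv(algorithm_identifier)
    if tag != 0x30 or end != len(algorithm_identifier):  # must be one SEQUENCE
        return False
    oid_tag, oid, oid_end = read_tlv(body)
    if oid_tag != 0x06 or oid != RSA_ENCRYPTION_OID:
        return False
    # Anything other than NULL here is room for attacker-chosen bytes.
    return body[oid_end:] == DER_NULL


standard = bytes.fromhex("300d06092a864886f70d0101010500")
stuffed = bytes.fromhex("300f06092a864886f70d0101010402dead")  # OCTET STRING params
print(rsa_params_are_null(standard), rsa_params_are_null(stuffed))
```

The second input swaps the NULL for a short OCTET STRING, which is the kind of slot where collision blocks could be hidden if a validator skips this check.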


The researchers also published the demo through a tweet.

Remember the vulnerability in CryptoAPI which could result in an attacker masquerading as a legitimate entity? Akamai researchers analyzed it and have found a way to exploit it. See the write-up here: www.akamai.com/blog/security-research/exploiting-critical-spoofing-vulnerability-microsoft-cryptoapi

More details can be found in the official write-up, Exploiting a Critical Spoofing Vulnerability in Windows CryptoAPI.