We rely on several cryptographic tools constructed together to secure our lives. Many depend on hidden information with specific properties to provide the security benefits they claim. It is not always convenient to agree upon or distribute a large amount of hidden information like a One-time pad. Key Derivation Functions (KDFs) reliably create unrelated keys for different cryptographic tools from a single Input Key Material (IKM).
In fact, a KDF helped you view this article, specifically the HMAC-based Key Derivation Function (HKDF). While correctly used in your browser, it is often misused as I will show in critiquing an anonymized example I found online in a publication by AnonCo.
I will refer to the organization that misused HKDF as AnonCo. Please focus on learning from this content rather than finding the source.
Just about every cryptographic tool out there can be used wrong. Deploying correct cryptography is hard, so hard that you should never do it alone.
Tried explaining danger of homebrew crypto to a journalist in DM "Cryptography is nightmare magic math that cares what kind of pen you use."
Cryptography is a science that uses hard mathematical problems to make specific threat models practically infeasible to execute. Not all threat models are equal. For example, if you are preventing complete theft, you may not be preventing partial theft.
Unlike the watermelons above, AnonCo's misuse does not open them up to any new threats. It happens to meet AnonCo's functional goals more by accident than on purpose. In cryptography, accidents are dangerous and not something to joke about.
I would not want to be hit by a loose trailer just because they used a wrench as a hitch pin.
HKDF misuse
In this article, "misuse" has a specific meaning: a cryptographic tool is not delivering all intended security properties because it is not used correctly.
If you're like me, you might be stuck on some of the details and not get why the security properties change when HKDF is misused. After all, it still looks like some random data comes out at the end.
This took a lot of vacation time for me to figure out, but it was fun in its own way. I hope my words save you some time!
I highly recommend you read Soatok's article: Understanding HKDF (archived). It goes into great technical detail which I think any aspiring cryptographer should reference.
Here are a few ways HKDF can be misused:
In a public setting, no salt is given.
The salt input is not indistinguishable from random (IND).
The salt input is used for domain separation.
The same salt is used across multiple transactions in a public setting.
Different salts are used in a private setting.
The inputs to HKDFs are the same for different contexts, resulting in the same keys for different purposes.
Cryptographic Extraction and Key Derivation: The HKDF Scheme describes several weaknesses observed in KDF use at the time. It contributes HKDF which is a robust KDF in that the derived keys are still practically unlinkable and unguessable even when HKDF is misused.
For example, it is not the end of the world if the salt input is not indistinguishable from random. HKDF works well for private deterministic use cases where the salt is not provided. When there is no salt, the salt is equivalent to 000000000...
Though, if your public protocol implements unique keys for each session, review the above misuses and consider employing a salt which cannot be directly manipulated by either party and is otherwise indistinguishable from random.
That said, the scope of this article is for HKDFs and key-based key derivation functions for key expansion in a private setting.
Aside, if you're here looking for info on password-based key derivation functions (PBKDFs), sorry to say this isn't the right reference!
Indistinguishable from random
A cryptographically-suitable key is a sequence of bits that are indistinguishable from random. In academic literature, it is typically contracted to IND. Others may say that it is "uniformly random" as it was selected from a uniform distribution of possible configurations.
If there is a bias in which states are selected, then it is not uniformly random. Likewise, if there is a bias when it is encoded to binary, then it is not uniformly random in binary.
In practical terms an IND bit string cannot be meaningfully analyzed. It is also impractical to guess or brute force when it is sufficiently long.
You may see names like AES-128 or SHA-256 and wonder what that's about. The names refer to how many bits there are in their configuration or design.
Asymmetric keys and asymmetric computations have distinctive patterns, which makes them unsuitable for use in symmetric cryptography like AES-256. See Why some cryptographic keys are much smaller than others (archived) by CloudFlare. A distinctive pattern makes it not IND!
That's why we reach for KDFs! It is a tool that produces IND bit strings. That said, this article will focus on IND coming in and IND coming out.
Also: in software, we pass byte strings around in memory to solve problems. When I mention bytes or bits, I mean the same thing.
Key Expansion
Key expansion is a function that takes a cryptographically-suitable key (which is IND) and a context as input and outputs an IND sequence of bits of a desired length.
How KDFs secure websites
When you connected to this web page, your machine and the server created a shared secret, a shared salt, and then used HKDF to create multiple IND keys to secure and authenticate data going in each direction.
You mentioned desired length. That suggests the length is configurable? Could I make a gigabyte of data with it?
It is flexible by design to support all sorts of use cases. It can even generate multiple keys at once with a long output length!
But do not use it as a stream cipher. It is not meant for that!
HKDF limits how long its output can be. With SHA-256, the most you can get from HKDF is 8,160 bytes (255 × 32), or around 8KB, because the expand phase will only call HMAC up to 255 times. See Boring Crypto hkdf.cc line 72 (archived) and RFC5869 Section 2.3.
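To make the limit concrete, here is a minimal sketch. It assumes Node.js and its built-in node:crypto module (hkdfSync) rather than the browser API, and the IKM and label are stand-ins I made up for illustration.

```javascript
const crypto = require('node:crypto');

const ikm = crypto.randomBytes(32);      // stand-in IKM, for illustration only
const salt = Buffer.alloc(0);            // no salt, as in a private setting
const info = Buffer.from('length demo'); // hypothetical label

const HASH_LEN = 32;                     // SHA-256 output size in bytes
const MAX_OKM_LEN = 255 * HASH_LEN;      // 8160 bytes

// The largest allowed request succeeds.
const okm = Buffer.from(crypto.hkdfSync('sha256', ikm, salt, info, MAX_OKM_LEN));

// One byte more is rejected with an error.
let rejected = false;
try {
  crypto.hkdfSync('sha256', ikm, salt, info, MAX_OKM_LEN + 1);
} catch (err) {
  rejected = true;
}

console.log(okm.length, rejected);
```

If you find yourself anywhere near this limit, that is a hint that HKDF is being asked to do a job meant for a different tool.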
Multiple keys? I see only one output. Where are the other keys?
The output is just a byte array so you can slice it up as much as you like as long as each slice is non-overlapping.
Since any slice of an IND bit string is also IND, it is still suitable for cryptographic use. See AES-GCM-SIV: Specification and Analysis section 4 for an example on how they expand a key and then slice it up.
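As a rough sketch of slicing, assuming Node.js and node:crypto, one 64-byte output can be cut into two 32-byte keys. The label and the variable names are hypothetical.

```javascript
const crypto = require('node:crypto');

const ikm = crypto.randomBytes(32); // stand-in IKM, for illustration only
const info = Buffer.from('encryption and authentication keys'); // hypothetical label

// Ask for 64 bytes in one call.
const okm = Buffer.from(crypto.hkdfSync('sha256', ikm, Buffer.alloc(0), info, 64));

// Non-overlapping slices: bytes 0..31 and bytes 32..63.
const encryptionKey = okm.subarray(0, 32);
const authenticationKey = okm.subarray(32, 64);

console.log(encryptionKey.length, authenticationKey.length);
```

The important part is that the two ranges never share a byte.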
Planning for the future
Every week, my application adds another feature. I cannot make an ever growing byte array of secret key data. What should I do about that?
This depends on the risks you're willing to accept in the name of simplicity. You might have just one key for database encryption, as an example. This might be fine if you have a small amount of data. A cryptographer can help you understand the risks your implementation has and what steps should be taken to improve the security of your data and users at scale.
Tech bloggers, whether or not they are also cryptographers, are not your cryptographer. At minimum, they don't know your threat model or systems designs. This is an incredibly specialized domain that's easy to get wrong. Hire a cryptographer.
I am looking to encrypt data with different keys without having to provision those keys manually. And eventually, I want to encrypt even more different things with their own keys without saving the keys that were generated. What's the tool to do that?
A tool for just-in-time deterministic sub-key derivation is HKDF.
Seriously, a warning: do not do cryptography alone.
Here be dragons, but cryptography! There are still some security concerns left unaddressed here. What if one of the output keys gets compromised? How will you rotate that out and prevent new things from being encrypted with it?
This article is not a guide for key life-cycles and key management. You need a cryptographer to help you with that problem, not a blog post on the internet and certainly not an answer on Stack Overflow / Stack Exchange.
Do not naively copy what you see recommended here into your application. What I share and describe is to illustrate the security properties of the tools we have available. It is up to you and a cryptographer to build something beautiful and trustworthy together.
HKDF Sub-key derivation on the fly
As a recap, here's where we are in this story:
we have Input Key Material (IKM) with enough entropy
we know what cryptographic operation will use this key, and its requirements (e.g. the length of the key)
we have a unique label for what it will be used for (e.g. encrypting a certain database table and column)
we desire keys on the fly for the operation we are about to do
we must be able to create the same keys on the fly again for another operation at a later time
And the interface to HKDF looks something like this:
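A minimal sketch of such an interface, assuming Node.js and node:crypto. The wrapper name hkdfSha256 and its object-style parameters are my own illustration and not a standard API; only hkdfSync comes from Node.js.

```javascript
const crypto = require('node:crypto');

/**
 * Hypothetical wrapper that names the four inputs discussed in this article.
 * @param {Buffer} ikm    Input Key Material with enough entropy
 * @param {Buffer} salt   Optional; empty by default for private use
 * @param {Buffer} info   Label describing what the key is for
 * @param {number} length Output length in bytes
 */
function hkdfSha256({ ikm, salt = Buffer.alloc(0), info, length }) {
  return Buffer.from(crypto.hkdfSync('sha256', ikm, salt, info, length));
}

const ikm = crypto.randomBytes(32); // stand-in IKM, for illustration only
const key = hkdfSha256({ ikm, info: Buffer.from('example label'), length: 32 });
console.log(key.length);
```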
Hey, about that application problem, how do I solve it?
First, we put in our input key material (IKM).
Second, our problem does not involve another party executing cryptography. Therefore, we do not put anything into the salt.
Why not put anything into the salt?
The salt is for solving a problem between two parties. We want to resist analysis when both sides use the same secret key. The salt does not have to be secret, but it should have a proportionate amount of IND data. It is directly used as an HMAC key in the extract phase on the shared secret.
Since our use case is private and not with another party, the salt provides no additional benefit because our context is private. Including a satisfactory salt requires us to carefully create, store, recall, and handle more data. Without the salt, the extract phase will use an all-zero array as the key. This is acceptable for this use case: we are deriving keys for ourselves from a suitable IKM.
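To see the all-zero behavior for yourself, here is a small sketch assuming Node.js and node:crypto. It compares an empty salt against 32 zero bytes, which is the SHA-256 output length. The IKM and label are stand-ins.

```javascript
const crypto = require('node:crypto');

const ikm = crypto.randomBytes(32);    // stand-in IKM, for illustration only
const info = Buffer.from('salt demo'); // hypothetical label

// No salt at all.
const noSalt = Buffer.from(
  crypto.hkdfSync('sha256', ikm, Buffer.alloc(0), info, 32));

// An explicit salt of 32 zero bytes.
const zeroSalt = Buffer.from(
  crypto.hkdfSync('sha256', ikm, Buffer.alloc(32), info, 32));

console.log(noSalt.equals(zeroSalt));
```

Both calls should produce the same key, so leaving the salt out is the same as passing zeros.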
What about on passwords? I see salts on MD5 and bcrypt passwords.
Don't use MD5. Salt means something slightly different for passwords. It exists to increase the computational complexity of creating and maintaining a precalculated password database called a Rainbow table. For key exchanges, unique salts eliminate the risks of reusing a shared secret.
Third, for info, we put in a unique label. HKDF could be called twice with the same key, salt, and length, but have different labels like "encryption key" and "authentication key".
The info parameter is used to separate keys from one another. It is also a good idea to have predictable fixed length info values. You will have to exercise caution if variable length data is included.
And fourth, for length, we provide the key size we want. In this case it is likely bytes, so for 256 bits: length = 32.
Many APIs use bytes instead of bits. Be sure to review the documentation of the implementation! Note that NIST standards require additional data on info, such as the length as bits on the very end. (archived line 191)
Finally HKDF can be called and the IND output can be used for encrypting, authenticating, or some other neat thing!
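The four steps above can be sketched like this, assuming Node.js and node:crypto. The helper name subKey is hypothetical and the random IKM stands in for a secret you would provision properly.

```javascript
const crypto = require('node:crypto');

// First: the input key material. Random bytes stand in for a provisioned secret.
const ikm = crypto.randomBytes(32);
// Second: no salt for a private use case.
const salt = Buffer.alloc(0);
// Fourth: the length, in bytes (256 bits).
const length = 32;

// Third: a unique label goes into info.
function subKey(label) {
  const info = Buffer.from(label, 'utf8');
  return Buffer.from(crypto.hkdfSync('sha256', ikm, salt, info, length));
}

const encryptionKey = subKey('encryption key');
const authenticationKey = subKey('authentication key');
const encryptionKeyAgain = subKey('encryption key');

console.log(encryptionKey.equals(authenticationKey));  // different labels
console.log(encryptionKey.equals(encryptionKeyAgain)); // same inputs
```

Different labels give unrelated keys, while repeating the same inputs gives the same key back, which is what lets us skip storing the derived keys.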
What was that label thing about?
It is important that we do not reuse keys for different purposes. The label is an application-provided string that says what the purpose is. In essence, this input is used for domain separation in the expand phase of HKDF. That expand phase is what provides KDF security.
If it is possible that the same label is used for two purposes, you have a canonicalization problem.
What is KDF security?
In practical terms: if one of the keys made by the KDF is leaked, the other keys made by the same KDF are still safe. In addition, all the keys made are IND. This second property comes from PRF security. All KDFs are also PRFs.
What is canonicalization?
Canonicalization is the process of taking multiple pieces of data and serializing it together in an unambiguous way. If any of the pieces change, then the output is also changed. This is especially useful in verifying data that can be reorganized in transit.
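One common way to get an unambiguous serialization is to prefix every piece with its length. This is only a sketch of that idea, assuming Node.js; the helper names and the 4-byte big-endian length are my own choices and not taken from any specification.

```javascript
// Ambiguous: plain concatenation loses the boundaries between pieces.
function naiveJoin(...pieces) {
  return Buffer.from(pieces.join(''), 'utf8');
}

// Hypothetical helper: prefix each piece with its byte length
// as a 4-byte big-endian integer, then concatenate.
function canonicalize(...pieces) {
  const parts = [];
  for (const piece of pieces) {
    const bytes = Buffer.from(piece, 'utf8');
    const length = Buffer.alloc(4);
    length.writeUInt32BE(bytes.length);
    parts.push(length, bytes);
  }
  return Buffer.concat(parts);
}

console.log(naiveJoin('ab', 'c').equals(naiveJoin('a', 'bc')));       // true
console.log(canonicalize('ab', 'c').equals(canonicalize('a', 'bc'))); // false
```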
Do not blame the users or library authors when the same issue keeps appearing. Instead, recognize that the specification and technology are prone to misuse, and consider alternatives.
Case Study: AnonCo
AnonCo's product relies on a technology that abstracts database storage and database operations. Additionally, they use a compatible security dependency which facilitates seamless encryption and decryption when it goes in and out of the database to the application.
Why might we want to use application side encryption? I can encrypt the database in AWS.
At rest encryption, or encrypting the disk or whatever only matters if your attacker can get to the disk. If they can clone the database in AWS, change the database's password from AWS, and then sneak inside to dump it, guess what? All that data is available for the threat actor to pilfer.
The method described above is literally what happened in the breach I experienced. At rest encryption in the cloud is check-box security and nothing more.
However, that security library does not provide per-column encryption keys, which is a feature that AnonCo wants. AnonCo has a lot of customers they need to protect across many integrated products. At their scale, it is a good idea to encrypt each sensitive database field with a different key. Unfortunately, the plumbing to do this requires a product developer to add a new key correctly each time they need to add or migrate an encrypted database field!
Not only that, but also the ongoing maintenance of adding new keys to mitigate key exhaustion! Holy yak shaving, Batman!
This not only disincentivizes secure development, it also introduces the chance of accidentally reusing keys or introducing weak (not IND) keys through a manual process!
How often do I need to rekey?
It is generally a good idea to swap a new primary key every year, or every billion operations for most businesses. To really know: hire a cryptographer to give you a recommendation. Like being told to get a lawyer, this advice cannot get old if you care about doing the job correctly.
I've heard stories of how some big tech teams blindly get told to use a new key every day, even if they only store about 1000 records a day. They'd store the daily key in DynamoDB encrypted with a KMS master key. When reading a record, they'd find the key ID on the record, read the corresponding key from Dynamo, and decrypt the record. The proportion of data to key period here is in my opinion excessive.
AnonCo tried to automate key provisioning to eliminate manual process using HKDF.
An author at AnonCo shared how their team solved a problem with HKDF. It was a genuine team effort that successfully shipped to production at AnonCo. There are mistakes and I have responsibly and respectfully communicated my critique with constructive recommendations.
I highly respect this team and so I will not be sourcing the example I critique. In fact, the example is rewritten in another language to further distance the source from this article.
There are multiple problems with the approach AnonCo used, which I will cover!
Here's the important code, which is translated for anonymity.
The key they're getting out will look functional, but it will not have the security properties one expects from a KDF.
First: salt is being given the label!
Again, the salt is meant to resist analysis of a shared secret in a public transaction.
This salt thing is confusing.
HKDF was designed to solve an important problem in key agreement which produces a shared secret with a mathematical operation in a public setting.
For example: Finite Field Diffie-Hellman (FFDH) and Elliptic Curve Diffie-Hellman (ECDH) produce hard-to-guess secrets which when encoded in binary are not IND.
It was also designed as a KDF that safely handles our use case, when it is well understood and applied.
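For contrast with the private use case, here is a rough sketch of the public setting, assuming Node.js and node:crypto. Two parties run ECDH and then pass the shared secret through HKDF with a salt. Deriving the salt from a hash of both public keys is my assumption, standing in for whatever transcript a real protocol would define. This is not a protocol design.

```javascript
const crypto = require('node:crypto');

// Each party generates an ephemeral key pair.
const alice = crypto.createECDH('prime256v1');
const bob = crypto.createECDH('prime256v1');
const alicePublic = alice.generateKeys();
const bobPublic = bob.generateKeys();

// Both sides compute the same shared secret. It is hard to guess, but not IND.
const aliceSecret = alice.computeSecret(bobPublic);
const bobSecret = bob.computeSecret(alicePublic);

// Assumption: a hash over both public keys stands in for a handshake transcript.
const salt = crypto.createHash('sha256')
  .update(alicePublic)
  .update(bobPublic)
  .digest();
const info = Buffer.from('demo session key'); // hypothetical label

// The extract phase turns the shared secret into IND key material.
const aliceKey = Buffer.from(crypto.hkdfSync('sha256', aliceSecret, salt, info, 32));
const bobKey = Buffer.from(crypto.hkdfSync('sha256', bobSecret, salt, info, 32));

console.log(aliceKey.equals(bobKey));
```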
Private key derivation
Most security engineers do not write protocols between peers, servers, or clients. They write solutions to problems within their organization. Distributing secrets is a solved problem.
Let's assume that AnonCo can distribute an IND secret IKM to their servers and that the key was correctly created.
If you are deriving keys from text like "hunter2" or doing something silly like hashing a UUID sha256("9f1bf359-054c-4e3d-8845-ff6cb928c311"), I will hunt you down.
In both cases, the key derivation key does not receive enough hidden knowledge to maintain its expected security properties in computational complexity. HKDF uses HMAC, which uses a hash. HMAC pads or hashes its key to the hash's block size, but the strength of the result is bounded by the hash's output size, so the input should be an IND key at least as long as the hash output. For example, when using HKDF with SHA-256, aim to supply an IND key with 256 bits.
It may be helpful to think of a Key Derivation Key (KDK) as a combination of Input Key Material (IKM) and a Key Derivation Function (KDF). In this case, the KDF is HKDF.
Deirdre Connolly
One of the things that you worked on that kind of is touching on tink is binding properties of key material to the key material itself, as opposed to it kind of being defined by a standard or out of band or stored somewhere else.
A lot of problems stem from the fact that when we talk about keys, we think about the 32 or 16 raw bytes of random data that don't include the full information of how it will be used. The right way, in my opinion, to handle these things is to always consider the key as the entirety of the function or functions that it defines. Like, just given the key, I should be able to encrypt something.
I shouldn't need any additional context for that. And that means I need to know: do I use AES-GCM with this key? Do I use as AES-CTR-HMAC or something with that key? And, this is a fairly simple concept in some aspects, like I just put everything into the key and then I get like a very straightforward API where I just have a function called encrypt that takes a plaintext and some associated data and then just encrypts that. Because the key includes everything else that you need to know.
But it also has a lot of security benefits with it. I usually do not want to use the same key material in two independent contexts. And that means I do not want to use the same key material with two different algorithms.
Then, in your application, decode the hex and you now have 32 bytes or 256 bits of entropy to use as a master KDK! Assume that from now on, we will be using the binary form, not the hex form, as an Input Key Material (IKM). By definition, the hex form is not IND. If we use IND key material, our application has less computational overhead for the same level of security.
Hey! Do not copy this key 0de81... and use it in your configuration! Never copy what looks like random material in a blog and use it in your code! Seriously, don't be like Hyundai: Hyundai Uses Example Keys for Encryption System (archived).
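If you want to see the hex round trip without borrowing anyone's key, here is a sketch assuming Node.js and node:crypto. The helper decodeHexKey is hypothetical, and the secret is generated fresh on every run rather than copied from anywhere.

```javascript
const crypto = require('node:crypto');

// Generate 32 fresh random bytes and hex-encode them, e.g. for a config value.
const hexSecret = crypto.randomBytes(32).toString('hex'); // 64 hex characters

// Hypothetical helper: validate the hex form and decode it to raw bytes.
function decodeHexKey(hex) {
  if (!/^[0-9a-f]{64}$/i.test(hex)) {
    throw new Error('expected 64 hex characters (32 bytes)');
  }
  return Buffer.from(hex, 'hex');
}

// The binary form is what goes into HKDF as the IKM.
const ikm = decodeHexKey(hexSecret);
console.log(hexSecret.length, ikm.length);
```

The length check also rejects things like "hunter2" before they ever reach the KDF.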
A short rant on API quality
Experienced developers see "salt" and set it like a password salt thinking it will make the construction more secure.
HKDF is an incredible tool! But, the documentation we have for it (see Web Crypto API - HkdfParams) lacks useful examples on how to use HKDF correctly and for which circumstances.
In JavaScript, salt is a required parameter! It is supposed to be optional! This is not C.
It should not be so hard to do the right thing.
Here is how to correctly use HKDF in JavaScript to derive new keys, such as a signing key, deterministically from an IKM of sufficient entropy.
// We generated this above as an example IND key derivation key
let openSSLSecretKey = '0de81e851cd7995626ad4c3e160ae1c449af4e15c8ceabd44fb75be581adfbaa';
// Parse the hex string into a Uint8Array
let ikm = Uint8Array.from(openSSLSecretKey
  .match(/.{1,2}/g)
  .map((byte) => parseInt(byte, 16)));
// Import the raw key data
let kdk = await crypto.subtle.importKey(
  'raw',
  ikm,
  'HKDF',
  false, // KDF keys cannot be exported
  ['deriveKey', 'deriveBits']);
// We are going to create a signing key from the secret
// If we create other keys too, they should not have the
// same label!
let label = 'signing key';
// This function works with bytes.
// Therefore we must encode our label which is text to bytes.
let encoder = new TextEncoder();
let info = encoder.encode(label);
// A salt is a required property, even though it is empty.
let salt = new Uint8Array(); // Nothing inside!
// Derive a signing key from the key derivation key
let signingKey = await crypto.subtle.deriveKey(
  // Again, the salt is empty
  // The info will uniquely describe this key
  {name: 'HKDF', salt, info, hash: 'SHA-256'},
  // The input key material we decoded from hex above
  // and then wrapped in a CryptoKey
  kdk,
  // We're creating an HMAC-SHA-256 key
  {name: 'HMAC', hash: 'SHA-256'},
  // We do not need to export it,
  // since we can create it deterministically.
  false,
  // it needs to sign and verify
  ['sign', 'verify']);
// Prove that it works
// Let's sign "Hello world"
let message = 'Hello world';
let encodedMessage = encoder.encode(message);
let tag = await crypto.subtle.sign(
  {name: 'HMAC'},
  signingKey,
  encodedMessage);
// The tag is an ArrayBuffer. Convert its bytes to a binary string
// before base64 encoding; btoa(tag) would encode the text
// "[object ArrayBuffer]" instead of the tag itself.
let tagBase64 = btoa(String.fromCharCode(...new Uint8Array(tag)));
console.log(`Message ${message} - tag: ${tagBase64}`);
// And prove that it can match its own mac too.
let verified = await crypto.subtle.verify(
  {name: 'HMAC'},
  signingKey,
  tag,
  encodedMessage);
console.log(`Verify?: ${verified}`);
// Verify?: true
This is a sketch. An example. It is not a complete reference. If you are creating keys on the fly for different purposes you must do more! I share some hints below.
A brief reminder of what AnonCo's source looks like:
async buildKey(encryptionKey: CryptoKey) : Promise<CryptoKey> {
  let data = `${new Date().getFullYear()}`;
  let key = await crypto.subtle.deriveKey(
    {
      name: 'HKDF',
      salt: this.salt,
      info: this.encoder.encode(data),
      hash: 'SHA-256'
    },
    // ...
  );
  return key;
}
Inside HKDF, it is doing something like this:
// inputs
input_key_material = encryption_key
salt = "table_column"
info = "2023"
// Extract a key derivation key
// The goal of extract is to produce an IND KDK
// The input key material has enough hidden knowledge
// to be an effective input key to the extract process
key_derivation_key = HMAC(salt, input_key_material)
// Expand the KDK as needed with info
output_key = HMAC(key_derivation_key, info + "\x01")
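The pseudocode above can be made runnable as a sketch, assuming Node.js and node:crypto. The functions extract and expand are my own hand-rolled illustration of the two phases, checked against the built-in hkdfSync. Use a vetted library in real code, not this.

```javascript
const crypto = require('node:crypto');

const HASH_LEN = 32; // SHA-256 output size in bytes

function hmacSha256(key, data) {
  return crypto.createHmac('sha256', key).update(data).digest();
}

// Extract: the salt is the HMAC key and the IKM is the message.
// An absent salt is replaced by HASH_LEN zero bytes.
function extract(salt, ikm) {
  const key = salt.length > 0 ? salt : Buffer.alloc(HASH_LEN);
  return hmacSha256(key, ikm);
}

// Expand: chain blocks T(i) = HMAC(kdk, T(i-1) || info || i),
// with a one-byte counter starting at 1, then truncate to length.
function expand(kdk, info, length) {
  const blocks = Math.ceil(length / HASH_LEN);
  if (blocks > 255) throw new RangeError('requested length is too large');
  let previous = Buffer.alloc(0);
  const output = [];
  for (let i = 1; i <= blocks; i++) {
    previous = hmacSha256(kdk, Buffer.concat([previous, info, Buffer.from([i])]));
    output.push(previous);
  }
  return Buffer.concat(output).subarray(0, length);
}

// The same inputs as the pseudocode, with a random stand-in for the IKM.
const ikm = crypto.randomBytes(32);
const salt = Buffer.from('table_column');
const info = Buffer.from('2023');

const manual = expand(extract(salt, ikm), info, 32);
const builtin = Buffer.from(crypto.hkdfSync('sha256', ikm, salt, info, 32));
console.log(manual.equals(builtin));
```

Notice that the salt only ever touches the extract phase, while info is what distinguishes every block of the expand phase.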
AnonCo should have an IND KDK coming in.
If a hex string like 0de81e851cd799... happens to be the input key material, then HKDF will force it into an IND KDK with the extract phase.
Logically, transforming an IND IKM to an IND KDK of the same security level provides no benefit and only a minor performance penalty.
The extract phase is inappropriately being used for domain separation, when the security goal is only to create an IND KDK.
Then the expand phase creates a new unique key using... the year.
Uhm, a year is not unique!!! Literally, for this use case, the info parameter must be unique.
It would be far better to have the info set to `${table}_${column}_${year}`!
This is one of those sneaky ways where you can use cryptography and it looks and behaves the way you want on the outside, but fundamentally misses the security guarantees intended.
This mistake reduces the security guarantees to PRF security.
Again, Pseudo-Random Function (PRF) security is that the output of a PRF is indistinguishable from random (IND).
This is still good and meets the needs for an encryption operation, but it is not what HKDF is supposed to provide and that indicates HKDF is misused in this code.
Why does it not give KDF security?
The salt is only used in the extract phase in HKDF and if present should only have a single IND value.
The extract phase is meant to provide a cryptographically-suitable IND key for the expand phase – which is what actually satisfies the security property when info is unique.
Instead the info parameter is given "2023" every time this year. I expect that multiple keys will be created in the year 2023. Inherently, this implementation is not aligned with the design of HKDF.
A modified version of the code that uses HKDF correctly is:
export class ColumnEncrypt {
  private label: Uint8Array;
  private salt: Uint8Array;
  private encoder: TextEncoder;
  private constructor(encoder: TextEncoder, table: string, column: string) {
    this.encoder = encoder;
    this.salt = new Uint8Array();
    this.label = encoder.encode(`${table}_${column}`);
  }
  async buildKey(keyDerivationKey: CryptoKey) : Promise<CryptoKey> {
    // Logically the label above will be prefixed to the data below
    let data = `_${new Date().getFullYear()}`;
    let info = new Uint8Array([
      ...this.label,
      ...this.encoder.encode(data)
    ]);
    let key = await crypto.subtle.deriveKey(
      {
        name: 'HKDF',
        salt: this.salt,
        info,
        hash: 'SHA-256'
      },
      keyDerivationKey,
      {name: 'AES-GCM', length: 256},
      false,
      ['encrypt', 'decrypt']);
    return key;
  }
  static async newInstance(table: string, column: string) : Promise<ColumnEncrypt> {
    return new ColumnEncrypt(new TextEncoder(), table, column);
  }
}
What happened to encryptionKey?
I renamed the parameter called encryptionKey to keyDerivationKey. We are not encrypting with this key, instead we are deriving keys.
Names keep us more honest when there isn't a type system to resist misuse. As I said earlier, never use the same key in different cryptographic operations!
Last tweaks
Hold on to your tail because there is more to fix!
There is another problem here, and it is called canonicalization!
What if the table were named "customers" with a column "last_order_id", while another table is called "customers_last_order" with a column named "id", and both were written to in the year 2023? Then both will have the info set to "customers_last_order_id_2023"!
This problem also applies to the prior version with the salt receiving the label.
A more robust way is to ensure that the info is a constant length no matter what. An easy solution is to use a hash! However, this will have a performance penalty.
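Here is a sketch of that idea, assuming Node.js and node:crypto. Hashing the table and column names separately with plain SHA-256 and appending a four-digit year is my assumption about a reasonable layout; the function names naiveInfo and hashedInfo are hypothetical.

```javascript
const crypto = require('node:crypto');

function sha256(text) {
  return crypto.createHash('sha256').update(text, 'utf8').digest();
}

// Ambiguous: joining with "_" lets different (table, column) pairs collide.
function naiveInfo(table, column, year) {
  return Buffer.from(`${table}_${column}_${year}`, 'utf8');
}

// Constant length: 32 bytes + 32 bytes + 4 ASCII digits = 68 bytes.
function hashedInfo(table, column, year) {
  if (!Number.isInteger(year) || year < 0 || year > 9999) {
    throw new RangeError('year must be an integer from 0 to 9999');
  }
  const yearText = String(year).padStart(4, '0');
  return Buffer.concat([sha256(table), sha256(column), Buffer.from(yearText, 'ascii')]);
}

const naiveA = naiveInfo('customers', 'last_order_id', 2023);
const naiveB = naiveInfo('customers_last_order', 'id', 2023);
console.log(naiveA.equals(naiveB)); // true: the collision described above

const hashedA = hashedInfo('customers', 'last_order_id', 2023);
const hashedB = hashedInfo('customers_last_order', 'id', 2023);
console.log(hashedA.equals(hashedB), hashedA.length); // false 68
```

Because every piece has a fixed width, there is no way to shift characters from one field into its neighbor.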
The code above will produce constant length info values until the year 10000.
What if I steal the key? You can't change the year!
You got me! Key management is a hard problem and it is out of scope for this article.
Now, there is no way that info will look the same for different purposes, as long as the key is not compromised. Therefore, the keys will always be unique between different purposes!
We expect that hash functions never have collisions. Mathematically, this is impossible. For practical security, it is the best we can manage.
Once a collision has been found by accident, it is considered weakened. Once a collision can be intentionally found, it is broken. See shattered (archived).
One more thing
The year is a big red flag to me for ignoring the key period. I'd be more comfortable with extended-nonce encryption, but I suspect that my preferences are incompatible with the API.
Here's a sketch of how to do this with HKDF:
In the box that gets encrypted, I'd tuck another nonce inside. This nonce would be added to the label in the info value when HKDF is called. And this nonce is not overlapping, shared with, or derived from the nonce used for encryption.
Technically it changes it from every column having a set of encryption keys to every single field having its own encryption key, insofar as the encrypting primitive is concerned.
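A rough sketch of that construction, assuming Node.js and node:crypto with AES-256-GCM as the encrypting primitive. The label, the 24-byte nonce size, and the function names are my own illustration. It leaves out key IDs, associated data, and the threats listed in the addendum.

```javascript
const crypto = require('node:crypto');

const kdk = crypto.randomBytes(32);              // stand-in key derivation key
const label = Buffer.from('customers.email v1'); // hypothetical label

// Derive a per-field key: info is the label followed by a fixed-length nonce.
function fieldKey(kdfNonce) {
  const info = Buffer.concat([label, kdfNonce]);
  return Buffer.from(crypto.hkdfSync('sha256', kdk, Buffer.alloc(0), info, 32));
}

function encryptField(plaintext) {
  const kdfNonce = crypto.randomBytes(24); // used only for key derivation
  const iv = crypto.randomBytes(12);       // separate nonce for AES-GCM
  const cipher = crypto.createCipheriv('aes-256-gcm', fieldKey(kdfNonce), iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { kdfNonce, iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptField({ kdfNonce, iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', fieldKey(kdfNonce), iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const box = encryptField('alice@example.com');
console.log(decryptField(box));
```

The two nonces are generated independently, so the key derivation nonce never overlaps with the one used for encryption.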
I digress. If they want to automate away key periods, they could have a feedback system that collects metrics for each encrypted field being written over time, create a prediction horizon for its period, and issue a new primary key at the period with a similar construction above without saving the nonce into the fields. A key ID should already be present on the record and is sufficient to identify the key once it is generated.
Honestly this sounds like something that would get patented, but should not.
Conclusion
Cryptography is hard. There is a lot to consider when you use an existing well-studied construction and not all the information is clear. Much of it requires literally days or weeks of effort to understand the academic writings. It is okay to review what NIST wrote about a cryptographic tool. In fact, their documents are far more accessible than the academic publications out there. Unfortunately, NIST does not incorporate much of the newly contributed cryptography out there, so if you want to learn about the cool things you can do with Blake3, NIST will not satisfy your curiosity. There will be considerations and consequences that are not obvious, even to educated & experienced security engineers.
Making security easy is hard. AnonCo believes that making the tools developers already use secure by default will result in better security for their users and customers. And I agree! I have seen product developers avoid the unfamiliar path to deliver changes, improvements, and bugs to the product. Security needs to be built into their tools to enable them to deliver quickly, not to hold them back.
I do not have the full context of AnonCo's source code, but this misuse suggests that AnonCo needs to hire a cryptographer. There is likely more to find inside AnonCo.
Small Addendum
Database cryptography is hard. The above sketch is not complete and does not address several threats! This article is quite long, so I will not be sharing the fixes.
Be aware of the following:
Invisible Salamanders: ciphertexts exist that can be decrypted successfully with authenticated encryption schemes with distinctly different keys. This demonstrates lack of key commitment.
Confused Deputy: an attacker swaps data around or presents a ciphertext intended for another party to an authorized decryptor called the "deputy." The deputy is confused and reveals the plaintext to the attacker. This demonstrates insufficient authentication.
And certainly more... Again, hire a cryptographer.