How to use HKDF to derive new keys
We rely on several cryptographic tools constructed together to secure our lives. Many depend on hidden information with specific properties to provide the security benefits they claim. It is not always convenient to agree upon or distribute a large amount of hidden information like a One-time pad. Key Derivation Functions (KDFs) reliably create unrelated keys for different cryptographic tools from a single Input Key Material (IKM).
In fact, a KDF helped you view this article, specifically the HMAC-based Key Derivation Function (HKDF). While correctly used in your browser, it is often misused as I will show in critiquing an anonymized example I found online in a publication by AnonCo.
Just about every cryptographic tool out there can be used wrong. Deploying correct cryptography is hard, so hard that you should never do it alone.
"Cryptography is nightmare magic math that cares what kind of pen you use."
Unlike the watermelons above, AnonCo's misuse does not open them up to any new threats. It happens to meet AnonCo's functional goals more by accident than on purpose. In cryptography, accidents are dangerous and not something to joke about.
HKDF misuse
In this article, "misuse" has a specific meaning: a cryptographic tool is not delivering all intended security properties because it is not used correctly.
Here are a few ways HKDF can be misused:
- In a public setting, no salt is given.
- The salt input is not indistinguishable from random (IND).
- The salt input is used for domain separation.
- The same salt is used across multiple transactions in a public setting.
- Different salts are used in a private setting.
- The inputs to HKDFs are the same for different contexts, resulting in the same keys for different purposes.
Indistinguishable from random
Key Expansion
Planning for the future
HKDF Sub-key derivation on the fly
As a recap, here's where we are in this story:
- we have Input Key Material (IKM) with enough entropy
- we know what cryptographic operation will use this key, and its requirements (e.g. the length of the key)
- we have a unique label for what it will be used for (e.g. encrypting a certain database table and column)
- we desire keys on the fly for the operation we are about to do
- we must be able to create the same keys on the fly again for another operation at a later time
And the interface to HKDF looks something like this:
interface HKDF {
  extractAndExpand(
    key: Uint8Array,
    salt: Uint8Array | null,
    info: Uint8Array | null,
    length: number
  ) : Uint8Array
}
First, we put in our input key material (IKM).
Second, our problem does not involve another party executing cryptography. Therefore, we do not put anything into the salt.
Third, for info, we put in a unique label. HKDF could be called twice with the same key, salt, and length, but have different labels like "encryption key" and "authentication key".
And fourth, for length, we provide the key size we want. In this case it is likely bytes, so for 256 bits: length = 32.
Finally, HKDF can be called and the IND output can be used for encrypting, authenticating, or some other neat thing!
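The four steps above can be sketched against WebCrypto. This is a minimal illustration, not a definitive implementation: it assumes a runtime with a global crypto.subtle (a modern browser or a recent Node.js), and the async return type, the demo function, and the random IKM are illustrative assumptions. Note that deriveBits takes its length in bits, so the byte length is multiplied by 8.

```typescript
// Sketch: HKDF-SHA-256 via WebCrypto, shaped like the interface above.
// Assumption: a global `crypto.subtle` is available.
async function extractAndExpand(
  key: Uint8Array,
  salt: Uint8Array | null,
  info: Uint8Array | null,
  length: number // output length in bytes
): Promise<Uint8Array> {
  // Wrap the raw IKM so WebCrypto can use it as an HKDF input
  const ikm = await crypto.subtle.importKey('raw', key, 'HKDF', false, ['deriveBits']);
  const bits = await crypto.subtle.deriveBits(
    {
      name: 'HKDF',
      hash: 'SHA-256',
      salt: salt ?? new Uint8Array(), // empty salt in the private setting
      info: info ?? new Uint8Array(),
    },
    ikm,
    length * 8 // deriveBits expects bits, not bytes
  );
  return new Uint8Array(bits);
}

// Hypothetical demo: one IKM, two labels, and a repeated derivation
async function demo(): Promise<{ enc: Uint8Array; auth: Uint8Array; encAgain: Uint8Array }> {
  const encoder = new TextEncoder();
  const ikm = crypto.getRandomValues(new Uint8Array(32));
  const enc = await extractAndExpand(ikm, null, encoder.encode('encryption key'), 32);
  const auth = await extractAndExpand(ikm, null, encoder.encode('authentication key'), 32);
  const encAgain = await extractAndExpand(ikm, null, encoder.encode('encryption key'), 32);
  return { enc, auth, encAgain };
}
```

Calling demo() should yield two different 32-byte outputs for the two labels, and the same bytes again when the "encryption key" label is reused.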
Case Study: AnonCo
AnonCo's product relies on a technology that abstracts database storage and database operations. Additionally, they use a compatible security dependency which facilitates seamless encryption and decryption when it goes in and out of the database to the application.
However, that security library does not provide per-column encryption keys, which is a feature that AnonCo wants. AnonCo has a lot of customers they need to protect across many integrated products. At their scale, it is a good idea to encrypt each sensitive database field with a different key. Unfortunately, the plumbing to do this requires a product developer to add a new key correctly each time they need to add or migrate an encrypted database field!
This not only disincentivizes secure development, it also introduces the chance of accidentally reusing keys or introducing weak (not IND) keys through a manual process!
AnonCo tried to automate key provisioning with HKDF to eliminate the manual process.
There are multiple problems with the approach AnonCo used, which I will cover!
Here's the important code, which is translated for anonymity.
export class ColumnEncrypt {
  private salt: Uint8Array;
  private encoder: TextEncoder;
  private constructor(encoder: TextEncoder, table: string, column: string) {
    this.encoder = encoder;
    this.salt = encoder.encode(`${table}_${column}`);
  }
  async buildKey(encryptionKey: CryptoKey) : Promise<CryptoKey> {
    let data = `${new Date().getFullYear()}`;
    let key = await crypto.subtle.deriveKey(
      {
        name: 'HKDF',
        salt: this.salt,
        info: this.encoder.encode(data),
        hash: 'SHA-256'
      },
      encryptionKey,
      {name: 'AES-GCM', length: 256},
      false,
      ['encrypt', 'decrypt']);
    return key;
  }
  static async newInstance(table: string, column: string) : Promise<ColumnEncrypt> {
    return new ColumnEncrypt(new TextEncoder(), table, column);
  }
}
The key they're getting out will look functional, but it will not have the security properties one expects from a KDF.
First: salt is being given the label!
Again, the salt is meant to resist analysis of a shared secret in a public transaction.
Private key derivation
Most security engineers do not write protocols between peers, servers, or clients. They write solutions to problems within their organization. Distributing secrets is a solved problem.
Let's assume that AnonCo can distribute an IND secret IKM to their servers and that the key was correctly created.
The above transcript has minor edits for clarity.
You can generate an IND secret key easily with a command like this:
$ openssl rand -hex 32
0de81e851cd7995626ad4c3e160ae1c449af4e15c8ceabd44fb75be581adfbaa
Then, in your application, decode the hex and you have 32 bytes, or 256 bits, of entropy to use as a master key derivation key (KDK)! Assume that from now on, we will be using the binary form, not the hex form, as the Input Key Material (IKM). By definition, the hex form is not IND. If we use IND key material, our application has less computational overhead for the same level of security.
// We generated this above as an example IND key derivation key
let openSSLSecretKey = '0de81e851cd7995626ad4c3e160ae1c449af4e15c8ceabd44fb75be581adfbaa';
// Parse the hex string into a Uint8Array
let ikm = Uint8Array.from((openSSLSecretKey
  .match(/.{1,2}/g) ?? [])
  .map((byte) => parseInt(byte, 16)));
// Import the raw key data
let kdk = await crypto.subtle.importKey(
  'raw',
  ikm,
  'HKDF',
  false, // KDF keys cannot be exported
  ['deriveKey', 'deriveBits']);
// We are going to create a signing key from the secret
// If we create other keys too, they should not have the
// same label!
let label = 'signing key';
// This function works with bytes.
// Therefore we must encode our label which is text to bytes.
let encoder = new TextEncoder();
let info = encoder.encode(label);
// A salt is a required property, even though it is empty.
let salt = new Uint8Array(); // Nothing inside!
// Derive a signing key from the key derivation key
let signingKey = await crypto.subtle.deriveKey(
  // Again, the salt is empty
  // The info will uniquely describe this key
  {name: 'HKDF', salt, info, hash: 'SHA-256'},
  // The input key material we decoded from hex above
  // and then wrapped in a CryptoKey
  kdk,
  // We're creating an HMAC-SHA-256 key
  {name: 'HMAC', hash: 'SHA-256'},
  // We do not need to export it,
  // since we can create it deterministically.
  false,
  // it needs to sign and verify
  ['sign', 'verify']);
// Prove that it works
// Let's sign "Hello world"
let message = 'Hello world';
let encodedMessage = encoder.encode(message);
let tag = await crypto.subtle.sign(
  {name: 'HMAC'},
  signingKey,
  encodedMessage);
// The tag is an ArrayBuffer; passing it straight to btoa would only
// encode the text "[object ArrayBuffer]", so convert the bytes first.
let tagBase64 = btoa(String.fromCharCode(...new Uint8Array(tag)));
console.log(`Message ${message} - tag: ${tagBase64}`);
// Prints the message and the base64 of the 32-byte HMAC-SHA-256 tag
// And prove that it can match its own mac too.
let verified = await crypto.subtle.verify(
  {name: 'HMAC'},
  signingKey,
  tag,
  encodedMessage);
console.log(`Verify?: ${verified}`);
// Verify?: true
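Because the derivation is deterministic, the signing key never needs to be stored: it can be rebuilt from the same IKM and label whenever it is needed. The sketch below illustrates that under a few assumptions: a global crypto.subtle is available, and the helper names deriveSigningKey and rederiveDemo, the random IKM, and the second label are all hypothetical.

```typescript
// Sketch: the same IKM and label always yield the same HMAC key.
// Assumption: a global `crypto.subtle` is available.
async function deriveSigningKey(ikm: Uint8Array, label: string): Promise<CryptoKey> {
  const kdk = await crypto.subtle.importKey('raw', ikm, 'HKDF', false, ['deriveKey']);
  return crypto.subtle.deriveKey(
    { name: 'HKDF', hash: 'SHA-256', salt: new Uint8Array(), info: new TextEncoder().encode(label) },
    kdk,
    { name: 'HMAC', hash: 'SHA-256' },
    false,
    ['sign', 'verify']
  );
}

async function rederiveDemo(): Promise<{ sameLabel: boolean; otherLabel: boolean }> {
  const ikm = crypto.getRandomValues(new Uint8Array(32));
  const message = new TextEncoder().encode('Hello world');

  // "Earlier": derive the key and sign.
  const first = await deriveSigningKey(ikm, 'signing key');
  const tag = await crypto.subtle.sign('HMAC', first, message);

  // "Later": derive again from scratch and verify the old tag.
  const second = await deriveSigningKey(ikm, 'signing key');
  const sameLabel = await crypto.subtle.verify('HMAC', second, tag, message);

  // A key derived under a different label should not verify the tag.
  const other = await deriveSigningKey(ikm, 'another signing key');
  const otherLabel = await crypto.subtle.verify('HMAC', other, tag, message);

  return { sameLabel, otherLabel };
}
```

The re-derived key verifies the earlier tag, while the key derived under a different label rejects it.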
A brief reminder of what AnonCo's source looks like:
async buildKey(encryptionKey: CryptoKey) : Promise<CryptoKey> {
  let data = `${new Date().getFullYear()}`;
  let key = await crypto.subtle.deriveKey(
    {
      name: 'HKDF',
      salt: this.salt,
      info: this.encoder.encode(data),
      hash: 'SHA-256'
    },
    // ...
  );
  return key;
}
Inside HKDF, it is doing something like this:
// inputs
input_key_material = encryption_key
salt = "table_column"
info = "2023"
// Extract a key derivation key
// The goal of extract is to produce an IND KDK
// The input key material has enough hidden knowledge
// to be an effective input key to the extract process
key_derivation_key = HMAC(salt, input_key_material)
// Expand the KDK as needed with info
output_key = HMAC(key_derivation_key, info + "\x01")
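The pseudocode above can be checked against WebCrypto's own HKDF. The sketch below spells out extract and expand as two HMAC-SHA-256 calls and compares the result with deriveBits for a 32-byte output, where a single expand block is enough. It assumes a global crypto.subtle; the helper names hmacSha256, manualHkdf, builtinHkdf, and compareHkdf are hypothetical, and a random IKM stands in for the real key. A non-empty salt is used, mirroring the pseudocode, because the manual version imports the salt as an HMAC key.

```typescript
// Sketch: HKDF-SHA-256 as two HMAC calls, compared with WebCrypto's HKDF.
// Assumption: a global `crypto.subtle` is available.
async function hmacSha256(key: Uint8Array, data: Uint8Array): Promise<Uint8Array> {
  const k = await crypto.subtle.importKey(
    'raw', key, { name: 'HMAC', hash: 'SHA-256' }, false, ['sign']);
  return new Uint8Array(await crypto.subtle.sign('HMAC', k, data));
}

async function manualHkdf(ikm: Uint8Array, salt: Uint8Array, info: Uint8Array): Promise<Uint8Array> {
  // Extract: key_derivation_key = HMAC(salt, input_key_material)
  const kdk = await hmacSha256(salt, ikm);
  // Expand, first block only (32 bytes): output_key = HMAC(kdk, info + "\x01")
  const block = new Uint8Array(info.length + 1);
  block.set(info);
  block[info.length] = 0x01;
  return hmacSha256(kdk, block);
}

async function builtinHkdf(ikm: Uint8Array, salt: Uint8Array, info: Uint8Array): Promise<Uint8Array> {
  const key = await crypto.subtle.importKey('raw', ikm, 'HKDF', false, ['deriveBits']);
  const bits = await crypto.subtle.deriveBits(
    { name: 'HKDF', hash: 'SHA-256', salt, info }, key, 256);
  return new Uint8Array(bits);
}

async function compareHkdf(): Promise<boolean> {
  const encoder = new TextEncoder();
  const ikm = crypto.getRandomValues(new Uint8Array(32));
  const salt = encoder.encode('table_column');
  const info = encoder.encode('2023');
  const manual = await manualHkdf(ikm, salt, info);
  const builtin = await builtinHkdf(ikm, salt, info);
  // Byte-for-byte comparison of the two derivations
  return manual.length === builtin.length && manual.every((byte, i) => byte === builtin[i]);
}
```

compareHkdf() resolves to true when the two-step HMAC construction and the built-in HKDF agree.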
AnonCo should have an IND KDK coming in.
Logically, transforming an IND IKM to an IND KDK of the same security level provides no benefit and incurs a minor performance penalty.
The extract phase is inappropriately being used for domain separation, when the security goal is only to create an IND KDK.
Then the expand phase creates a new unique key using... the year.
Uhm, a year is not unique!!! Literally, for this use case, the info parameter must be unique.
It would be far better to have the info set to `${table}_${column}_${year}`!
This mistake reduces the security guarantees to PRF security.
A modified version of the code that uses HKDF correctly is:
export class ColumnEncrypt {
  private label: Uint8Array;
  private salt: Uint8Array;
  private encoder: TextEncoder;
  private constructor(encoder: TextEncoder, table: string, column: string) {
    this.encoder = encoder;
    this.salt = new Uint8Array();
    this.label = encoder.encode(`${table}_${column}`);
  }
  async buildKey(keyDerivationKey: CryptoKey) : Promise<CryptoKey> {
    // Logically the label above will be prefixed to the data below
    let data = `_${new Date().getFullYear()}`;
    let info = new Uint8Array([
      ...this.label,
      ...this.encoder.encode(data)
    ]);
    let key = await crypto.subtle.deriveKey(
      {
        name: 'HKDF',
        salt: this.salt,
        info,
        hash: 'SHA-256'
      },
      keyDerivationKey,
      {name: 'AES-GCM', length: 256},
      false,
      ['encrypt', 'decrypt']);
    return key;
  }
  static async newInstance(table: string, column: string) : Promise<ColumnEncrypt> {
    return new ColumnEncrypt(new TextEncoder(), table, column);
  }
}
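To see what a derived column key buys in practice, here is a small round-trip sketch using the same `${table}_${column}_${year}` info layout as the class above. It assumes a global crypto.subtle. The helper names deriveColumnKey and roundTrip, the sample table and column names, and the explicit year parameter are hypothetical and not part of AnonCo's code.

```typescript
// Sketch: derive a per-column AES-GCM key and round-trip a value.
// Assumption: a global `crypto.subtle` is available.
async function deriveColumnKey(
  kdk: CryptoKey, table: string, column: string, year: number
): Promise<CryptoKey> {
  const info = new TextEncoder().encode(`${table}_${column}_${year}`);
  return crypto.subtle.deriveKey(
    { name: 'HKDF', hash: 'SHA-256', salt: new Uint8Array(), info },
    kdk,
    { name: 'AES-GCM', length: 256 },
    false,
    ['encrypt', 'decrypt']
  );
}

async function roundTrip(): Promise<{ plaintext: string; wrongKeyFailed: boolean }> {
  const ikm = crypto.getRandomValues(new Uint8Array(32));
  const kdk = await crypto.subtle.importKey('raw', ikm, 'HKDF', false, ['deriveKey']);

  const emailKey = await deriveColumnKey(kdk, 'customers', 'email', 2023);
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh 96-bit nonce per encryption
  const ciphertext = await crypto.subtle.encrypt(
    { name: 'AES-GCM', iv }, emailKey, new TextEncoder().encode('alice@example.com'));

  // Re-derive the key later from the same table, column, and year.
  const again = await deriveColumnKey(kdk, 'customers', 'email', 2023);
  const plaintext = new TextDecoder().decode(
    await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, again, ciphertext));

  // A key derived for a different column should fail AES-GCM authentication.
  const phoneKey = await deriveColumnKey(kdk, 'customers', 'phone', 2023);
  let wrongKeyFailed = false;
  try {
    await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, phoneKey, ciphertext);
  } catch {
    wrongKeyFailed = true;
  }
  return { plaintext, wrongKeyFailed };
}
```

Taking the year as a parameter, rather than reading the clock inside the function, is a design choice made for this sketch: it lets a caller re-derive the key for a value written in an earlier year, assuming the year is recorded alongside the ciphertext.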
Last tweaks
There is another problem here, and it is called canonicalization!
What if one table were named "customers" with a column "last_order_id", while another table were called "customers_last_order" with a column named "id", and both were written to in the year 2023? Then both will have the info set to "customers_last_order_id_2023"!
This problem also applies to the prior version with the salt receiving the label.
Oops! We just got a collision!
How can we fix that?
An easy and fast way is to prefix the length.
private constructor(encoder: TextEncoder, table: string, column: string) {
  this.encoder = encoder;
  this.salt = new Uint8Array();
  this.label = encoder.encode(`${table.length}:${table}_${column.length}:${column}`);
}
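The collision and the length-prefix fix can be seen with plain strings, no cryptography required. In this sketch, naiveLabel and prefixedLabel are hypothetical helper names; the table and column names are the ones from the example above.

```typescript
// Sketch: why a plain underscore join is ambiguous and a length prefix is not.
function naiveLabel(table: string, column: string): string {
  return `${table}_${column}`;
}

function prefixedLabel(table: string, column: string): string {
  // Each part is preceded by its own length, so the boundary is unambiguous
  return `${table.length}:${table}_${column.length}:${column}`;
}

const naiveA = naiveLabel('customers', 'last_order_id');
const naiveB = naiveLabel('customers_last_order', 'id');
const prefixedA = prefixedLabel('customers', 'last_order_id');
const prefixedB = prefixedLabel('customers_last_order', 'id');

console.log(naiveA === naiveB);
// true: both are "customers_last_order_id"
console.log(prefixedA);
// 9:customers_13:last_order_id
console.log(prefixedB);
// 20:customers_last_order_2:id
```

With the prefix, the two table and column pairs no longer map to the same label, so they no longer map to the same info.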
A more robust way is to ensure that the info is a constant length no matter what. An easy solution is to use a hash! However, this will have a performance penalty.
export class ColumnEncrypt {
  private tableHash: Uint8Array;
  private columnHash: Uint8Array;
  private salt: Uint8Array;
  private encoder: TextEncoder;
  private constructor(encoder: TextEncoder, tableHash: Uint8Array, columnHash: Uint8Array) {
    this.encoder = encoder;
    this.salt = new Uint8Array();
    this.tableHash = tableHash;
    this.columnHash = columnHash;
  }
  static async newInstance(table: string, column: string) : Promise<ColumnEncrypt> {
    let encoder = new TextEncoder();
    let tableHash = new Uint8Array(await crypto.subtle.digest(
      {name: 'SHA-256'},
      encoder.encode(table)
    ));
    let columnHash = new Uint8Array(await crypto.subtle.digest(
      {name: 'SHA-256'},
      encoder.encode(column)
    ));
    return new ColumnEncrypt(encoder, tableHash, columnHash);
  }
  async buildKey(keyDerivationKey: CryptoKey) : Promise<CryptoKey> {
    let data = `${new Date().getFullYear()}`;
    let info = new Uint8Array([
      ...this.tableHash,
      ...this.columnHash,
      ...this.encoder.encode(data)
    ]);
    let key = await crypto.subtle.deriveKey(
      {
        name: 'HKDF',
        salt: this.salt,
        info,
        hash: 'SHA-256'
      },
      keyDerivationKey,
      {name: 'AES-GCM', length: 256},
      false,
      ['encrypt', 'decrypt']);
    return key;
  }
}
Now, short of a SHA-256 collision, there is no way that info will look the same for different purposes. Therefore, as long as the key is not compromised, the keys will always be unique between different purposes!
One more thing
Conclusion
Cryptography is hard. There is a lot to consider when you use an existing well-studied construction and not all the information is clear. Much of it requires literally days or weeks of effort to understand the academic writings. It is okay to review what NIST wrote about a cryptographic tool. In fact, their documents are far more accessible than the academic publications out there. Unfortunately, NIST does not incorporate much of the newly contributed cryptography out there, so if you want to learn about the cool things you can do with Blake3, NIST will not satisfy your curiosity. There will be considerations and consequences that are not obvious, even to educated & experienced security engineers.
Making security easy is hard. AnonCo believes that making the tools developers already use secure by default will result in better security for their users and customers. And I agree! I have seen product developers avoid the unfamiliar path to deliver changes, improvements, and bugs to the product. Security needs to be built into their tools to enable them to deliver quickly, not to hold them back.
I do not have the full context of AnonCo's source code, but this misuse suggests that AnonCo needs to hire a cryptographer. There is likely more to find inside AnonCo.
Small Addendum
Database cryptography is hard. The above sketch is not complete and does not address several threats! This article is quite long, so I will not be sharing the fixes.
Be aware of the following:
- Invisible Salamanders: ciphertexts exist that can be decrypted successfully with authenticated encryption schemes with distinctly different keys. This demonstrates lack of key commitment.
- Confused Deputy: an attacker swaps data around or presents a ciphertext intended for another party to an authorized decryptor called the "deputy." The deputy is confused and reveals the plaintext to the attacker. This demonstrates insufficient authentication.
- And certainly more... Again, hire a cryptographer.