Exploring the impact of PQC on Cryptography Key Management

- 33 min read - Text Only

Sofía shares a short presentation on Post-Quantum Cryptography's (PQC) development. PQC is distinct in that it uses computationally hard problems with no known efficient quantum or classical solutions to satisfy security goals. The panel then takes up several topics, along with a few prompts from the audience. The competition should provide multiple solutions for exchanging keys and digital signatures so that when one solution is no longer secure, applications can change to another. The largest concern is how the performance characteristics will affect applications that need key exchange and digital signatures. Google will be testing key exchange at scale, but there is a gap for digital signatures. Cryptographic agility gets redefined with an emphasis on updating applications and hard-to-reach hardware like TPMs and satellites.

This talk summary is part of my DEF CON 31 series. The talks this year have sufficient depth to be shared independently and are separated for easier consumption.

This presentation was different from other talks. It was more like a panel, except the first speaker shared a presentation remotely.

Title slide for The state of PQC

Sofía Celi began the panel with a short presentation where she shared the ongoing progress of Post-Quantum Cryptography (PQC) research. We expect that quantum computers will have the capability to break algorithms like RSA and elliptic curves in the future. Researchers are proposing cryptographic algorithms that are resistant to the capabilities of theoretical quantum computers.

The NIST competition

The NIST PQC competition exists to procure and standardize algorithms and parameters that can be trusted to secure the U.S. government after quantum capabilities break the algorithms in use today. The NIST competition started nearly seven years ago. Independent researchers contributed candidates to the competition, and other researchers found weaknesses and vulnerabilities that removed candidates from the competition.

Another goal of the competition is to find suitable algorithms for applications that currently depend on algorithms weak to quantum computers. Applications need to establish secret symmetric key material between two parties over an unsecured channel, and sign and verify signatures on arbitrary data, such as operating system updates or TLS certificates.

When NIST concludes the competition, they will standardize primary candidates for these application requirements. The candidates will be evaluated on several qualities, and the least resource-intensive solution will be selected as the primary candidate. They will also prepare alternate candidates, which are founded upon different computationally hard problems. Should a weakness be discovered in the primary candidate, an alternate candidate remains strong and a migration can begin immediately.

Recent news update

Since attending this presentation, NIST has published draft specifications for several candidates:

ML-KEM parameter sets as specified by NIST. There are parameters with strengths listed for 128, 192, and 256 bits.

The NIST competition isn't everything. New ideas and approaches have developed outside the competition too. If this subject interests you, check out BAT: Small and Fast KEM over NTRU Lattices and Improved Key Pair Generation for Falcon, BAT and Hawk.

In addition to standardizing specific algorithms, NIST will also specify standard parameters for each algorithm to achieve certain security strengths.

PQC process / competition. NIST called for submissions of quantum resistant algorithms for public key cryptography. Level 1: exhaustive key search. Level 2: collision search. Alternates between AES and SHA in the list as a hardness reference.

None of these words are in the bible!
Bear with me.

TLS 1.3 uses elliptic curves to establish keys between a server and client. In a quantum-capable future, elliptic curve algorithms will be replaced to maintain the security strength that today's technology affords.

Unfortunately, the post quantum solutions to establish keys have different design requirements and constraints. For example, CRYSTALS-Kyber-512 has a public key made of 800 bytes. Applications on UDP will need to redesign how they establish secure sessions.
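To make the constraint concrete, here is a rough back-of-the-envelope sketch in Python. The Kyber-512 public key size comes from the talk; the X25519 share size, the Kyber-512 ciphertext size, and the 1200-byte datagram budget (QUIC's conservative minimum UDP payload) are my own illustrative assumptions, not figures from the panel.

```python
# Rough size check: does a hybrid X25519 + Kyber-512 key share fit in one
# conservatively sized UDP datagram?
X25519_PUBLIC_KEY = 32      # bytes
KYBER512_PUBLIC_KEY = 800   # bytes, as quoted in the talk
KYBER512_CIPHERTEXT = 768   # bytes (illustrative figure)

SAFE_UDP_PAYLOAD = 1200     # bytes; mirrors QUIC's minimum datagram size

client_hello_keyshare = X25519_PUBLIC_KEY + KYBER512_PUBLIC_KEY
print(client_hello_keyshare)                      # 832
print(client_hello_keyshare <= SAFE_UDP_PAYLOAD)  # True, but little room
# remains for the rest of the handshake message, which is why UDP-based
# protocols need to rethink how they establish secure sessions.
```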

A crude analogy

Bolts and screws can bind two things together. Screws thread through one thing into the second thing. Bolts go through two things and are tightened with a nut on the other side.

Cryptographic primitives can also be like that. Similar goals, like establishing a mutual symmetric key, can be achieved with different techniques. Each technique has its own requirements and applicable guarantees.

For screws, while the threads hold and there is little shear between materials, the screw will keep the materials together.

For bolts, the bolt must pass through both materials and have a nut tightened onto it to bear tensile strain; while there is little shear between the materials it passes through, the bolt will keep them together at the point of contact.

Bolts require tensile stress to be effective, while screws and the material they thread into can only handle so much tensile stress. Both screws and bolts fail under shear stress. If there is too much stress, screw threads will strip and the screw or the hole is no longer usable. When used correctly, both can hold materials together.

While these bolts and screws will last for years — possibly decades — in facilities and infrastructure, we are aware that some time in the future — we do not know when — it will be feasible to break these bolts and screws with theoretical tools called drills and angle grinders.

Elliptic curves and CRYSTALS-Kyber are built upon different computationally complex problems. Elliptic curves use the discrete log problem for its cryptographic strength. Shor's algorithm, which could run on a quantum computer, can efficiently solve the discrete log problem, rendering elliptic curves insecure for cryptographic use. CRYSTALS-Kyber uses the learning with errors problem, which has no known efficient quantum or classical solution.
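For intuition, here is a toy and wildly insecure learning-with-errors instance in Python; the tiny dimensions and modulus are my own illustrative choices. Given a public matrix A and b = A·s + e mod q, the small noise e is what defeats plain linear algebra and makes recovering the secret s believed to be hard at realistic sizes.

```python
import random

# Toy learning-with-errors instance (illustrative only, NOT secure).
q, n, m = 97, 4, 8
random.seed(1)

s = [random.randrange(q) for _ in range(n)]                    # secret vector
A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]  # public matrix
e = [random.choice([-1, 0, 1]) for _ in range(m)]              # small noise

# Public value: b = A*s + e (mod q). Without e, Gaussian elimination
# would recover s; with e, the system is only approximately linear.
b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]

# Whoever holds the secret can subtract A*s and see only the tiny noise:
residue = [(b[i] - sum(A[i][j] * s[j] for j in range(n))) % q for i in range(m)]
print(all(r in (0, 1, q - 1) for r in residue))  # True
```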

All NIST PQC candidates have no known efficient quantum or classical solutions to their particular computationally complex problems. At the end of the competition, alternate candidates will be proposed with different computationally complex problems to ensure we have robust alternatives when new weaknesses are identified.

Competition overview.

The NIST competition aims to standardize post quantum cryptography solutions for:

  • Public Key Encryption (PKE) / Key Encapsulation Mechanisms (KEMs)
  • Signature Algorithms

The submitted solutions are built upon different computationally hard problems, which have no known efficient quantum or classical solutions. The different problems being used are:

  • Lattices — the leading category which NIST and Google are examining
  • Codes
  • Hashes
  • Symmetric
  • Multivariate
  • Isogenies — Deirdre Connolly's favorite
  • Zero Knowledge Proofs

These computationally complex problems are under continual study and review. Some of them are newer than others. Isogenies, for example, are relatively new compared to lattices. This newness is also a risk; the isogeny-based SIKE algorithm was broken five years after it was submitted to the competition! The break was not a quantum solution, but a classical solution which took advantage of some performance optimizations that SIKE made.

The specific nature of the computationally complex problem is not important for you, me, product users, or purse holders to know. Instead, the important performance characteristics for us as implementers to know are:

  • Computation cost
  • Key size
  • Transmission size
  • Memory size
  • Communication requirements

The post-quantum cryptography algorithms and parameters recommended by NIST will balance several of these performance characteristics while maintaining security levels comparable to AES-128, SHA-256, and so on. Every algorithm and parameter set will have different performance characteristics when used in applications; if the algorithm or parameters are changed, the applications will need to be updated.

In five or ten years, we should discover faster implementations of the algorithms being tested and standardized now. We may also find out that the security level realized with the chosen parameters is higher than necessary. The parameters may be tuned for better performance in the future as well, though this would also require application updates.

While we know that quantum solutions exist for the problems underlying RSA and elliptic curves, we assume these problems, including the performance optimized problems, are difficult until research proves otherwise — which is what happened to SIKE.

NIST is mitigating the risk of future unknown solutions by recommending multiple solutions for each use case. Should the leader be found unsafe to use years from now, an alternate implementation could take over and be resilient to the solution found that affected the broken implementation. A migration to an alternate solution will require applications to update.

Given that future algorithms have an unknown lifetime, the industry regularly discusses cryptographic agility. Some perceive cryptographic agility as being ready to swap one implementation with another without reassessing the protocol as a whole. This leads to algorithm confusion attacks, which give cryptographic agility a bad name.

The panelists do discuss their definition of cryptographic agility in the Q&A, and I mostly agreed. Here's my take:

To be cryptographically agile is to be in the position to readily and rapidly change and deploy software whose security qualities have changed or will soon change, while retaining or improving its effective security quality over time as our understanding of implementations and computational capabilities changes.

Did that help?
It does clear things up, but I'm kinda worried about cryptographically agile software in general. It seems that protocols with cryptographic agility end up having downgrade attacks a lot more than I'd consider comfortable to the point where I think that cryptographic agility is kind of a smell. I guess this is what we'd need to avoid reinventing the wheel though. Everything sucks here.

A PQC experiment at scale

Back to the talk by Sofía! She shares that Chrome will begin protecting traffic with a hybrid Kyber KEM. This experiment will educate PQC implementers about real-world performance characteristics, with an abundance of data collected from traffic served through Google, Cloudflare, and other third-party operators.

Updates on migration. Chrome supporting post quantum cryptography. Link to a blog post on chromium dot org.

Panel discussion

After Sofía's presentation, Deirdre Connolly, James Howe, Mark Carney, Ryan Hurst, and Sandra Guasch Castello begin a freeform panel.

PQC means that the software has to be replaced. When quantum computers are accessible to nation-state actors, businesses, or anyone with enough cloud credits, modern communications will no longer offer the confidentiality we expect. Further, compromised communications may reveal sensitive data, such as bearer tokens. A well-resourced threat with stored communications may later impersonate the people or devices that participated in those communications. Ideally, software and firmware are updated to use post-quantum cryptography before quantum capabilities are realized.

Rotation and key management will be vital. The hardest operational problem in cryptography is key management and key distribution. Key management will become more difficult with PQC, as the key material is significantly larger. Rotation is part of a healthy key management program.

I think PQC key management is competent classical key management.

I'm worried about abysmal key management practices.

I do think that PQC increases the need for robust agility and measurement in your key management program.

For example, how long will it take for you to move from liboqs123 to liboqs234? How long will it take for you to switch from a transport protocol 1.2 implementation to a transport protocol 2.0 implementation? How long will it take for you to rotate all your keys the next time there is a keygen problem or potential key exposure? What are the CPU and network implications of rolling out algorithm2 vs whatever you're using now, and can you operationally take that hit? How long will that change take for you to make, and what are the break budget and cloud cost implications?

I say that because the algorithms are young. The example I gave in the panel, I believe, was SIDH, which failed a day or two after being selected as a candidate. Someone just read some old text and said whoaaa! Who knows how long these algorithms and implementations will stand.

We have also had a Cambrian explosion of crypto algorithms, which is great but comes with the tax of needing to be more operationally agile if you run this stuff now.

My intent was to stress the importance of competent key management which includes rotation and management of crypto periods as we get ready for the PQC world.

- Ryan Hurst

Solutions will have to be redesigned as the components have different properties. PQC digital signatures and key encapsulation cannot be naively swapped with classical solutions. They have different requirements like key size, communication lengths, and even state management.

PQC keys are so big that they have to be hosted outside blockchains. For example, P-256 public keys and signatures are 64 bytes. Dilithium2's public key size is 1312 bytes, and the signatures are 2420 bytes. That is a factor of 20.5 and 37.8 times respectively! Blockchains are very space constrained. The same storage and communication optimizations, like public key recovery on ECDSA signatures (archived), are not available.
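The factors quoted above follow directly from the sizes in the paragraph; a quick sketch:

```python
# Size blow-up using the figures above: P-256 public keys and signatures
# are 64 bytes each, while Dilithium2's are 1312 and 2420 bytes.
P256_KEY = P256_SIG = 64
DILITHIUM2_KEY, DILITHIUM2_SIG = 1312, 2420

print(DILITHIUM2_KEY / P256_KEY)            # 20.5
print(round(DILITHIUM2_SIG / P256_SIG, 1))  # 37.8
```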

TLS and certificate transparency may be affected too. Certificates with PQC keys and signatures will be much larger. This may sound like a broken record, but the size difference cannot be overstated. Imagine if the certificate transparency log for my website (archived) were 10 times larger. Now track and record that for 1,106,880,336 certificates every 90 days for years.

It took 25 years for elliptic curve cryptography to get adopted in the industry. Ed25519 took 11 years to go from a research paper to being featured in Safari. We expect post-quantum cryptography adoption will take a similar amount of time. That is why the PQC competition started in 2016 and real world tests are starting in 2023. We may see much wider adoption of PQC by 2040.

Huh, I might have my house paid off by then.

We have a lot of room for efficiency gains with the PQC algorithms we are experimenting with today. If ECDSA and RSA are anything to go by, we should expect that new perspectives on the underlying math will be discovered, which enable equivalent functionality with other, more efficient, techniques. For example, ECDSA verification had a 40% speed up.

Parameter sizes may be reduced in the future as we find out they are stronger than necessary. I recommend reading Too Much Crypto by Jean-Philippe Aumasson.

Ephemeral ECDH is cost effective; KEM agreements are not. Communication protocols like TLS use ephemeral Elliptic-curve Diffie–Hellman for forward secrecy. At the cost of more computation and communication, PQC KEMs can be used for forward secrecy instead.
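As a sketch of the structural difference, not any real protocol, here is a toy Python comparison of the two message flows over an illustrative finite-field group of my own choosing: in Diffie-Hellman both parties send a share, while with a KEM one party encapsulates a fresh secret against the other's public key.

```python
import hashlib
import secrets

# Toy comparison of the two flows (illustrative only, NOT secure).
p, g = 0xFFFFFFFFFFFFFFC5, 5  # small "group" chosen for illustration

# --- Diffie-Hellman shape: both sides contribute a share ---
a, b = secrets.randbelow(p - 2) + 1, secrets.randbelow(p - 2) + 1
A, B = pow(g, a, p), pow(g, b, p)    # each side transmits its share
assert pow(B, a, p) == pow(A, b, p)  # both derive the same secret

# --- KEM shape: one side holds a key pair, the other sends a ciphertext ---
sk = secrets.randbelow(p - 2) + 1
pk = pow(g, sk, p)  # published

def encapsulate(pk):
    """Sender picks fresh randomness and transmits only a ciphertext."""
    r = secrets.randbelow(p - 2) + 1
    ct = pow(g, r, p)
    shared = hashlib.sha256(str(pow(pk, r, p)).encode()).digest()
    return ct, shared

ct, sender_key = encapsulate(pk)
receiver_key = hashlib.sha256(str(pow(ct, sk, p)).encode()).digest()
print(sender_key == receiver_key)  # True
```

Forward secrecy with a KEM means generating that key pair fresh per session, which is where the extra computation and larger transmissions come from.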

FALCON uses floating point numbers. An expert says that it is difficult to implement without side channels. It is the fastest and most compact of its competitors. Embedded devices may lack FPUs which are needed for FALCON. See the FALCON website and New Efficient, Constant-Time Implementations of Falcon.

There is concern for how much computational and network headroom we will lose by moving to PQC algorithms. The cryptography we use today is lightweight enough that its computational and memory overhead is negligible to applications in the cloud and personal computing devices. Given that PQC requires far more resources, applications that rely on many short-lived connections may have fewer resources to work with in the cloud.

Constrained devices are restricted in storage and connectivity; how we will address that use case is not yet clear. Once more, the performance characteristics are a concern. Constrained devices have fewer resources to work with. We may see the internet of things rely on cryptography that requires less code, less memory, and less network communication. Candidates for constrained devices have not been realized.

PQC key sizes will limit how many keys a security key can hold. Security keys are constrained devices. Today's Yubikey can only store 25 P-256 keys. Its internal storage may be limited to 2000 bytes. We will need more resources on security keys to support the future with PQC.
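Using the figures above (roughly 2000 bytes of storage, 64-byte P-256 keys, 1312-byte Dilithium2 public keys), a quick sketch of the slot math; the per-key metadata note is my own assumption:

```python
# Back-of-the-envelope slot math from the figures quoted above.
STORAGE = 2000  # bytes of internal key storage

print(STORAGE // 64)    # 31; roughly in line with the quoted 25 keys,
                        # assuming some bytes go to per-key metadata
print(STORAGE // 1312)  # 1; a single Dilithium2 key nearly fills the device
```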

NFC may be too restrictive in power and physical proximity time to execute PQC algorithms without frustrating individuals. NFC devices, like hotel keys, credit cards, and Amiibos, draw their power from a nearby magnetic field. There is not much power to go around. Power efficiency is already top of mind, with little overhead to spare for cryptography.

I am particularly concerned about cryptography that runs over NFC. The power requirements and calculation time may be frustrating if it takes several seconds to execute on a chip the size of a grain of rice. The convenience of tap-to-pay may be in jeopardy.

If we need to change algorithms, then individuals will have to update device firmware. As an industry we have been bad at updating firmware. The evergreen browsers have shown continual updates can work. Internet-connected vehicles are starting to update regularly, too. But, that requires a very careful engineering team with a focus on continual quality and planning for the worst: a bad update. See HP Races to Fix Faulty Firmware Updates that Bricked Printers.

Satellites and TPMs will need upgradable firmware for crypto-agility. Devices that are hard to reach will need to be capable of new algorithms. We do not know if or when the algorithms we're exploring today will break. Replacing these devices will not be possible, but upgrading them remotely might be. That too comes with some risk, however — refer to the firmware concern directly above.

Micro-architecture side channel attacks would be the same for classical and PQC algorithms. Exfiltrating secret key material is possible with micro-architecture attacks, whether it is AES, RSA, or Kyber. A major defense is to not share the hardware in the first place. Another is to use constant-time code.
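As an illustration of the constant-time defense, here is a small Python sketch (the example byte strings are mine): a naive comparison exits at the first mismatching byte, so its timing leaks how long the matching prefix is, while the standard library's hmac.compare_digest compares every byte regardless.

```python
import hmac

def naive_equal(a: bytes, b: bytes) -> bool:
    """Early-exit comparison: running time depends on secret data."""
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:  # returns at the first mismatch, leaking timing
            return False
    return True

secret = b"correct horse battery staple"

# hmac.compare_digest takes the same time wherever the mismatch occurs.
print(hmac.compare_digest(secret, b"correct horse battery stapl3"))  # False
print(hmac.compare_digest(secret, secret))                           # True
```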

Hardware isolation is hard and something we will have to think about at scale. Sharing the same hardware has been a continual thorn, even with micro-code updates and kernel mitigations. It is clear that disabling HyperThreading is the way forward for cloud providers.

SIMD instructions are often targeted by attacks; Rust uses these to be fast. Single Instruction, Multiple Data (SIMD) instructions operate on a larger set of data with less CPU time and are particularly effective at mathematical operations. Data that is processed by a SIMD instruction tends to be stored in SIMD-specific registers. Cryptographic code can get a significant speed up and appear constant-time with SIMD instructions. However, we find that other processes or interrupts can find secret information used in SIMD registers with fine-grained timing, power analysis, or outright buggy CPU microcode.

Language compilers automatically vectorize simple computations over lots of data using SIMD instructions. Some projects, such as simdjson, take it to the next level by using intrinsic functions — which compile to SIMD processor instructions — to process gigabytes of structured data per second.

Speculative execution, which optimizes process execution on modern CPUs, can trick processors with context switching to reveal unrelated data stored in its registers. Attacks like Zenbleed (archived) directly leak register data between processes.

Any process that uses SIMD to compute against secret key material is at risk of its secrets being stolen by a malicious process on the same vulnerable processor. For that reason, cloud vendors no longer share processor cores between customers.
Amazon Web Services

AWS has a conservative approach to EC2 tenant-isolation, ..., that is designed so that customer instances can never share system resources such as L1/L2 cache or threads running on the same CPU complex. This fundamental design choice rules out the possibility of data leakage from customer instances through microarchitectural side-channels which are predicated upon shared access to these resources among tenants. – The EC2 approach to preventing side-channels
This does not mitigate malicious processes on the same virtual machine exfiltrating data from other processes. Hardware isolation is hard.

The standards are already out the door, and we are stuck with them. The competition started in 2016, but NIST only recently requested more candidates for digital signatures. Alternatives to NIST PQC standards, like P-256 vs Ed25519, may not appear for a while, and will take a long time to gain trust. By then, we may say that the NIST standards are good enough and trusted enough that the risks of using something else are too high for production use.

Focus has been on encryption and privacy with TLS, but not authentication. Indeed, Google Chrome will be testing Kyber, a Key Encapsulation Mechanism for forward secrecy, and not Dilithium2. In fact, they cannot until Certificate Authorities (CAs) participate with dual signing certificates. They are not incentivized to increase their risk by signing with untested algorithms — yet how will we establish the level of risk until it is tested at scale?

The internet is built on other tech too, such as CAs, CDNs, TLS middleware, and JWTs. "Maybe we can come up with something better than DNSSEC."

MLS has been worked on for several years, and it is built with future PQC changes in mind rather than shoving them in at the end. Messaging Layer Security has been under standards development since 2018. It has been developed while PQC prototypes have been available to observe the performance characteristics.

Today we should try to make platforms that do their intended job well. X.509 was not made for the web. Indeed, X.509 was borrowed for signing public keys for websites, hardware drivers, and so on. It was originally made for authenticating things in "The Directory".

IPv4 has been "good enough"; advancing cryptography has more urgency, and more measurable benefits and risks, than migrating to IPv6. NAT, CGNAT, and IPv6-to-IPv4 bridges have muddied the urgency for leaving IPv4 behind. There are few incentives for supporting IPv6 fully. PQC has a different quality of urgency. We have researchers, government institutions, and businesses concerned and interested in its development. Further, researchers have a compelling story for "what if we do nothing, and quantum computers come out?": We'd have broken confidentiality, integrity, and authenticity for 15-20 years.

The future world may have more possibilities outside NIST. After all, X25519 isn't blessed by NIST and it is part of TLS 1.3. It will take time before compelling and safe NIST alternatives come out.

Having lots of algorithms is nice because we can switch quickly to alternatives when something breaks. The SIKE story is a good example of why alternatives are needed.

There are optimizations like signing things in batch. Batch verification is important in blockchain contexts.

Convincing Certificate Authorities to adopt alternatives is hard. Certificate Authorities will not act or increase their risk unless those that define roots of trust, like Microsoft, Google, Apple, and Mozilla, compel them.

What about hash-based algorithms? They're still valid, but it's hard to track state. State is required for many hash-based algorithms, as reusing a one-time key can break the security of the system, much like signing two things in ECDSA with the same nonce and the same key (archived).
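To show why state tracking matters, here is a toy Lamport one-time signature in Python, the classic building block behind stateful hash-based schemes. The structure is standard, but the details are my own illustrative sketch, not a production scheme: each key pair may sign exactly one message, and the `used` flag is the state that must be tracked, because signing twice reveals enough hash preimages to forge.

```python
import hashlib
import secrets

H = lambda data: hashlib.sha256(data).digest()

def keygen():
    # 256 pairs of random preimages; the public key is their hashes.
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(256)]
    pk = [[H(x) for x in pair] for pair in sk]
    return {"sk": sk, "pk": pk, "used": False}

def sign(key, msg: bytes):
    if key["used"]:
        raise RuntimeError("one-time key reuse would break security")
    key["used"] = True  # this state MUST persist before releasing the signature
    digest = H(msg)
    bits = [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]
    # Reveal one preimage per message bit.
    return [key["sk"][i][bit] for i, bit in enumerate(bits)]

def verify(pk, msg: bytes, sig) -> bool:
    digest = H(msg)
    bits = [(digest[i // 8] >> (i % 8)) & 1 for i in range(256)]
    return all(H(sig[i]) == pk[i][bits[i]] for i in range(256))

key = keygen()
sig = sign(key, b"hello")
print(verify(key["pk"], b"hello", sig))    # True
print(verify(key["pk"], b"goodbye", sig))  # False
```

Losing or rolling back that `used` state (say, restoring a device from backup) silently re-enables the key, which is exactly the operational hazard the panel describes.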

Audience member: What if blockchain?

Deirdre: What if not blockchain? We don't need blockchain for some problems.

Final thoughts

Overall, it was a delightful panel and a few blurry details were cleared up for me. I caught up with the recent developments in the NIST competition and formed a better understanding of the PQC landscape while preparing this article. This experience provided a lot of grounded detail, as opposed to the hopes, dreams, fear, uncertainty, and doubt that quantum interest groups rattle off.


Thank you to Soatok, Ibzan, and Sofía Celi for review. Thank you to Ryan Hurst for answering questions weeks after the panel. Thank you to Xe Iaso for contributing to this article.