Exploring the impact of PQC on Cryptography Key Management
Sofía shares a short presentation on Post-Quantum Cryptography's (PQC) development. PQC differs from classical cryptography in how it uses complex problems, with no efficient quantum or classical solutions, to satisfy security goals. The panel covers several topics and a few prompts from the audience. The competition should provide multiple solutions for exchanging keys and digital signatures so that when one solution is no longer secure, applications can change to another. The largest concern is how the performance characteristics will affect applications that need key exchange and digital signatures. Google will be testing key exchange at scale, but there is a gap for digital signatures. Cryptographic agility gets redefined with an emphasis on updating applications and hard-to-reach hardware like TPMs and satellites.
This talk summary is part of my DEF CON 31 series. The talks this year have sufficient depth to be shared independently and are separated for easier consumption.
This presentation was different from other talks. It was more like a panel, except the first speaker shared a presentation remotely.
Sofía Celi began the panel with a short presentation where she shared the ongoing progress of Post-Quantum Cryptography (PQC) research. We expect that quantum computers will eventually have the capability to break algorithms like RSA and elliptic curves. Researchers are proposing cryptographic algorithms that resist the capabilities of theoretical quantum computers.
The NIST competition
The NIST PQC competition exists to procure and standardize algorithms and parameters that can be trusted to secure the U.S. government after quantum capabilities break the algorithms in use today. The NIST competition started nearly seven years ago. Independent researchers contributed candidates to the competition, and other researchers found weaknesses and vulnerabilities that removed candidates from the competition.
Another goal of the competition is to find suitable algorithms for applications that currently depend on algorithms weak to quantum computers. Applications need to establish secret symmetric key material between two parties over an unsecured channel, and sign and verify signatures on arbitrary data, such as operating system updates or TLS certificates.
When NIST concludes the competition, they will standardize primary candidates for these application requirements. The candidates will be evaluated by several qualities and the least resource intensive solution will be selected as the primary candidate. They will also prepare alternate candidates, which are founded upon different computationally hard problems. Should a weakness be discovered in the primary candidate, an alternate candidate remains strong and a migration can begin immediately.
Since attending this presentation, NIST has published draft specifications for several candidates:
- FIPS 203: Module-Lattice-Based Key-Encapsulation Mechanism Standard
- FIPS 204: Module-Lattice-Based Digital Signature Standard
- FIPS 205: Stateless Hash-Based Digital Signature Standard
The NIST competition isn't everything. New ideas and approaches have developed outside the competition too. If this subject interests you, check out BAT: Small and Fast KEM over NTRU Lattices and Improved Key Pair Generation for Falcon, BAT and Hawk.
In addition to standardizing specific algorithms, NIST will specify standard parameters for each algorithm to achieve certain security strengths.
TLS 1.3 uses elliptic curves to establish keys between a server and client. In a quantum-capable future, elliptic curve algorithms will be replaced to maintain the security strength that today's technology affords.
Unfortunately, the post-quantum solutions for establishing keys have different design requirements and constraints. For example, CRYSTALS-Kyber-512 has an 800-byte public key. Applications on UDP will need to redesign how they establish secure sessions.
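As a rough illustration of the UDP concern, here is a hedged back-of-the-envelope sketch. The key sizes come from the public specifications; the 1200-byte payload budget is an assumption borrowed from QUIC's conservative initial datagram sizing, not a hard rule.

```python
# Rough size comparison for a key exchange over UDP.
# Sizes: X25519 (RFC 7748) and Kyber-512 (round-3 spec); the
# 1200-byte payload budget is an assumed conservative UDP limit.
X25519_PUBLIC_KEY = 32
KYBER512_PUBLIC_KEY = 800
KYBER512_CIPHERTEXT = 768
UDP_PAYLOAD_BUDGET = 1200

# A classical ECDH share fits in one datagram with room to spare.
assert X25519_PUBLIC_KEY < UDP_PAYLOAD_BUDGET

# A Kyber-512 public key still fits, but a hybrid handshake carrying
# both shares eats most of the budget before any framing overhead,
# and the return ciphertext is another 768 bytes on the wire.
hybrid_client_share = X25519_PUBLIC_KEY + KYBER512_PUBLIC_KEY
print(hybrid_client_share)  # 832
```

The point of the sketch: classical handshakes left enormous headroom in a single datagram; PQC shares consume most of it, which is why UDP-based protocols need redesigned session establishment.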
Bolts and screws can bind two things together. Screws thread through one thing into the second thing. Bolts go through two things and are tightened with a nut on the other side.
Cryptographic primitives can also be like that. Similar goals, like establishing a mutual symmetric key, can be achieved with different techniques. Each technique has its own requirements and applicable guarantees.
For screws: while the threads hold and there is little shear between the materials, the screw keeps the materials together.
For bolts: the bolt must pass all the way through and be tightened with a nut to carry tensile load; while there is little shear between the materials it passes through, the bolt keeps them together at the point of contact.
Bolts require tensile stress to be effective, while screws and the material they bite into can only withstand so much of it. Both screws and bolts fail under shear stress. Under too much stress, screw threads strip, and the screw or the hole is no longer usable. When used correctly, both can hold materials together.
While these bolts and screws will last for years — possibly decades — in facilities and infrastructure, we are aware that some time in the future — we do not know when — it will be feasible to break these bolts and screws with theoretical tools called drills and angle grinders.
Elliptic curves and CRYSTALS-Kyber are built upon different computationally complex problems. Elliptic-curve cryptography relies on the discrete log problem for its strength. Shor's algorithm, which could run on a quantum computer, can efficiently solve the discrete log problem, rendering elliptic curves insecure for cryptographic use. CRYSTALS-Kyber uses the learning with errors problem, which has no known efficient quantum or classical solution.
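To get a feel for learning with errors, here is a toy instance in plain Python. The parameters are tiny and deliberately insecure, chosen only for illustration: the public data is a random matrix A and a vector b = A·s + e mod q, and recovering the secret s is simple linear algebra without the noise e but believed hard with it at real parameter sizes.

```python
import random

random.seed(1)  # deterministic toy example

q, n, m = 97, 4, 8                                   # toy modulus and dimensions (insecure!)
s = [random.randrange(q) for _ in range(n)]          # secret vector
A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
e = [random.choice([-1, 0, 1]) for _ in range(m)]    # small noise

# Public LWE samples: b = A*s + e (mod q)
b = [(sum(A[i][j] * s[j] for j in range(n)) + e[i]) % q for i in range(m)]

# Each equation is only *approximately* satisfied. Without noise,
# s falls out of Gaussian elimination; with noise, the search
# problem becomes hard at cryptographic sizes.
residuals = [(b[i] - sum(A[i][j] * s[j] for j in range(n))) % q
             for i in range(m)]
print(residuals)  # every entry is e[i] mod q, i.e. in {96, 0, 1}
```

Kyber's actual construction (module lattices, polynomial rings, compression) is far more involved, but this is the shape of the underlying hardness assumption.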
All NIST PQC candidates have no known efficient quantum or classical solutions to their particular computationally complex problems. At the end of the competition, alternate candidates will be proposed with different computationally complex problems to ensure we have robust alternatives when new weaknesses are identified.
The NIST competition aims to standardize post quantum cryptography solutions for:
- Public Key Encryption (PKE) / Key Encapsulation Mechanisms (KEMs)
- Signature Algorithms
The submitted solutions are built upon different computationally complex problems, which have no known efficient quantum or classical solutions. The different problems being used include:
- Lattices — the leading category which NIST and Google are examining
- Isogenies — Deirdre Connolly's favorite
- Zero Knowledge Proofs
These computationally complex problems are under continual study and review. Some of them are newer than others. Isogenies, for example, are relatively new compared to lattices. This newness is also a risk; the isogeny-based SIKE algorithm was broken — five years after it was submitted to the competition! The break was not a quantum solution, but a classical one that took advantage of some performance optimizations that SIKE made.
The specific nature of the computationally complex problem is not important for you, me, product users, or purse holders to know. Instead, the important performance characteristics for us as implementers to know are:
- Computation cost
- Key size
- Transmission size
- Memory size
- Communication requirements
The post-quantum cryptography algorithms and parameters recommended by NIST will balance several of these performance characteristics while maintaining security levels comparable to AES-128, SHA-256, and so on. Every algorithm and parameter set will have different performance characteristics when used in applications; if the algorithm or parameters change, the applications will need to be updated.
In five or ten years, we should discover faster implementations of the algorithms being tested and standardized now. We may also find that the security level realized with the chosen parameters is higher than necessary. The parameters may be tuned for better performance in the future, though that would also require application updates.
While we know that quantum solutions exist for the problems underlying RSA and elliptic curves, we assume these newer problems, including their performance-optimized variants, are difficult until research proves otherwise, which is what happened to SIKE.
NIST is mitigating the risk of future unknown solutions by recommending multiple solutions for each use case. Should the leader be found unsafe to use years from now, an alternate implementation could take over, one resilient to the attack that broke the leader. A migration to an alternate solution will require applications to update.
Given that future algorithms have an unknown lifetime, the industry regularly discusses cryptographic agility. Some perceive cryptographic agility as being ready to swap one implementation with another without reassessing the protocol as a whole. This leads to algorithm confusion attacks, which give cryptographic agility a bad name.
The panelists do discuss their definitions of cryptographic agility in the Q&A; I mostly agreed. Here's my take:
To be cryptographically agile is to be in the position to readily and rapidly change and deploy software where its security qualities have changed or will soon change, while retaining or improving its effective security quality over time as our understanding of implementations and computational capabilities change.
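One way to work toward that definition in code is to avoid hard-wiring algorithm choices: route every operation through a small registry keyed by algorithm identifier, so a deployment can switch implementations without touching call sites. A minimal sketch, with hypothetical algorithm names and placeholder implementations; real systems would also pin the allowed identifiers per protocol context to avoid the algorithm-confusion attacks mentioned above.

```python
from typing import Callable, Dict

# Hypothetical registry mapping algorithm IDs to key-generation
# functions. The byte strings are placeholders sized like real keys.
KEM_REGISTRY: Dict[str, Callable[[], bytes]] = {}

def register_kem(name: str):
    """Decorator that files an implementation under a stable ID."""
    def wrap(fn: Callable[[], bytes]) -> Callable[[], bytes]:
        KEM_REGISTRY[name] = fn
        return fn
    return wrap

@register_kem("classical-placeholder")
def classical_keygen() -> bytes:
    return b"\x00" * 32    # stand-in for a 32-byte X25519 public key

@register_kem("pqc-placeholder")
def pqc_keygen() -> bytes:
    return b"\x00" * 800   # stand-in for an 800-byte Kyber-512 public key

def keygen(algorithm: str) -> bytes:
    # Fail closed on unknown IDs rather than silently falling back.
    return KEM_REGISTRY[algorithm]()

print(len(keygen("pqc-placeholder")))  # 800
```

The design choice here is that swapping algorithms becomes a configuration change plus a redeploy, rather than a code audit of every call site, which is a large part of what "readily and rapidly change and deploy" demands.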
A PQC experiment at scale
Back to the talk by Sofía! She shares that Chrome will begin protecting traffic with a hybrid Kyber KEM. This experiment will educate PQC implementers about real-world performance characteristics, with an abundance of data collected from traffic served through Google, Cloudflare, and other third-party operators.
PQC means that the software has to be replaced. When quantum computers are accessible to nation-state actors, businesses, or anyone with enough cloud credits, modern communications will no longer offer the confidentiality we expect. Further, compromised communications may reveal sensitive data, such as bearer tokens. A well-resourced threat with stored communications may later impersonate the people or devices that participated in those communications. Ideally, software and firmware are updated to use post-quantum cryptography before quantum capabilities are realized.
Rotation and key management will be vital. The hardest operational problem in cryptography is key management and key distribution. Key management will become more difficult with PQC, as the key material is significantly larger. Rotation is part of a healthy key management program.
I think PQC key management is competent classical key management.
I'm worried about abysmal key management practices.
I do think that PQC increases the need for robust agility and measurement in your key management program.
For example, how long will it take for you to move from liboqs123 to liboqs234? How long will it take to switch from a transport protocol 1.2 implementation to a transport protocol 2.0 implementation? How long will it take to rotate all your keys the next time there is a keygen problem or potential key exposure? What are the CPU and network implications of rolling out algorithm2 versus whatever you're using now, and can you operationally take that hit? How long will that change take to make, and what are the break-budget and cloud-cost implications?
I say that because the algorithms are young. The example I gave in the panel, I believe, was SIDH, which failed a day or two after being selected as a candidate. Someone just read some old text and said whoaaa! Who knows how long these algorithms and implementations will stand.
We have also had a Cambrian explosion of crypto algorithms, which is great but comes with the tax of needing to be more operationally agile if you run this stuff now.
My intent was to stress the importance of competent key management which includes rotation and management of crypto periods as we get ready for the PQC world.
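Crypto periods are easier to manage when key metadata is tracked explicitly in an inventory. A minimal sketch of flagging keys that have outlived their crypto period; the field names, key IDs, and 90-day period are hypothetical, not a recommendation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class ManagedKey:
    key_id: str
    algorithm: str
    created: datetime
    crypto_period: timedelta  # maximum allowed lifetime for this key

    def due_for_rotation(self, now: datetime) -> bool:
        return now - self.created >= self.crypto_period

now = datetime(2023, 9, 1)
inventory = [
    ManagedKey("tls-1", "X25519", datetime(2023, 1, 1), timedelta(days=90)),
    ManagedKey("tls-2", "Kyber-512-hybrid", datetime(2023, 8, 1), timedelta(days=90)),
]

# An auditable answer to "what needs rotating right now?"
stale = [k.key_id for k in inventory if k.due_for_rotation(now)]
print(stale)  # ['tls-1']
```

With this bookkeeping in place, the operational questions above ("how long does a full rotation take?") become measurable rather than anecdotal.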
Solutions will have to be redesigned as the components have different properties. PQC digital signatures and key encapsulation cannot be naively swapped with classical solutions. They have different requirements like key size, communication lengths, and even state management.
PQC keys are so big that they have to be hosted outside blockchains. For example, P-256 public keys and signatures are each 64 bytes. Dilithium2's public keys are 1312 bytes, and its signatures are 2420 bytes: factors of 20.5 and 37.8, respectively! Blockchains are very space constrained. The same storage and communication optimizations, like public key recovery on ECDSA signatures (archived), are not available.
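The arithmetic behind those blow-up factors is a one-liner; the sizes below are the same ones quoted above.

```python
# Signature-scheme sizes in bytes (P-256/ECDSA vs. Dilithium2).
P256_PUBLIC_KEY = 64
P256_SIGNATURE = 64
DILITHIUM2_PUBLIC_KEY = 1312
DILITHIUM2_SIGNATURE = 2420

key_factor = DILITHIUM2_PUBLIC_KEY / P256_PUBLIC_KEY
sig_factor = DILITHIUM2_SIGNATURE / P256_SIGNATURE
print(round(key_factor, 1), round(sig_factor, 1))  # 20.5 37.8
```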
TLS and certificate transparency may be affected too. Certificates with PQC keys and signatures will be much larger. This may sound like a broken record, but the size difference cannot be overstated. Imagine if the certificate transparency log for my website (archived) were 10 times larger. Now track and record that for 1,106,880,336 certificates every 90 days for years.
It took 25 years for elliptic curve cryptography to get adopted in the industry. Ed25519 took 11 years to go from a research paper to being featured in Safari. We expect post-quantum cryptography adoption will take a similar amount of time. That is why the PQC competition started in 2016 and real world tests are starting in 2023. We may see much wider adoption of PQC by 2040.
We have a lot of room for efficiency gains with the PQC algorithms we are experimenting with today. If ECDSA and RSA are anything to go by, we should expect that new perspectives on the underlying math will be discovered, which enable equivalent functionality with other, more efficient, techniques. For example, ECDSA verification had a 40% speed up.
Ephemeral ECDH is cost effective; KEM agreements are not as cost effective. Communication protocols like TLS use ephemeral Elliptic-curve Diffie–Hellman for forward secrecy. At the cost of more computation and communication, PQC KEMs can be used for forward secrecy instead.
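The KEM handshake has a different shape from Diffie–Hellman: one side sends a fresh ephemeral public key, the other encapsulates a secret to it, and both derive the same shared key. The sketch below uses a deliberately insecure toy "KEM" (hash-based, with no hardness whatsoever) purely to show the message flow, not any real construction.

```python
import hashlib
import os

# Toy, INSECURE stand-in for a KEM, only to illustrate the flow.
def toy_keygen():
    sk = os.urandom(32)
    pk = hashlib.sha256(b"pk|" + sk).digest()
    return sk, pk

def toy_encaps(pk):
    r = os.urandom(32)                        # encapsulator's randomness
    shared = hashlib.sha256(pk + r).digest()  # derived shared secret
    return r, shared                          # r plays the ciphertext role

def toy_decaps(sk, ct):
    pk = hashlib.sha256(b"pk|" + sk).digest()
    return hashlib.sha256(pk + ct).digest()

# Ephemeral exchange: a fresh keypair per connection gives forward
# secrecy, at the cost of shipping a large public key every time.
client_sk, client_pk = toy_keygen()                 # client -> server: pk
ciphertext, server_secret = toy_encaps(client_pk)   # server -> client: ct
client_secret = toy_decaps(client_sk, ciphertext)
print(client_secret == server_secret)  # True
```

With Kyber, the "pk" and "ct" messages are 800 and 768 bytes respectively (at the 512 level), which is where the extra communication cost relative to a 32-byte ECDH share comes from.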
FALCON uses floating point numbers, and an expert says it is difficult to implement without side channels. It is the fastest and most compact of its competitors, but embedded devices may lack the FPUs that FALCON needs. See the FALCON website and New Efficient, Constant-Time Implementations of Falcon.
There is concern for how much computational and network headroom we will lose by moving to PQC algorithms. The cryptography we use today is lightweight enough that its computational and memory overhead is negligible to applications in the cloud and on personal computing devices. Given that PQC requires far more resources, applications that rely on many short-lived connections may have fewer resources to work with in the cloud.
Constrained devices are restricted in storage and connectivity; how we will address that use case is not yet clear. Once more, the performance characteristics are a concern. Constrained devices have fewer resources to work with. We may see the internet of things rely on cryptography that requires less code, less memory, and less network communication. Candidates for constrained devices have not yet been realized.
PQC key sizes will limit how many keys a security key can hold. Security keys are constrained devices. Today's Yubikey can only store 25 P-256 keys. Its internal storage may be limited to 2000 bytes. We will need more resources on security keys to support the future with PQC.
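A back-of-the-envelope check of that storage constraint, using the 2000-byte figure and key sizes quoted above (real tokens also store per-key metadata, which is why the quoted count of 25 is below the raw division):

```python
STORAGE_BYTES = 2000       # quoted internal storage estimate
P256_KEY = 64              # P-256 key material, as quoted above
DILITHIUM2_KEY = 1312      # Dilithium2 public key size

print(STORAGE_BYTES // P256_KEY)        # 31 raw slots (~25 after metadata)
print(STORAGE_BYTES // DILITHIUM2_KEY)  # 1 -- a single PQC key fills the budget
```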
NFC may be too restrictive in power and physical-proximity time to execute PQC algorithms, which would frustrate individuals. NFC devices, like hotel keys, credit cards, and Amiibos, get their power from near contact with a magnetic field. There is not much power to go around. Power efficiency is already top of mind, with little overhead to spare for cryptography.
If we need to change algorithms, then individuals will have to update device firmware. As an industry we have been bad at updating firmware. The evergreen browsers have shown continual updates can work. Internet-connected vehicles are starting to update regularly, too. But, that requires a very careful engineering team with a focus on continual quality and planning for the worst: a bad update. See HP Races to Fix Faulty Firmware Updates that Bricked Printers.
Satellites and TPMs will need upgradable firmware for crypto-agility. Devices that are hard to reach will need to be capable of new algorithms. We do not know if or when the algorithms we're exploring today will break. Replacing these devices will not be possible, but upgrading them remotely might be. That too comes with some risk, however — refer to the firmware concern directly above.
Micro-architecture side channel attacks would be the same for classical and PQC algorithms. Exfiltrating secret key material is possible with micro-architecture attacks, whether it is AES, RSA, or Kyber. A major defense is to not share the hardware in the first place. Another is to use constant-time code.
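Constant-time habits apply at every layer, down to comparing secrets. In Python's standard library, for example, `hmac.compare_digest` takes time independent of where the first mismatching byte occurs, unlike `==` on attacker-influenced byte strings:

```python
import hmac

expected_tag = bytes.fromhex("aa" * 16)
received_tag = bytes.fromhex("ab" + "aa" * 15)

# == may exit at the first differing byte, leaking the mismatch
# position through timing.
naive_equal = expected_tag == received_tag

# compare_digest examines every byte regardless of mismatches.
safe_equal = hmac.compare_digest(expected_tag, received_tag)

print(naive_equal, safe_equal)  # False False
```

The comparison result is the same either way; only the timing behavior differs, which is exactly the property side-channel defenses care about.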
Hardware isolation is hard and something we will have to think about at scale. Sharing the same hardware has been a continual thorn, even with micro-code updates and kernel mitigations. It is clear that disabling HyperThreading is the way forward for cloud providers.
SIMD instructions are often used for attacks; Rust uses them to be fast. Single Instruction, Multiple Data (SIMD) instructions operate on a larger set of data with less CPU time and are particularly effective at mathematical operations. Data processed by a SIMD instruction tends to be stored in SIMD-specific registers. Cryptographic code can get a significant speed up and appear constant-time with SIMD instructions. However, we find that other processes or interrupts can recover secret information left in SIMD registers through fine-grained timing, power analysis, or outright buggy CPU microcode.
The standards are already out the door, and we are stuck with them. The competition started in 2016, but NIST only recently requested more candidates for digital signatures. Alternatives to NIST PQC standards, like P-256 vs Ed25519, may not appear for a while and will take a long time to gain trust. By then, we may say that the NIST standards are good enough and trusted enough that the risks of using something else are too high for production use.
Focus has been on encryption and privacy with TLS, but not authentication. Indeed, Google Chrome will be testing Kyber, a Key Encapsulation Mechanism used for forward secrecy, and not Dilithium2. In fact, they cannot test PQC authentication until Certificate Authorities (CAs) participate with dual-signing certificates. CAs are not incentivized to increase their risk by signing with untested algorithms; yet how will we establish the level of risk until it is tested at scale?
The internet is built on other tech too, such as CAs, CDNs, TLS middleware, and JWTs. "Maybe we can come up with something better than DNSSEC."
MLS has been worked on for several years, and it was built with future PQC changes in mind rather than shoving them in at the end. Messaging Layer Security has been under standards development since 2018. It has been developed while PQC prototypes have been available to observe their performance characteristics.
Today we should try to make platforms that do their intended job well. X.509 was not made for the web. Indeed, X.509 was borrowed for signing public keys for websites, hardware drivers, and so on. It was originally made for authenticating things in "The Directory".
IPv4 has been "good enough"; advancing cryptography has more urgency, and more measurable benefits and risks, than migrating to IPv6. NAT, CGNAT, and IPv6-to-IPv4 bridges have muddied the urgency for leaving IPv4 behind. There are few incentives for supporting IPv6 fully. PQC has a different quality of urgency. We have researchers, government institutions, and businesses concerned and interested in its development. Further, researchers have a compelling story for "what if we do nothing, and quantum computers come out?": we'd have broken confidentiality, integrity, and authenticity for 15-20 years.
The future world may have more possibilities outside NIST. After all, X25519 isn't blessed by NIST and it is part of TLS 1.3. It will take time before compelling and safe NIST alternatives come out.
Having lots of algorithms is nice because we can switch quickly to alternatives when something breaks or develops problems. The SIKE story is a good example of why alternatives are needed.
There are optimizations like signing things in batch. Batch verification is important in blockchain contexts.
Convincing Certificate Authorities to adopt alternatives is hard. Certificate Authorities will not act or increase their risk unless those that define roots of trust, like Microsoft, Google, Apple, and Mozilla, compel them.
What about hash-based algorithms? They're still valid, but it's hard to track state. State is required for many hash-based algorithms, as reusing a one-time key can break the security of the system, much like signing two things in ECDSA with the same nonce and key (archived).
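To make the state problem concrete, here is a minimal sketch of a stateful signer wrapper. The "signature" is a placeholder hash, not a real hash-based scheme like XMSS or LMS; the point is the index bookkeeping, which must survive backups, replicas, and crash recovery without ever handing out the same index twice.

```python
import hashlib

class StatefulSigner:
    """Tracks which one-time key indices have been consumed.

    The tag below is a placeholder; a real stateful hash-based
    scheme uses one one-time keypair per index, and reusing an
    index can leak enough to forge signatures.
    """

    def __init__(self, seed: bytes, capacity: int):
        self.seed = seed
        self.capacity = capacity
        self.next_index = 0  # persisted state; losing or forking it is fatal

    def sign(self, message: bytes) -> tuple[int, bytes]:
        if self.next_index >= self.capacity:
            raise RuntimeError("one-time keys exhausted; key is spent")
        index = self.next_index
        self.next_index += 1  # advance state BEFORE releasing the signature
        tag = hashlib.sha256(
            self.seed + index.to_bytes(4, "big") + message
        ).digest()
        return index, tag

signer = StatefulSigner(seed=b"demo", capacity=2)
i1, _ = signer.sign(b"update-1")
i2, _ = signer.sign(b"update-2")
print(i1, i2)  # 0 1 -- a third call raises instead of reusing an index
```

A restored-from-backup replica with a stale `next_index` would happily reuse indices, which is exactly why operators find this class of algorithm hard to deploy.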
Audience member: What if blockchain?
Deirdre: What if not blockchain? We don't need blockchain for some problems.
Overall, it was a delightful panel and a few blurry details were cleared up for me. I caught up with the recent developments in the NIST competition and formed a better understanding of the PQC landscape while preparing this article. This experience provided a lot of grounded detail, as opposed to the hopes, dreams, fear, uncertainty, and doubt that quantum interest groups rattle off.