Paradigm: Zero-knowledge proofs are important but inefficient. How can we accelerate them through hardware?
Written by: Georgios Konstantopoulos, Paradigm Research Partner
Compiled by: Amber
Introduction
Zero-knowledge cryptography is one of the most remarkable innovations in computer science of the past 50 years. The inherent properties of zero-knowledge proofs (hereafter ZKPs) make them an essential component of a range of blockchain scalability and privacy solutions, including ZK rollups such as StarkNet, privacy-focused ZK rollups such as Aztec, and layer one public chains such as Mina, Filecoin, and Aleo, all of which rely on this technology.
ZKPs are slow and costly to produce because of the heavy mathematical computation they require. However, their efficiency is expected to improve dramatically, potentially by as much as 1,000x, as specialized hardware such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) is widely adopted.
As demand for customized, high-performance private computation grows, the statements proven with ZKPs will become more complex. Avoiding further declines in proof generation speed will therefore require specialized hardware that can produce proofs more quickly.
This implies a transformation of the industry chain around ZKPs. Just as miners serve the Bitcoin network, operators of the specialized hardware that keeps ZKPs running efficiently will be compensated accordingly, giving rise to a complete ZK mining and proving industry. Hobbyists can generate proofs on their own CPUs before moving on to GPUs and FPGAs. Of course, this chain will need considerable time to mature.
Why Are Zero-Knowledge Proofs Important?
Zero-knowledge proofs have two main use cases:
1. Outsourced Verifiable Computation
Suppose you need to run a computation, but the platform you are using (your laptop, a Raspberry Pi, or even Ethereum) is too limited: completing it there would be prohibitively slow and expensive, or outright impossible for lack of computing power. In that case you must rely on a third-party service that can return the output of the computation quickly and cheaply (for example, AWS Lambda functions or an oracle service such as Chainlink).
Ordinarily, however, you can only assume that the computation was executed correctly; if the provider returns an incorrect or invalid result, the consequences can be disastrous.
The value of ZKPs lies in their ability to allow third-party providers to also output a proof of computational integrity, ensuring that the output you receive is correct.
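To make the division of labor concrete, below is a minimal Rust interface sketch of this pattern. The trait and method names are hypothetical, not taken from any specific library; the point is the asymmetry between the expensive prover and the cheap verifier.

```rust
/// Hypothetical sketch of the outsourced-verifiable-computation pattern.
/// These names are illustrative only, not a real library's API.
trait ProofSystem {
    type Proof;

    /// Run by the powerful but untrusted third party: at least as expensive
    /// as the computation itself, plus the overhead of producing the proof.
    fn prove(program: &[u8], input: &[u8]) -> (Vec<u8>, Self::Proof);

    /// Run by the weak client (a laptop, a Raspberry Pi, an on-chain
    /// verifier): cheap and fast no matter how heavy the computation was.
    fn verify(program: &[u8], input: &[u8], output: &[u8], proof: &Self::Proof) -> bool;
}
```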
2. Privacy Computing
If you have a computational need that is not expensive to run locally, but you want to hide part of it, what can you do? For example, if I want you to know that I know the 1000th Fibonacci number without telling you the number, or if I want you to believe that I have made a payment without disclosing the amount or my identity, what can I do?
ZKPs allow you to selectively hide part or all of the input related to the computational statement.
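As a rough illustration of how the Fibonacci example splits into a public statement and a private witness, here is a toy Rust sketch. The commitment (a standard-library hash) and all names are simplifications of my own; a real ZKP would prove the relation inside a circuit without revealing anything about the witness.

```rust
// Toy sketch of the statement/witness split; not a real proof system.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Public statement: "I know x such that x = Fib(n) and commit(x) = c".
struct Statement {
    n: u64,          // 1000, public
    commitment: u64, // c, public
}

/// Private witness: the value itself never leaves the prover.
struct Witness {
    fib_n: u128, // Fib(n) reduced mod 2^128 in this toy version
}

/// Stand-in commitment using the standard-library hasher (not cryptographic).
fn commit(x: u128) -> u64 {
    let mut h = DefaultHasher::new();
    x.hash(&mut h);
    h.finish()
}

/// Iterative Fibonacci, wrapping at 2^128 to keep the toy self-contained.
fn fib_mod(n: u64) -> u128 {
    let (mut a, mut b) = (0u128, 1u128);
    for _ in 0..n {
        let next = a.wrapping_add(b);
        a = b;
        b = next;
    }
    a
}

fn main() {
    let witness = Witness { fib_n: fib_mod(1000) };
    let statement = Statement { n: 1000, commitment: commit(witness.fib_n) };
    // A ZKP would now convince a verifier that the statement is consistent
    // with *some* witness, without revealing witness.fib_n itself.
    println!("public: n = {}, commitment = {:#x}", statement.n, statement.commitment);
}
```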
Both of the above use cases have been reflected in many forms throughout the cryptocurrency industry.
- Layer 2 Scaling: Verifiable computation with ZKPs allows a layer one public chain to outsource transaction processing to off-chain high-performance systems (also known as layer two). This enables blockchains to scale without compromising security. For instance, StarkWare is building a scalable smart contract platform, StarkNet, which uses a special-purpose virtual machine to run ZK-friendly code. Aztec's layer two application supports privacy operations without leaking any information about user transactions.
- Privacy Public Chains: Layer one public chains like Aleo, Mina, and Zcash allow transactors to use ZKPs to hide information such as the sender, receiver, or amount, either by default (Aleo) or on an opt-in basis (Mina and Zcash).
- Decentralized Storage: Filecoin uses ZKPs (running on GPUs) to prove that nodes in the network are correctly storing data.
- Blockchain Compression: Mina and Celo use ZKPs to compress the blockchain data required to sync to the latest state on-chain into a small proof.
Given the above, it can be said that as cryptocurrency adoption increases, the market demand for ZKPs will also grow in tandem to meet users' increasing needs for performance and privacy.
ZKPs fundamentally open the door to the accelerated development of scalable, private payments and smart contract platforms, but their high computational cost has so far limited large-scale adoption.
Why Are ZKPs Slow, and How Can We Make Them Faster?
Proving a computation with ZKPs first requires "translating" it from a classical program into a ZK-friendly format. This is done either by rewriting the code by hand against low-level libraries such as Arkworks, or by writing it in a specialized language such as Cairo or Circom that compiles down to the primitives needed to generate the proof.
The heavier and more complex the operations, the longer the proof takes to generate. Some operations that appear frequently in ordinary code (hash functions such as SHA, for example) are also not ZK-friendly, which can make proof generation on an ordinary computer extremely slow.
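To give a feel for what a "ZK-friendly format" looks like, here is a toy Rust sketch written by hand for illustration, not in Circom, Cairo, or Arkworks syntax. It flattens the tiny statement x^3 + x + 5 == 35 into rank-1 constraints of the form (a . z) * (b . z) == (c . z); operations like SHA expand into a very large number of such constraints, which is exactly why they are expensive to prove.

```rust
// Toy arithmetization sketch (not Circom/Cairo/Arkworks syntax): the statement
// x^3 + x + 5 == out is flattened into rank-1 constraints of the form
// (a . z) * (b . z) == (c . z) over the witness vector z.
type F = u64; // a real system works over a large prime field, not u64

fn dot(row: &[F], z: &[F]) -> F {
    row.iter().zip(z).map(|(a, b)| a.wrapping_mul(*b)).sum()
}

fn main() {
    // Witness vector z = [1, x, out, t1 = x*x, t2 = t1*x]
    let x: F = 3;
    let z: Vec<F> = vec![1, x, 35, x * x, x * x * x];

    // One triple per constraint:
    //   t1  = x * x
    //   t2  = t1 * x
    //   out = t2 + x + 5   (the "multiplication" is by the constant 1)
    let constraints: [([F; 5], [F; 5], [F; 5]); 3] = [
        ([0, 1, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 0, 1, 0]),
        ([0, 0, 0, 1, 0], [0, 1, 0, 0, 0], [0, 0, 0, 0, 1]),
        ([5, 1, 0, 0, 1], [1, 0, 0, 0, 0], [0, 0, 1, 0, 0]),
    ];

    for (a, b, c) in &constraints {
        assert_eq!(dot(a, &z).wrapping_mul(dot(b, &z)), dot(c, &z));
    }
    println!("all constraints satisfied for x = {x}");
}
```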
Once your computation has been transformed into a ZK-friendly form, you choose some inputs and send them to a proof system. There are many proof systems: some are named after their authors (e.g., Groth16, GM17), others more creatively (e.g., PLONK, Spartan, STARK). They all accept computations expressed in a ZK-friendly format, and while the proof generation process varies between them, the bottlenecks generally share common characteristics:
- Multiplication over vectors of large numbers (field or group elements), in particular multi-scalar multiplication (MSM) on either variable or fixed bases (a minimal MSM sketch follows this list); or
- Fast Fourier Transform (FFT) and inverse FFT (despite the existence of non-FFT proof systems).
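For readers unfamiliar with the term, the sketch below shows what an MSM computes in its naive form: given scalars k_i and group elements P_i, output the sum of k_i * P_i. To stay dependency-free, a toy additive group (integers modulo a prime) stands in for elliptic-curve points; the loop structure, and the cost problem, are the same either way.

```rust
// Naive multi-scalar multiplication over a toy additive group. `Point` stands
// in for an elliptic-curve point; the modulus is the 64-bit "Goldilocks" prime.
const MODULUS: u64 = 0xffff_ffff_0000_0001;

#[derive(Clone, Copy, PartialEq, Debug)]
struct Point(u64);

/// Group addition in the toy group.
fn add(a: Point, b: Point) -> Point {
    Point(((a.0 as u128 + b.0 as u128) % MODULUS as u128) as u64)
}

/// Scalar multiplication by double-and-add: O(scalar bits) group additions.
fn mul(k: u64, p: Point) -> Point {
    let (mut acc, mut base, mut k) = (Point(0), p, k);
    while k > 0 {
        if k & 1 == 1 {
            acc = add(acc, base);
        }
        base = add(base, base);
        k >>= 1;
    }
    acc
}

/// Naive MSM: one full scalar multiplication per term. For 2^25 terms with
/// large scalars this is the bottleneck described above; the Pippenger-style
/// bucketing sketched further below amortizes most of this cost.
fn msm_naive(scalars: &[u64], points: &[Point]) -> Point {
    scalars
        .iter()
        .zip(points)
        .fold(Point(0), |acc, (&k, &p)| add(acc, mul(k, p)))
}

fn main() {
    let points: Vec<Point> = (1..=8).map(Point).collect();
    let scalars: Vec<u64> = (1..=8).collect();
    println!("{:?}", msm_naive(&scalars, &points)); // Point(204) in the toy group
}
```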
In systems that involve both FFT and MSM, approximately 70% of the proof generation time is spent on MSM, with the remaining time allocated to FFT calculations. Both MSM and FFT are slow, but there is potential for optimization. Let's look at the issues:
- For MSM, you can speed things up with multithreading. But when the vectors of elements reach sizes on the order of 2 to the power of 25 (about 33 million elements, a conservative estimate for applications such as a zkEVM), the multiplications still take a long time even on hundreds of cores, and they can exhaust a device's memory. In short, MSM requires a great deal of memory and remains slow even when heavily parallelized.
- FFTs rely heavily on shuffling data around while the algorithm runs. This makes them hard to accelerate by spreading the load across a computing cluster, and it demands enormous bandwidth when they run on dedicated hardware. The shuffling means you need to "randomly" load and unload chunks of data, for example a dataset larger than 100 GB on hardware with 16 GB of memory or less. Although the operations on the hardware itself are very fast, the time spent loading and unloading data over the interface slows the whole computation down dramatically.
In simple terms:
- The memory access pattern of MSM is predictable, which allows heavy parallelization, but the raw amount of computation and memory required is very high, so costs stay elevated.
- The memory access pattern of FFT is random, which is unfriendly to hardware and makes it naturally difficult to run on distributed infrastructure (the FFT sketch below makes this access pattern concrete).
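The sketch below is an iterative radix-2 FFT, written here as a number-theoretic transform over the tiny field F_17 so it needs no external crates (real provers use far larger fields and input sizes). It is only meant to make the access pattern visible: the initial bit-reversal permutation shuffles the whole array, and the butterfly stages touch memory at strides that double every round.

```rust
// Iterative radix-2 NTT over F_17 (a stand-in for the FFTs used in provers).
// Note the two memory-unfriendly features: the bit-reversal shuffle up front,
// and butterfly strides that double on every pass.
const P: u64 = 17; // toy prime; real systems use ~64-bit or ~256-bit fields

fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc * base % P;
        }
        base = base * base % P;
        exp >>= 1;
    }
    acc
}

/// In-place NTT. `a.len()` must be a power of two and `omega` a primitive
/// root of unity of that order.
fn ntt(a: &mut [u64], omega: u64) {
    let n = a.len();
    let log_n = n.trailing_zeros();

    // Bit-reversal permutation: the "random" data reorganization noted above.
    for i in 0..n {
        let j = ((i as u32).reverse_bits() >> (32 - log_n)) as usize;
        if j > i {
            a.swap(i, j);
        }
    }

    // Butterfly stages: the stride `len` doubles each round, so later rounds
    // reach across ever-larger spans of memory.
    let mut len = 2;
    while len <= n {
        let w_len = pow_mod(omega, (n / len) as u64);
        for start in (0..n).step_by(len) {
            let mut w = 1;
            for k in 0..len / 2 {
                let (u, v) = (a[start + k], a[start + k + len / 2] * w % P);
                a[start + k] = (u + v) % P;
                a[start + k + len / 2] = (u + P - v) % P;
                w = w * w_len % P;
            }
        }
        len *= 2;
    }
}

fn main() {
    // 9 has multiplicative order 8 in F_17, so it works for size-8 inputs.
    let mut a = [1u64, 2, 3, 4, 5, 6, 7, 8];
    ntt(&mut a, 9);
    println!("{a:?}");
}
```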
The most promising work we have seen for addressing the slowness of large MSMs and FFTs is PipeZK. In their paper, the authors describe a method for making MSM cheaper by using the Pippenger algorithm to skip redundant computations. They also describe a way to "unroll" the FFT so the computation can be performed without extensive data reorganization, making memory access patterns predictable and thereby improving hardware efficiency.
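As a rough sketch of the bucket (Pippenger-style) approach, not of PipeZK's actual hardware pipeline, the code below reuses the toy additive group from the earlier MSM sketch. Scalars are split into c-bit windows; within each window every point is dropped into the bucket for its digit, the buckets are combined with a running sum, and the window results are stitched together with doublings, replacing most per-term scalar multiplications with plain additions.

```rust
// Bucket-method (Pippenger-style) MSM over the same toy additive group as the
// earlier naive sketch. Illustrative only; real implementations operate on
// elliptic-curve points and choose the window size c based on input size.
const MODULUS: u64 = 0xffff_ffff_0000_0001;

#[derive(Clone, Copy, Debug, PartialEq)]
struct Point(u64);

fn add(a: Point, b: Point) -> Point {
    Point(((a.0 as u128 + b.0 as u128) % MODULUS as u128) as u64)
}

fn msm_bucket(scalars: &[u64], points: &[Point], c: u32) -> Point {
    let windows = (64 + c - 1) / c; // number of c-bit windows in a 64-bit scalar
    let mut result = Point(0);

    // Walk the windows from most significant to least, doubling in between.
    for w in (0..windows).rev() {
        for _ in 0..c {
            result = add(result, result); // shift the accumulator up by c bits
        }

        // Bucket accumulation: bucket[d - 1] collects every point whose
        // current window digit equals d.
        let mut buckets = vec![Point(0); (1 << c) - 1];
        for (&k, &p) in scalars.iter().zip(points) {
            let digit = ((k >> (w * c)) & ((1 << c) - 1)) as usize;
            if digit != 0 {
                buckets[digit - 1] = add(buckets[digit - 1], p);
            }
        }

        // Running-sum trick: computes the sum over d of d * bucket[d - 1]
        // using only additions.
        let (mut running, mut window_sum) = (Point(0), Point(0));
        for b in buckets.iter().rev() {
            running = add(running, *b);
            window_sum = add(window_sum, running);
        }
        result = add(result, window_sum);
    }
    result
}

fn main() {
    let points: Vec<Point> = (1..=8).map(Point).collect();
    let scalars: Vec<u64> = (1..=8).collect();
    // In the toy group, k * Point(x) is just k * x mod MODULUS, so the answer
    // is 1*1 + 2*2 + ... + 8*8 = 204.
    assert_eq!(msm_bucket(&scalars, &points, 4), Point(204));
    println!("bucket MSM matches the naive result");
}
```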
Assuming the above methods address the fundamental bottlenecks of each algorithm, the question arises: what kind of hardware can simultaneously optimize both MSM and FFT algorithms and significantly enhance ZKP generation efficiency?
Hardware Choices
The aforementioned acceleration techniques can be implemented on various hardware technologies, including but not limited to GPUs, FPGAs, and ASICs. But which one is the best choice?
Before answering this question, we should note that ZKPs are still at an early stage of development, and there is little standardization of system parameters (such as FFT width or parameter sizes) or of the choice of proof system. Because of this, two core characteristics of FPGAs make them more attractive than ASICs in the current environment.
- "Multiple Writes" vs. "Single Write": The business logic on ASICs is a single write. If any ZKP logic changes, you have to start from scratch. In contrast, FPGAs can achieve second-level reconfiguration, meaning the same hardware can be reused across multiple chains with incompatible proof systems (for example, because they want to extract MEV across chains), allowing the hardware to adapt flexibly to changes in ZK "meta."
- More Mature Supply Chain: The design, manufacturing, and deployment of ASICs typically take 12 to 18 months or even longer. In contrast, the FPGA supply chain is much more mature, with leading suppliers like Xilinx allowing large retail orders to be placed online and delivered within 16 weeks. This enables FPGA-centric businesses to have a tighter feedback loop on their products and scale their operations more flexibly by purchasing and deploying more FPGAs as needed.
Drawing on trends in machine learning and computer vision, we also expect FPGA performance to surpass that of GPUs over time. Beyond that, FPGAs have two significant advantages over GPUs:
- Hardware Cost: Top-tier FPGAs (measured by process node, clock frequency, power efficiency, and memory bandwidth) cost roughly one-third as much as top-tier GPUs, and the global GPU shortage has only widened the gap.
- Power Efficiency: FPGAs are more than ten times as energy efficient as GPUs, largely because GPUs must be connected to a host device, which often consumes a great deal of power.
Given the above, we expect the winners in this market to be companies focused on FPGAs rather than ASICs or GPUs. However, if one or a few ZK L1s or L2s end up "monopolizing" the market and the ZK proof system stabilizes on a single implementation, the odds of ASICs overtaking FPGAs will rise. Even then, that outcome is likely many years away.
Conclusion
In 2021, Bitcoin miners earned more than $15 billion and Ethereum miners more than $17 billion. Zero-knowledge proofs will ultimately become an important means of achieving computational integrity and privacy on these networks, and if so, the market for "ZK miners" could grow to rival the PoW mining market.
In summary, at least for now, FPGAs are the hardware best positioned to address the inefficiency and high cost of ZKP generation. In this new hardware race, FPGAs currently lead both GPUs and ASICs.