Vitalik: What kinds of layer 3s make sense?
Written by: Vitalik, “What kind of layer 3s make sense?”
Compiled by: Dong Yiming, Chain Catcher
Special thanks to Georgios Konstantopoulos, Karl Floersch, and the Starkware team for their feedback and review.
A recurring topic in discussions of layer 2 scaling is the concept of "layer 3s." If we can build a layer 2 protocol that anchors into layer 1 for security and adds scalability on top, then surely we can scale even more by building a layer 3 protocol that "anchors into layer 2 for security and adds even more scalability on top of that"?
A simple version of this idea goes: if you have a scheme that can give you quadratic scaling, can you stack that scheme on top of itself and get exponential scaling? Similar ideas appear in my 2015 scalability paper and in the multi-layer scaling described in the Plasma paper. Unfortunately, such simple conceptions of layer 3s are not so easy to turn into a workable design. Because of limits on data availability, reliance on layer 1 bandwidth for emergency withdrawals, and many other issues, there is always something in the design that cannot be stacked and can only give you a one-time boost in scalability.
Newer ideas around layer 3s, such as the framework proposed by Starkware, are more sophisticated: they do not just stack the same thing on top of itself, but assign layer 2 and layer 3 different purposes. Done correctly, this approach could potentially work. This article goes into the details of what does and does not make sense in a three-layer architecture.
Why can't you just keep scaling by stacking rollups on top of rollups?
Rollups (see my longer article here) are a scaling technology that combines different techniques to address the two main scaling bottlenecks of running a blockchain: computation and data. Computation is addressed by "fraud proofs" or SNARKs, which rely on a small number of participants to process and verify each block and require everyone else to perform only a small amount of computation to check that the proving process was done correctly. These schemes, especially SNARKs, can scale almost without limit; we can keep making "a SNARK of many SNARKs" to reduce even more computation down to a single proof.
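To make the "SNARK of many SNARKs" idea concrete, here is a toy Python sketch of recursive aggregation. The prove and verify functions are purely illustrative stand-ins for a real proving system; the only point is that "these N proofs all verify" can itself be proven, yielding one proof whose verification cost does not grow with N.

```python
# A toy sketch of recursive proof aggregation ("a SNARK of many SNARKs").
# prove() and verify() are stand-ins for a real proving system, not an actual API.
from dataclasses import dataclass
from typing import List

@dataclass
class Proof:
    claim: str  # the statement this proof attests to

def prove(claim: str) -> Proof:
    return Proof(claim)               # stand-in for an actual prover

def verify(proof: Proof) -> bool:
    return isinstance(proof, Proof)   # stand-in for an actual verifier

def aggregate(proofs: List[Proof]) -> Proof:
    # In a real recursive SNARK, this verification loop runs *inside a circuit*,
    # so the output is a single succinct proof that all sub-proofs verify.
    assert all(verify(p) for p in proofs)
    return prove(f"all {len(proofs)} sub-proofs verify")

block_proofs = [prove(f"block {i} was executed correctly") for i in range(100)]
combined = aggregate(block_proofs)
print(combined.claim)  # one proof standing in for 100 blocks' worth of computation
```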
Data is different. Rollups use a collection of compression tricks to reduce the amount of data a transaction needs to store on-chain: a simple currency transfer goes from about 100 bytes to about 16 bytes, an ERC20 transfer on an EVM-compatible chain from about 180 bytes to about 23 bytes, and a privacy-preserving ZK-SNARK transaction could be compressed from about 600 bytes to about 80 bytes. In all cases that is roughly 8x compression. However, rollups still need to make data available on-chain in a medium that users are guaranteed to be able to access and verify, so that users can independently compute the state of the rollup and join as validators if the existing ones go offline. Data can be compressed once, but it cannot be compressed again: if it could, there would usually be a way to fold the logic of the second compressor into the first and get the same benefit by compressing once. Hence, "rollups on top of rollups" do not actually provide large gains in scalability, though, as we will see below, the pattern can serve other purposes.
So what is a "sane" version of layer 3?
Well, let's look at what Starkware advocates in their post about layer 3s. Starkware is made up of very smart cryptographers who are sane, so if they advocate for layer 3s, their version will be much more sophisticated than "if rollups compress data 8 times, then obviously rollups on top of rollups will compress data 64 times."
Here is a diagram from the Starkware post:
A few quotes:
The diagram above depicts an example of such an ecosystem. Its L3s include:
A StarkNet with Validium data availability, for example for general use by applications that are extremely price-sensitive.
Application-specific StarkNet systems customized for better application performance, for example by employing designated storage structures or data availability compression.
StarkEx systems (such as those serving dYdX, Sorare, Immutable, and DeversiFi) with Validium or Rollup data availability, immediately bringing proven scalability advantages to StarkNet.
Privacy StarkNet instances (in this example also serving as an L4) to allow privacy-preserving transactions without including them in the public StarkNet.
We can compress the post into three visions of what "L3s" are for:
L2 is for scaling, L3 is for custom functionality, such as privacy. In this vision, there is no attempt to provide "quadratic growth in scalability"; instead, there is a layer in the stack that helps applications scale, and then there are independent layers to meet the customization needs of different use cases.
L2 is for general-purpose scaling, L3 is for customized scaling. Customized scaling can take different forms: specialized applications that use something other than the EVM for computation, rollups whose data compression is optimized for specific applications' data formats (including separating "data" from "proofs" and replacing the proofs with a single SNARK per block), and so on.
L2 is for trustless scaling (rollups), L3 is for weakly trusted scaling (validiums). A validium is a system that uses SNARKs to verify computation but leaves data availability to a trusted third party or committee. In my view, validiums are severely underrated: in particular, many "enterprise blockchain" applications may actually be best served by a centralized server that runs a validium prover and regularly commits hashes to chain. Validiums have a lower security grade than rollups, but they can be vastly cheaper.
In my view, all three visions are fundamentally reasonable. The claim that dedicated data compression requires its own platform is perhaps the weakest one: it is quite easy to design an L2 with a general-purpose base-layer compression scheme that users can automatically extend with application-specific sub-compressors. But aside from that, the use cases all check out. This still leaves a big question, however: is a three-layer structure the right way to achieve these goals? What is the point of anchoring validiums, privacy systems, and customized environments into L2 instead of just anchoring them into L1? It turns out that the answer to this question is quite complicated.
Do deposits and withdrawals become cheaper and easier within the subtrees of an L2?
One possible argument for the three-layer model over the two-layer model is that the three-layer model allows an entire sub-ecosystem to exist within a single rollup, which allows for cross-domain operations within that ecosystem to occur very cheaply without going through expensive L1.
But it turns out that deposits and withdrawals can be made very cheap even between two L2s (or even L3s). The key here is that tokens and other assets do not have to be issued on the root chain. That is to say, you can have an ERC20 token on Arbitrum, create a wrapper of it on Optimism, and move back and forth between the two without any L1 transactions!
Let's look at how such a system works. There are two smart contracts: a base contract on Arbitrum and a wrapped-token contract on Optimism. To move from Arbitrum to Optimism, you send the tokens to the base contract, which generates a receipt. Once Arbitrum finalizes, you can take a Merkle proof of that receipt, rooted in L1 state, and send it to the wrapped-token contract on Optimism, which verifies it and issues you a wrapped token. To move the tokens back, you do the same thing in reverse.
Even though the Merkle path proving the deposit on Arbitrum goes through L1 state, Optimism only needs to read the L1 state root to process the deposit; no L1 transaction is required. Note that because rollup data is the scarcest resource, a practical implementation of this scheme would use a SNARK or a KZG proof, rather than a Merkle proof directly, to save space.
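A minimal Python sketch of this flow, under heavily simplified assumptions: deposits on the Arbitrum side are committed into a Merkle tree whose root is what L1 ends up reflecting once the rollup finalizes, and the Optimism-side wrapper contract only needs that root plus a Merkle branch to mint wrapped tokens. All class and function names here are illustrative, not real contract APIs, and a real system would use a SNARK or KZG proof instead of a raw Merkle branch, as noted above.

```python
import hashlib

def h(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def merkle_root(leaves):
    layer = [h(leaf) for leaf in leaves]
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
    return layer[0]

def merkle_branch(leaves, index):
    layer = [h(leaf) for leaf in leaves]
    branch = []
    while len(layer) > 1:
        if len(layer) % 2:
            layer.append(layer[-1])
        branch.append((layer[index ^ 1], index % 2 == 0))
        layer = [h(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return branch

def verify_branch(leaf, branch, root):
    node = h(leaf)
    for sibling, leaf_is_left in branch:
        node = h(node, sibling) if leaf_is_left else h(sibling, node)
    return node == root

class BaseContract:            # lives on the source rollup ("Arbitrum")
    def __init__(self):
        self.receipts = []
    def deposit(self, sender, amount):
        receipt = f"{sender}:{amount}:{len(self.receipts)}".encode()
        self.receipts.append(receipt)   # tokens are locked, receipt recorded
        return receipt
    def state_root(self):
        return merkle_root(self.receipts)

class WrapperContract:         # lives on the destination rollup ("Optimism")
    def __init__(self, l1_state_root):
        self.l1_state_root = l1_state_root   # read from L1 state, no L1 tx needed
        self.minted = {}
        self.spent = set()
    def claim(self, receipt, branch):
        assert receipt not in self.spent, "receipt already used"
        assert verify_branch(receipt, branch, self.l1_state_root), "bad proof"
        sender, amount, _ = receipt.decode().split(":")
        self.minted[sender] = self.minted.get(sender, 0) + int(amount)
        self.spent.add(receipt)

# Usage: lock on the source rollup, wait for finalization, claim on the destination.
base = BaseContract()
r = base.deposit("alice", 100)
finalized_root = base.state_root()          # what ends up reflected in L1 state
wrapper = WrapperContract(finalized_root)
wrapper.claim(r, merkle_branch(base.receipts, 0))
print(wrapper.minted)                        # {'alice': 100}
```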
This scheme has one key weakness compared to L1-based tokens, at least on optimistic rollups: deposits also need to wait for the fraud-proof window. If the token is rooted on L1, withdrawing from Arbitrum or Optimism to L1 takes a week of delay, but deposits are instant. In this scheme, however, both deposits and withdrawals take a week of delay. That said, it is not clear that a three-layer architecture on optimistic rollups is better either: there is a lot of technical complexity in ensuring that a fraud-proof game happening inside a system that itself runs on a fraud-proof game is safe.
Fortunately, none of these issues are a problem for ZK rollups. ZK rollups do not require a week-long waiting window for security reasons, but they do still require shorter windows (perhaps 12 hours with first-generation technology) for two other reasons. First, more complex general-purpose ZK-EVM rollups in particular need more time to cover the non-parallelizable computation time of proving a block. Second, there is the economic consideration of needing to submit proofs rarely in order to minimize the fixed costs associated with proof transactions. Next-generation ZK-EVM technology, including dedicated hardware, will address the first issue, while better-architected batch verification can address the second. And it is precisely this issue of optimizing and batching proof submission that we turn to next.
Rollups and validiums have a confirmation time vs. fixed cost trade-off. L3s can help with this. But what else can?
The per-transaction cost of a rollup is cheap: it is just 16-60 bytes of data, depending on the application. But rollups must also pay a high fixed cost every time they submit a batch of transactions to the chain: 21,000 L1 gas per batch for optimistic rollups, and more than 400,000 gas for ZK rollups (millions of gas if you want quantum-safe security using only STARKs).
Of course, a rollup could simply choose to wait until there are 10 million gas worth of L2 transactions before submitting a batch, but this would lead to very long batch intervals, forcing users to wait longer for a high-security confirmation. Hence the trade-off: longer batch intervals and optimal costs, or shorter batch intervals and greatly increased costs.
To give us some concrete numbers, let's consider a ZK rollup with a batch cost of 600,000 gas and fully optimized ERC20 transfers (23 bytes) costing 368 gas per transaction. Assume this rollup is in the early-to-mid stages of adoption, with a TPS of 5. We can compute the gas per transaction as a function of the batch interval:
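Spelling out that calculation under the figures just stated (600,000 gas per batch, 368 gas per transfer, 5 TPS); the specific batch intervals are just illustrative choices:

```python
# Amortized gas per transaction for a given batch interval, using the numbers
# above: 600,000 gas fixed cost per batch, 368 gas per transfer, 5 TPS.
def gas_per_tx(batch_fixed_gas, interval_seconds, tps=5, per_tx_gas=368):
    txs_per_batch = tps * interval_seconds
    return per_tx_gas + batch_fixed_gas / txs_per_batch

for label, seconds in [("12 sec", 12), ("1 min", 60), ("10 min", 600), ("1 hour", 3600)]:
    print(f"{label:>7}: {gas_per_tx(600_000, seconds):>7,.0f} gas per transaction")

# Output:
#  12 sec:  10,368 gas per transaction
#   1 min:   2,368 gas per transaction
#  10 min:     568 gas per transaction
#  1 hour:     401 gas per transaction
```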
If we enter a world with many customized validiums and application-specific environments, many of them will have throughput far below 5 TPS. At that point the trade-off between confirmation time and cost starts to matter a great deal. And indeed, the "L3" paradigm does solve this problem! A ZK rollup inside a ZK rollup, even a naive implementation, has fixed costs of only about 8,000 layer-1 gas (500 bytes for the proof). This would change the numbers above to:
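Rerunning the calculation with an 8,000-gas fixed cost (reusing the gas_per_tx sketch above):

```python
# Same calculation, but with the ~8,000 gas fixed cost of submitting the proof
# inside another ZK rollup instead of directly to L1.
for label, seconds in [("12 sec", 12), ("1 min", 60), ("10 min", 600), ("1 hour", 3600)]:
    print(f"{label:>7}: {gas_per_tx(8_000, seconds):>7,.0f} gas per transaction")

# Output:
#  12 sec:     501 gas per transaction
#   1 min:     395 gas per transaction
#  10 min:     371 gas per transaction
#  1 hour:     368 gas per transaction
```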
So the problem is basically solved, and L3s are good? Perhaps. But it is worth noting that there is another approach to solving this problem, inspired by ERC 4337 aggregate verification.
The strategy is as follows. Today, each ZK rollup or validium accepts a proof that proves S_new = STF(S_old, D): the new state root must be the result of correctly processing the transaction data or state deltas on top of the old state root. In this new scheme, the ZK rollup would instead accept a message from a batch verifier contract saying that it has verified a batch of statements, where each statement is of the form S_new = STF(S_old, D). This batch proof can be constructed through a recursive SNARK scheme or Halo aggregation.
This would be an open protocol: any ZK rollup could join, and any batch prover could aggregate proofs from any compatible ZK rollup and be compensated with a transaction fee for the aggregation. The batch verifier contract would verify one proof and then pass a message to each rollup containing its (S_old, S_new, D) triple; the fact that the triple came from the batch verifier contract would serve as evidence that the transition is valid.
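A toy Python model of the scheme just described, with all names and data structures as illustrative assumptions rather than a real protocol: a batch verifier checks one aggregate proof covering claims from many rollups, then fans the verified (S_old, S_new, D) triples out to the individual rollup contracts, which accept them solely because they come from the batch verifier.

```python
# Toy model of aggregate proof verification shared across many rollups.
from dataclasses import dataclass

@dataclass(frozen=True)
class Claim:
    rollup_id: str
    s_old: str     # old state root
    s_new: str     # claimed new state root
    d: str         # transaction data / state delta

def verify_aggregate_proof(claims, proof) -> bool:
    # Stand-in for verifying one recursive SNARK / Halo aggregate proof that
    # every claim satisfies S_new = STF(S_old, D).
    return proof == "valid-aggregate-proof"

class RollupContract:
    def __init__(self, rollup_id, genesis_root, batch_verifier):
        self.rollup_id = rollup_id
        self.state_root = genesis_root
        self.batch_verifier = batch_verifier
    def receive(self, claim, sender):
        # The rollup's only checks: the message comes from the batch verifier
        # contract and extends the current state root.
        assert sender is self.batch_verifier, "untrusted sender"
        assert claim.s_old == self.state_root, "stale claim"
        self.state_root = claim.s_new

class BatchVerifier:
    def submit_batch(self, claims, proof, rollups):
        assert verify_aggregate_proof(claims, proof), "bad proof"
        for claim in claims:                       # fan the verified triples out
            rollups[claim.rollup_id].receive(claim, sender=self)

# Usage: two unrelated rollups share a single proof verification.
bv = BatchVerifier()
rollups = {
    "rollupA": RollupContract("rollupA", "A0", bv),
    "rollupB": RollupContract("rollupB", "B0", bv),
}
claims = [Claim("rollupA", "A0", "A1", "dataA"), Claim("rollupB", "B0", "B1", "dataB")]
bv.submit_batch(claims, "valid-aggregate-proof", rollups)
print(rollups["rollupA"].state_root, rollups["rollupB"].state_root)  # A1 B1
```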
If optimized properly, the per-rollup cost of each aggregation in this scheme could be close to 8,000 gas: 5,000 for the storage write updating the new state root, 1,280 for the old and new roots, and an extra 1,720 for miscellaneous data processing. Hence it would give us the same savings. Starkware actually already has something like this, called SHARP, although it is not (yet) a permissionless open protocol.
One possible response to this approach might be: but isn't this just another layer 3 scheme? We would have base layer <- batch mechanism <- rollup or validium instead of base layer <- rollup <- validium. From a certain philosophical perspective on architecture, this may well be true. But there is an important distinction: the middle layer is not a complex full EVM system but a simplified and highly specialized object, so it is more likely to be secure, more likely to be built without yet another dedicated token, and more likely to be governed minimally and not change over time.
Conclusion: What exactly is a "Layer"?
A three-layer scaling architecture that consists of stacking the same scaling scheme on top of itself generally does not work well. Rollups on top of rollups, where the two layers of rollups use the same technology, certainly do not. However, a three-layer architecture in which L2 and L3 serve different purposes can work. Validiums on top of rollups do make sense, even if they are not certain to be the best long-term way of doing things.
However, once we start getting into the details of which architectures make sense, we run into philosophical questions: what is a "layer," and what is not? Base layer <- batch mechanism <- rollup or validium and base layer <- rollup <- rollup or validium do the same work, but in terms of how it works, the proof aggregation layer looks more like ERC-4337 than like a rollup. Typically, we do not call ERC-4337 a "layer 2." Similarly, we do not call Tornado Cash a "layer 2," so if we are to be consistent, we would not call a privacy-focused subsystem sitting on top of an L2 an "L3." Hence there is an unresolved semantic debate over what deserves to be called a "layer" in the first place.
There are many possible schools of thought on this matter. My personal preference is to limit the term "Layer 2" to things that have the following properties:
Their purpose is to improve scalability.
They follow the "blockchain within a blockchain" model: they have their own transaction processing mechanism and their own internal state.
They inherit the full security of the Ethereum chain.
Thus, optimistic rollups and ZK rollups are L2s, but validiums, proof aggregation schemes, ERC 4337, on-chain privacy systems, and Solidity are something else. It may make sense to call some of these L3s, but probably not all of them; in any case, it seems premature to settle on definitions now, when the architecture of the multi-rollup ecosystem is far from settled and most of the discussion is happening only in theory.
That said, the linguistic debate is less important than the technical question of which structures actually make sense. Clearly, some kind of "layer" serving non-scaling needs like privacy has an important role to play, and the important function of proof aggregation does need to be filled somehow, preferably by an open protocol. At the same time, there are good technical reasons to keep the intermediate layer linking user-facing environments to L1 as simple as possible; in many cases, a full EVM rollup may not be the right "glue layer." I suspect that as the L2 scaling ecosystem matures, both the more complex and the simpler structures described in this article will start to play a larger role.