From Ethereum to Aptos, who can find the ultimate answer to the "trilemma"?

2022-08-09 17:50:38

We believe that optimization for specific applications is the future of L1. Considering the trilemma, no chain can achieve a universal design that fits all application scenarios; at this point, balancing trade-offs becomes crucial.

Original Title: “Ethereum -> Solana -> Aptos: the high-performance competition is on”

Compiled by: SevenUp DAO

Key Conclusions:

We propose a first-principles framework for L1 design trade-offs: the high-performance trilemma. (As shown in the image above)
Compared to Ethereum, Solana's radical low-redundancy design explains both its high performance and its low reliability.
Aptos, a new L1 with $200 million in all-star seed funding, is ready to challenge Solana's monopoly in the high-performance L1 space. Compared to Solana, Aptos adds more reliability at the cost of higher node hardware requirements.
We believe that optimizing for specific applications is the future of L1. Given the trilemma, no chain can achieve a universal design that fits all application scenarios. Building on our previous cross-chain article, we propose a "three questions" Q&A manual for blockchain application developers to consider their technical choices.

Projects mentioned in the article include:

Solana, Aptos, Ethereum, StarkWare, zkSync, Serum, Metaplex

Part One: The Secret of Solana's High Performance

This part includes:

To date, Solana has held a near-monopoly position among high-performance blockchains.
Solana's design DNA is radically optimized for ideal network performance: parallel computation, reduced redundancy, and higher block production rates.

What makes Solana stand out?

As the only blockchain close to Visa's 65,000 TPS capacity, Solana has garnered support from Wall Street and Silicon Valley to attempt large-scale blockchain services.

Solana did not achieve its TPS through some Turing Award magic (unlike zero-knowledge proofs, which are another important topic we will discuss). Instead, Solana made a series of design trade-offs between performance and reliability. We discuss Solana's performance in Part One and the reliability costs in Part Two.

Design Choice 1: Parallel Computing.

The Ethereum Virtual Machine (EVM) is single-threaded: the EVM can use only one CPU core and processes transactions sequentially. Because the heat a single core generates rises steeply with clock speed, physics puts a fairly low ceiling on single-core performance.

What's the solution? More cores! Eight 2GHz cores run much cooler than one 8GHz core, yet deliver more total compute. In 2007, Intel introduced dual-core Pentium processors, marking the end of the single-core era. Today's consumer computers have GPUs and CPUs with 4 to 4096 cores. Making more cores work together better, rather than building a more powerful single core, has been the focus of semiconductor research for over a decade.

To achieve native multithreading, Solana had to abandon EVM compatibility. Solana's smart contracts can leverage the 4096 cores of Nvidia GPUs to run computations in parallel.

Our perspective: In this [EVM vs. Multi-thread] binary choice, we lean towards multithreading rather than EVM compatibility. We believe it is absurd for DApps in 2027 to be limited to semiconductor technology from 2007.

Some may point to the moat created by the existing base of EVM/Solidity developers. However, developers can easily switch programming languages, and most languages used by today's Web 2 applications and developers support multithreading natively. We believe future developers will be as frustrated by the EVM's single-threaded architecture as they are today by high gas fees. (We are also not fans of EVM-compatible rollup solutions.)

Design Choice 2: Reducing Redundancy through Deterministic Leader Rotation

Decentralization requires redundancy. In centralized cloud services like Google, computation occurs only once—because users trust that Google is correct.

In blockchain, since we cannot trust anyone, all data must be computed and verified by different nodes. The number of extra times a computation is performed is known as overhead, or redundancy. To quantify redundancy, we use [Big-O notation](https://en.wikipedia.org/wiki/Big_O_notation) (asymptotic notation), such as O(n^2), O(n), or O(log n), where the function indicates how the network's computational cost grows as it scales to more nodes. For example, as the network grows, O(n^3) may mean several orders of magnitude more redundancy than O(n^2).

In Bitcoin, Ethereum, and many other simple PoS chains, the redundancy of consensus is at least O(n^2), proportional to the square of the number of nodes: each block must be transmitted, checked, and compared with the work of every other node.

For Solana, only the designated leader node produces the next block (see Gulf Stream and Leader Rotation). Building on this, Solana splits blocks into many small pieces, and only a small subset of validator nodes verifies each piece (see Turbine), rather than all nodes sending and verifying all blocks.
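The logarithmic scaling of this kind of split-and-relay propagation can be illustrated with a toy tree broadcast. The sketch below is not Turbine's actual algorithm; the function name `hops_to_reach` and the `fanout` parameter are assumptions made purely for illustration. It counts how many relay layers are needed when the leader and every subsequent node forward data to a fixed number of peers.

```python
def hops_to_reach(num_nodes: int, fanout: int) -> int:
    """Relay layers needed for one sender to reach num_nodes peers
    when every node forwards to `fanout` peers (hypothetical toy model)."""
    if num_nodes < 1 or fanout < 2:
        raise ValueError("need num_nodes >= 1 and fanout >= 2")
    reached, layer_size, hops = 0, 1, 0
    while reached < num_nodes:
        layer_size *= fanout   # each node in the previous layer forwards to `fanout` peers
        reached += layer_size  # peers covered so far, excluding the sender
        hops += 1
    return hops


# Every tenfold increase in network size adds only one more relay layer.
for n in (100, 1_000, 10_000, 100_000):
    print(n, hops_to_reach(n, fanout=10))
```

In this toy model the number of layers grows with the logarithm of the node count, which is the intuition behind the O(log n) best case described here.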

Solana's protocol reduces the best-case redundancy from O(n^2) to O(log n), which is among the most efficient scaling classes in computational complexity theory. This result is indeed remarkable. Consider an (overly simplified) illustration.

Suppose networks A and B are identical in every other respect, and each achieves 100k TPS with 100 nodes. Network A's O(n^2) overhead grows 100 times for every tenfold increase in nodes, whereas network B's O(log n) overhead grows only by a fixed increment for every tenfold increase. Scaling from 100 to 100,000 nodes multiplies A's overhead by 1,000,000 but B's by only 2.5, so the overhead of the two networks would differ by a factor of roughly 400,000.
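Reading the two complexity classes literally (overhead proportional to n^2 versus log n, with all constant factors ignored, which is a strong assumption), the growth from 100 to 100,000 nodes can be checked directly. The helper name `overhead_growth` is illustrative only.

```python
import math


def overhead_growth(cost, n_from: int, n_to: int) -> float:
    """Factor by which a cost function grows when the network scales from n_from to n_to nodes."""
    return cost(n_to) / cost(n_from)


def quadratic(n: int) -> float:
    return float(n ** 2)


def logarithmic(n: int) -> float:
    return math.log(n)


quad_growth = overhead_growth(quadratic, 100, 100_000)   # grows by a factor of 1,000,000
log_growth = overhead_growth(logarithmic, 100, 100_000)  # grows by a factor of about 2.5
gap = quad_growth / log_growth                           # about 400,000

print(f"O(n^2) growth: {quad_growth:,.0f}x")
print(f"O(log n) growth: {log_growth:.2f}x")
print(f"gap: {gap:,.0f}x")
```

The exact numbers matter less than the shape: the quadratic network pays for every added node many times over, while the logarithmic one barely notices.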

This reduction in complexity also has ideological significance. In this regard, we believe Vitalik's criticism of Solana is somewhat misleading—Vitalik believes Solana is not decentralized enough due to its high hardware requirements. The $4000 hardware cost of Solana prevents "every user from running a Solana node on their own machine." This cost is indeed accurate. However, in the long run, computing costs will decrease, and Solana's design, which reduces complexity, makes it possible to have 100 times more nodes without making the network unbearably slow.

Other Design Choices:

Supporters and critics have also debated some of Solana's other technical features. We believe these features are less core, so we discuss them in summary:

3.1 Voting Transactions Counted as TPS

Some critics point out that Solana artificially inflates TPS by counting validator votes as transactions. Votes are indeed counted as transactions, but this is merely a surface issue. Perhaps Solana should clarify that its TPS is 60,000 (excluding voting transactions), rather than 65,000.

3.2 Throughput—Faster Block Times and Larger Blocks

Vitalik and StarkWare have criticized Solana's performance improvements as somewhat lazy, as Solana simply makes each block larger and block times shorter at the cost of higher hardware requirements to accommodate more transactions. Simple math tells you this is not the whole story.

Solana's maximum block size is 10MB, 10 times Ethereum's target size of 1MB.
Solana's block time is 0.4 seconds, 30 times shorter than Ethereum's 12 seconds.
Combined, these two factors give Solana roughly a 300-fold "lazy" performance improvement over Ethereum.
In reality, however, Solana's TPS is typically about 3000 times Ethereum's usual TPS. The remaining factor of 10, about 90% of the total gap, is better explained by the parallel computation and reduced redundancy designs discussed above.
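The arithmetic behind these figures can be reproduced in a few lines. The block sizes, block times, and the 3000x observed ratio are taken from the text above; the constant and function names are illustrative.

```python
# Figures quoted in the article (not independently measured).
ETH_BLOCK_MB, ETH_BLOCK_TIME_S = 1.0, 12.0
SOL_BLOCK_MB, SOL_BLOCK_TIME_S = 10.0, 0.4
OBSERVED_TPS_RATIO = 3000.0  # Solana TPS / Ethereum TPS, as stated in the text


def mb_per_minute(block_mb: float, block_time_s: float) -> float:
    """Raw block data produced per minute."""
    return block_mb * 60.0 / block_time_s


eth_rate = mb_per_minute(ETH_BLOCK_MB, ETH_BLOCK_TIME_S)  # about 5 MB/min
sol_rate = mb_per_minute(SOL_BLOCK_MB, SOL_BLOCK_TIME_S)  # about 1500 MB/min
lazy_gain = sol_rate / eth_rate                           # about 300x from bigger, faster blocks
unexplained = OBSERVED_TPS_RATIO / lazy_gain              # about 10x left for other design choices

print(f"Ethereum: {eth_rate:.0f} MB/min, Solana: {sol_rate:.0f} MB/min")
print(f"block-rate gain: {lazy_gain:.0f}x, remaining factor: {unexplained:.0f}x")
```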

3.3 Proof of History (POH)

Solana promotes POH as its greatest innovation. In the long run, proof of history allows Solana to reduce block times to an extreme 400ms/block, even though physical network latency often exceeds 400ms. The fancy name for this feature is asynchronous consensus, with more details in Multicoin's article.

Summary of Design Choices: The Secret of Solana's High Performance

Three key metrics jointly determine the maximum throughput of a blockchain: block production rate, parallel computation, and redundancy.

Redundancy determines how much data and computation is needed in total, meaning total computation = effective computation + redundancy;
Parallel computation allows nodes to compute faster;
Block production rate determines the amount of data that can be stored in the blockchain database over a certain period.

Solana has made bold design choices in all three areas: reducing redundancy from O(n^2) to O(log n), scaling parallel computation from 1 core to 4096 cores, and raising the block production rate from 5MB/min to 1500MB/min. These are the main secrets behind Solana's 65,000 TPS. In the next part, we discuss what these choices cost Solana.

Part Two: The Costs of Solana's Choices: Prioritizing Performance over Resilience

This part includes:

Solana's radical performance optimization DNA makes it more prone to failures than other blockchains.
We propose the redundancy dilemma: given limited computational capacity, L1 must make trade-offs between performance and reliability.
The redundancy dilemma is a subset of the high-performance trilemma discussed in Part 3.

Frequent Network Incidents

In the past year, Solana has experienced at least four major network incidents: the September 2021 outage, the December 2021 degradation incident, the January 2022 degradation incident, and the April 2022 outage. Any interested stakeholder must have many questions:

What caused the incidents?

What is the essential cause? A one-time system bug? An unexpected attack? Or some issue inherent in the blockchain design DNA that we can only mitigate?

Choosing Optimal Performance over Reliability

In Part One, we discussed how Solana actively optimizes its performance in ideal conditions. "Ideal conditions" is a key term here. When things do not go perfectly according to plan, Solana can go out of control.

Design Cost 1: Radical Parallel Computing Degrades When Transactions Are Logically Ordered.

NFT minting and IEO transactions often disrupt the Solana network. The reason is that these transactions cannot be processed simultaneously on 4096 cores. If NFTs were minted in parallel, it would be unclear which ones had already been minted, leading to duplicates and bugs, so all mint transactions in the same collection must be processed sequentially. A direct implication is that Solana's 65,000 TPS does not mean users can mint 6 BAYC collections in one second: because such transactions rely on a single GPU core, Solana's sequential processing capability may be close to or even lower than Ethereum's, around 10 to 100 TPS.

This explains the reason for performance degradation: the uncontrolled volume of transactions during NFT minting can render Metaplex unusable, while other applications (like Serum order book) that do not rely on Metaplex can still process transactions on one of the other 4095 cores.

However, more often, performance degradation turns into network outages: pending transactions waiting for Metaplex can cause node memory overflow—when memory overflows, nodes crash and go completely offline.

Core trade-off: By using a GPU with 4096 cores instead of a 16-core CPU, Solana sacrifices single-core performance to support radical parallel computation. Generally, when transactions are unrelated, the network runs well, but once transactions exhibit undesirable patterns, Solana is more prone to crashing than Ethereum, which has higher redundancy.
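The effect of logically ordered transactions on a parallel machine can be sketched with a toy scheduling model. This is not Solana's runtime scheduler: the function `time_steps`, the idea of a single "write key" per transaction, and the one-time-unit cost per transaction are all simplifying assumptions for illustration.

```python
import math
from collections import Counter
from typing import Sequence


def time_steps(write_keys: Sequence[str], cores: int) -> int:
    """Minimum time units to run unit-cost transactions in this toy model.

    Transactions writing the same key must run one after another;
    transactions writing different keys may run in parallel on separate cores.
    """
    if cores < 1:
        raise ValueError("cores must be >= 1")
    if not write_keys:
        return 0
    longest_chain = max(Counter(write_keys).values())     # the serialized hot spot
    capacity_bound = math.ceil(len(write_keys) / cores)   # limit imposed by core count
    return max(longest_chain, capacity_bound)


CORES = 4096
independent = [f"account-{i}" for i in range(CORES)]  # unrelated transfers
same_mint = ["nft-collection"] * CORES                # mints against one collection

print(time_steps(independent, CORES))  # all transactions fit in a single step
print(time_steps(same_mint, CORES))    # one step per transaction: extra cores do not help
```

With unrelated transactions, adding cores shortens the run; with a single hot collection, the run length equals the number of transactions no matter how many cores are available.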

Design Cost 2: Deterministic Leader Selection Becomes Ugly When the Leader Crashes

When Solana is close to crashing, the current block leader node is often the first to fail. Solana's low redundancy design heavily relies on whether the leader node is online—other nodes do not have the same transaction data or network roles as the current leader node. This means that once the leader node goes offline, the rest of the network has to do a lot of emergency work: agreeing to skip a block, reorganizing transaction data, and forwarding lost transaction data to the next leader node…

Consider the Ethereum network, which has no leader node: each node holds a duplicate copy of the transaction data that will be included in a block (the mempool). If any Ethereum node goes offline, all other nodes still have everything they need to produce a new block. This is the double-edged sword of redundancy: in ideal conditions, redundancy makes the network slow, but in bad conditions, it can prevent major incidents.

Let's illustrate with numbers. According to this paper, in the case of a leader node crash (formally known as "cascading leader failure"), Solana's emergency computational overhead can reach O(n^4). An O(n^2) network is slow but usable, while a network that suddenly requires O(n^4) computation is essentially dead. This is why Solana finds it difficult to recover once it enters the O(n^4) cascading leader failure mode.

This is a feature, not a bug.

Solana's DNA prioritizes radical optimal performance. This principle is pervasive in the architecture, making it difficult to change one aspect without affecting everything else. (We did not discuss this issue, but to illustrate interdependencies, if run on CPU instead of GPU, the core PoH algorithm would be impractical, while Solana's PoH—a data management system optimized for performance in ideal conditions—makes it difficult to implement a mempool similar to ETH). Again, this is a trade-off; you cannot have it both ways—fundamentally making Solana more stable requires creating more redundancy, thus sacrificing optimal performance in ideal conditions.

Even Solana's supporters need to be mentally prepared for network outages and performance degradation to occur many more times, as today's Solana network has yet to try all possible mitigation measures. Mitigation is an iterative game of hide and seek. One day, the hard work of Solana Labs may make 99.99% network uptime possible. However, it never means achieving 100% network uptime; today's mainnet beta is still far from 99.99%.
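To make the uptime figure concrete, the short calculation below converts an uptime percentage into allowed downtime per year. It assumes a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(uptime_fraction: float) -> float:
    """Minutes per year a network may be down at the given uptime fraction."""
    if not 0.0 <= uptime_fraction <= 1.0:
        raise ValueError("uptime_fraction must be between 0 and 1")
    return (1.0 - uptime_fraction) * MINUTES_PER_YEAR


for uptime in (0.99, 0.999, 0.9999):
    print(f"{uptime:.2%} uptime -> {downtime_minutes_per_year(uptime):,.1f} minutes of downtime per year")
```

At 99.99% uptime the yearly downtime budget is under an hour, which a single multi-hour outage exceeds on its own.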

Part Three: Aptos Joins the Competition and the High-Performance Trilemma

This part includes:

Aptos's design choices are a compromise between reliability and performance, positioned between Solana and Ethereum.
We propose a high-performance trilemma between high performance, reliability, and efficiency.
For developers, the future trend is to optimize based on specific use cases. We propose a Q&A manual with three questions to help developers choose their infrastructure.

For over a year, Solana has remained the only name in the high-performance L1 niche. Now we have Aptos, developed by the former Libra team at Facebook and backed by a16z, Tiger, Multicoin, and FTX. Multicoin and FTX are also significant investors in Solana. Aptos recently made headlines by claiming 160,000 TPS, clearly positioning itself as a competitor to Solana.

This is also why we spend so much time dissecting Solana: it provides the best perspective to understand Aptos in practice:

Looking back at Part Two, Ethereum optimized for network uptime: Ethereum spent a lot of data redundancy preparing for the worst-case scenario, making it nearly impossible to disrupt the Ethereum network with an attack. In contrast, Solana optimized for performance in ideal conditions, spending less on redundancy, which reduces the network's reliability in extreme situations.

In addressing the redundancy dilemma, Aptos attempts to take a step back from Solana. Here are some of its key design choices:

Aptos Design Choice 1: 16-Core Server-Level CPU

This is a middle ground between Solana's 4096 GPU cores and Ethereum's 1 CPU core. In handling highly parallel tasks, Aptos may not be as fast as Solana. However, each CPU core in Aptos performs significantly better than Solana's GPU cores, so in cases of logically sequential transactions like NFT minting, Aptos may handle them better than Solana.

Aptos Design Choice 2: Ideal Case Redundancy of O(n), Worst Case Redundancy of O(n^2)

Relative to Solana, Aptos attempts to make its network more resilient by increasing redundancy. Aptos does not aim for Solana's extreme logarithmic O(log n) redundancy but instead sets redundancy at a linear O(n). In each round of consensus, Aptos requires all non-leader nodes to synchronize additional data so that other nodes can take over if the current leader node fails. Aptos also does not attempt to split blocks for verification, as splitting would create additional workload when errors occur. As a result, when the leader node does fail, Aptos's emergency handling is not as chaotic as Solana's.

In comparison, Aptos's best-case performance is not as good as Solana's, but its worst-case performance is more acceptable: O(n^2), versus Solana's O(n^4). If we put these five performance metrics together, they form a nice sandwich, placing Aptos (purple) between Ethereum (blue) and Solana (green).
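The five redundancy figures can be lined up numerically. The sketch below takes the complexity classes quoted in this article at face value and ignores all constant factors; treating Ethereum as a single O(n^2) figure and picking n = 1000 are assumptions made only for illustration.

```python
import math

# Redundancy classes as quoted in the article; constants are ignored.
# Using one O(n^2) figure for Ethereum is a simplifying assumption.
REDUNDANCY = {
    "Solana, best case": lambda n: math.log2(n),
    "Aptos, best case": lambda n: float(n),
    "Ethereum": lambda n: float(n ** 2),
    "Aptos, worst case": lambda n: float(n ** 2),
    "Solana, worst case": lambda n: float(n ** 4),
}


def ranked(n: int) -> list:
    """Return (label, relative cost) pairs sorted from cheapest to most expensive."""
    costs = [(label, cost(n)) for label, cost in REDUNDANCY.items()]
    return sorted(costs, key=lambda pair: pair[1])


for label, cost in ranked(1000):
    print(f"{label:<20} {cost:>22,.0f}")
```

Under these assumptions, both of Aptos's figures sit between Solana's best case and Solana's worst case, with Ethereum's figure matching Aptos's worst case.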

Aptos Design Choice 3: Extreme Hardware Requirements

You may have seen Aptos claim 160,000 TPS and wonder why we say its optimal performance is not as good as Solana's.

Note Aptos's hardware requirements: all their tests are run on AWS EC2 instances with 16-core server-level CPUs. Aptos also publicly recommends running their nodes on Google Cloud Platform rather than personal computers.

The 160,000 figure is the result of lab tests conducted on about 100 permissioned nodes—TPS will certainly be lower in more complex real-world production environments with more nodes. Aptos's internal testing also indicates that as the network scales to more nodes, its performance will approach or even fall below Solana's current 65,000 TPS.

Here’s a quick summary of the key technical specifications of Aptos, Solana, and Ethereum for reference:

Putting Everything Together: The High-Performance Trilemma

Expanding the issue to the redundancy dilemma while considering Aptos's extreme hardware requirements, we propose a version of Vitalik's blockchain scalability trilemma: the high-performance trilemma.

In this trilemma, the three characteristics that cannot be satisfied simultaneously according to first principles are as follows:

Reliability: Ensuring network uptime by spending more computation on redundancy.
Performance: Enhancing network throughput by spending less computation on redundancy.
Efficiency: Keeping the computational resources required of each node low. The only way to improve both reliability and performance at once is to acquire more computational resources, which gives up efficiency.

Among Ethereum, Solana, and Aptos:

Ethereum chose network uptime and efficiency, so it spent a certain amount of computation on redundancy, resulting in slower performance.
Solana chose performance and (relative) efficiency, so it spent its limited computation on optimal performance in ideal conditions, leading to negative impacts on reliability due to lower redundancy.
Aptos chose network uptime and high performance, so to have enough computation to cover both aspects, Aptos had to choose server-based nodes, sacrificing efficiency.

Aptos's design philosophy is quite Web 2: it emphasizes user-friendliness rather than decentralization. Early descriptions indicate that Aptos may integrate an advanced user account system with password recovery features. From any angle, Aptos is certainly not the most decentralized blockchain, and it does not aim for ideological purity. The a16z and Tiger investors in the $200 million seed round are putting real capital and resources behind this somewhat contrarian vision.

What All This Means for Investors and Developers: Optimize for Use Cases.

No Maxis.

Optimize based on your use case.

Even AWS (Amazon Web Services) offers dozens of database configurations for different use cases because there is no one-size-fits-all solution. Blockchain is a database.

Being a maxi might help you profit in a rapidly growing speculative market by taking on short-term risk, but tribalism is detrimental to true value discovery and building. A good investor or builder should take a realistic attitude toward trade-offs across the board and truly understand the use case at hand, rather than getting lost in marketing, hype, and PR speak.

Now we only have a broad outline of what the future will look like. Both Solana and Aptos will experience more errors, outages, fine-tuning, and patches. Solana will crash again, and so will Aptos. But this does not change their status as top competitors in solving the profitable high-performance L1 problem.

For Developers: At least three things to know:

Your use case: What is essential, and what is just a nice-to-have.
What are the pros and cons of the infrastructure you want to use, and what is its DNA?
The costs and benefits of mixing and matching. Cross-chain solutions and risks, as discussed in previous articles by The Anti Ape. Great DApps leverage blockchain, while poor DApps are consumed by the blockchain they use.

For investors: Aptos will launch a public testnet and token in 2022. This means Solana's monopoly in the high-performance blockchain space will soon come to an end. We expect Solana's token price to experience some selling pressure as investors have more options in the high-performance blockchain vertical. But it is still too early to declare a winner.

In any case, Aptos appears to be a strong challenger to Solana, as it attempts to address Solana's long-standing reliability problems by accepting other trade-offs. But we still need to see whether the Aptos team can execute well and whether it can overcome Solana's two-year ecosystem lead.