Will Celestia become the flagship of the DA module?

2022-02-18 20:47:16

We revisit an old and often-discussed question: what is the current bottleneck for public chain scalability? Regardless of your viewpoint, the issue of data availability must be taken into account.

Source: CryptoYC Tech

What is Celestia

Celestia, formerly known as LazyLedger, is infrastructure specialized in "data availability." It is itself a chain, but it does not perform state computation. This raises a few preliminary questions: why does data availability matter, and how does it relate to scalability?

To answer this, we need to look at how traditional blockchains handle data. Taking Ethereum as an example, the vast majority of nodes are currently light nodes, which validate blocks but do not produce them. Because they validate only block headers, a block producer can publish a correct and valid block header while omitting or concealing the transaction data behind it. This is the data availability problem: light nodes can easily be deceived into accepting invalid blocks.

At the same time, full nodes cannot generate data availability proofs, or fraud proofs for invalid blocks, that light nodes could check. Light nodes therefore either have to verify the block data themselves or assume that the vast majority of block producers are honest and trustworthy.

It is evident that for security, the vast majority of nodes must download all transaction data and verify data availability, which in turn leads to scalability issues.
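
The gap between header-only validation and full validation can be sketched in a few lines. This is only an illustration under assumptions: the function names, the SHA-256 binary Merkle tree, and the 32-byte root check are all hypothetical and are not taken from Ethereum or Celestia.

```python
import hashlib

def merkle_root(leaves):
    """Binary Merkle root over byte strings (last node duplicated on odd levels)."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def light_node_accepts(header_root):
    # Hypothetical header-only check: the node sees a well-formed root
    # but never looks at the transaction data behind it.
    return isinstance(header_root, bytes) and len(header_root) == 32

def full_node_accepts(header_root, transactions):
    # A full node downloads every transaction and recomputes the root.
    # `None` models a producer that withheld the block body.
    return transactions is not None and merkle_root(transactions) == header_root

txs = [b"tx1", b"tx2", b"tx3"]
root = merkle_root(txs)
header_only_ok = light_node_accepts(root)        # passes even if the body is withheld
withheld_ok = full_node_accepts(root, None)      # fails: the data is not available
```

The light node's check succeeds regardless of whether the body was ever published, which is exactly why, without some extra mechanism, safety requires most nodes to download everything.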

At this point, we can identify two bottlenecks of data availability:

Availability Proof: Showing other nodes that the data of a block has actually been published and is available.
Fraud Proof: Showing other nodes that a block is invalid.

These are the two problems that Celestia aims to solve.

How It Works

So, how does Celestia solve these two problems? Simply put, it abandons on-chain execution and on-chain state transitions and guarantees only data availability, using 2D Reed-Solomon erasure coding and a specialized namespace Merkle tree structure. Execution is left to the end users themselves. Let's take a brief look at these two components.

2D R-S Erasure Coding

In short, the idea is to transmit not just the information itself but also some fault-tolerant redundancy alongside it. For example:

If I want to convey the information 1 2 3, but I know that during the transmission process, some information may be lost or erroneous, I would choose to transmit one two three to maximize the probability that the recipient can correctly confirm the information I transmitted.
If the recipient receives the information as on, to, thee, as long as they know the content format, which in this case is numbers, they can likely confirm that the original data was 1 2 3.

Of course, this involves a threshold. Suppose the information is encoded into n segments and any k of them are enough to restore it: the complete information can then be recovered as long as the fraction of segments that arrive intact is at least k/n. In the standard Reed-Solomon setup that doubles the data (n = 2k), this threshold is 50%.
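
To make the k-of-n threshold concrete, here is a minimal sketch of Reed-Solomon style erasure coding using Lagrange interpolation over a small prime field. The field size 257, the function names, and the choice of evaluation points 0..n-1 are assumptions made for illustration; production implementations use different fields and much faster algorithms.

```python
P = 257  # small prime field, chosen only so byte values 0..255 fit (assumption)

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def encode(data, n):
    """Treat the k data symbols as evaluations at x = 0..k-1 and extend to x = 0..n-1."""
    points = list(enumerate(data))
    return [lagrange_eval(points, x) for x in range(n)]

def recover(shares, k):
    """Rebuild the k original symbols from any k surviving shares {position: value}."""
    if len(shares) < k:
        raise ValueError("fewer than k shares survived; data cannot be restored")
    points = list(shares.items())[:k]
    return [lagrange_eval(points, x) for x in range(k)]

data = [1, 2, 3]                       # k = 3 original symbols
shares = encode(data, 6)               # n = 6 shares, so the threshold is k/n = 50%
surviving = {1: shares[1], 4: shares[4], 5: shares[5]}  # half the shares were lost
restored = recover(surviving, 3)       # equals the original data
```

Any three of the six shares are enough here, while two are not, which is the 50% threshold described above.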

Celestia builds on this with a new sampling method: 2D sampling. The information is divided into equally sized shards arranged in an n*n square, so there are N = n*n shards in total. When verifying a fraud proof about the encoding, a light node only needs to download the data of a single row or a single column, so the amount of data to download drops from N to √N = n shards.

Of course, since they do not download the complete data, light nodes also need to download the Merkle tree roots of each row and each column as part of the block header, so that the pieces they do download can be verified. The entire Celestia network thus behaves much like a P2P torrent network: with very little data, a node can confirm data availability with high probability.
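
The role of the per-row and per-column roots can be sketched as follows. This is a simplified illustration under assumptions: the shares are placeholder byte strings rather than real erasure-coded data, and the helper names and SHA-256 tree are hypothetical, not Celestia's actual header format.

```python
import hashlib

def merkle_root(leaves):
    """Binary Merkle root over byte strings (last node duplicated on odd levels)."""
    level = [hashlib.sha256(b"\x00" + leaf).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256(b"\x01" + level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

def axis_roots(square):
    """Return one Merkle root per row and one per column of an n x n share square."""
    row_roots = [merkle_root(row) for row in square]
    col_roots = [merkle_root(list(col)) for col in zip(*square)]
    return row_roots, col_roots

def check_row(index, row_shares, row_roots):
    """A light node checks one downloaded row (n of the n*n shares) against the header."""
    return merkle_root(row_shares) == row_roots[index]

n = 4
square = [[f"share-{r}-{c}".encode() for c in range(n)] for r in range(n)]
row_roots, col_roots = axis_roots(square)   # 2n roots carried in the header

downloaded = square[2]                      # only n = sqrt(N) shares are fetched
row_ok = check_row(2, downloaded, row_roots)
tampered = [b"forged"] + downloaded[1:]
tampered_ok = check_row(2, tampered, row_roots)
```

The header grows by 2n roots, but in exchange a node can validate a single row or column on its own without fetching the other N - n shares.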

Namespace Merkle Tree

We all know that Ethereum's state updates are globally synchronized; each state transition updates the status of all addresses. For example, if I update the data of www.cryptoyc.com on Ethereum, the nodes not only update and verify the data of cryptoyc but also update and verify unrelated data (such as the status of www.123.com), which is clearly unreasonable.

Therefore, Celestia uses a namespace Merkle tree for data storage, allowing end nodes to only download data relevant to their applications instead of downloading all block data like Ethereum. Through this structure, corresponding functional nodes can return only the data states needed by the end applications to the end nodes.

The detailed structure and composition rules of this Merkle tree can be left for a later discussion. In summary, the structure uses the namespace hash (nsHash, a wrapper hash with the namespace identifier as a prefix) as its basic data, with the root node containing all relevant naming data of its child nodes (the term "relevant" is very important). The stored data is in JSON format. The nsHash consists of three elements: minNs, maxNs, and hash(x).

minNs: The minimum namespace identifier among all nodes below this node.
maxNs: The maximum namespace identifier among all nodes below this node.
hash(x): The hash value of the child nodes, similar to that of a regular Merkle tree.
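
The three elements above can be sketched as a small data structure. This is a hedged illustration only: the field names min_ns and max_ns mirror minNs and maxNs, while the 8-byte namespace width, the SHA-256 hashing, the byte layout, and the helper names are assumptions rather than Celestia's actual encoding.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class NsHash:
    min_ns: int     # smallest namespace id in this subtree
    max_ns: int     # largest namespace id in this subtree
    digest: bytes   # hash(x) over the node's contents

def leaf(ns, data):
    digest = hashlib.sha256(b"\x00" + ns.to_bytes(8, "big") + data).digest()
    return NsHash(ns, ns, digest)

def parent(left, right):
    payload = b"\x01" + b"".join(
        c.min_ns.to_bytes(8, "big") + c.max_ns.to_bytes(8, "big") + c.digest
        for c in (left, right))
    return NsHash(min(left.min_ns, right.min_ns),
                  max(left.max_ns, right.max_ns),
                  hashlib.sha256(payload).digest())

def build_root(messages):
    """messages: non-empty list of (namespace_id, data), sorted by namespace_id."""
    level = [leaf(ns, data) for ns, data in messages]
    while len(level) > 1:
        nxt = [parent(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd node is carried up unchanged
        level = nxt
    return level[0]

def may_contain(node, ns):
    """A subtree can be skipped entirely when ns lies outside [min_ns, max_ns]."""
    return node.min_ns <= ns <= node.max_ns

messages = [(1, b"a"), (1, b"b"), (2, b"c"), (5, b"d")]
root = build_root(messages)
left_subtree = parent(leaf(1, b"a"), leaf(1, b"b"))
```

Because every node carries its namespace range, a node looking for namespace 2 can ignore `left_subtree` (range 1 to 1) without downloading anything beneath it. Note that the range is only a "may contain" filter: a namespace inside the range can still be absent.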

The entire structure can be represented by this example diagram:

White Paper: https://arxiv.org/pdf/1905.09274.pdf

The entire process is very similar to Arweave's SmartWeave: the chain is responsible for storing data and for consensus, while execution is left to the end users. There are no restrictions on the development language, and there are no execution bottlenecks. In addition, nodes can download only the data relevant to their applications instead of the entire block data, which improves efficiency.

The difference is that Celestia further separates the roles of storage and consensus, so there are three roles in the Celestia network: consensus nodes, storage nodes, and client nodes.

Role Distribution and Basic Process

To better understand the role distribution and basic operation process of Celestia, we need to start from its goals.

Decoupling data availability from state transitions and computation. Consensus is not mentioned here because consensus itself confirms data availability and authenticity, which makes it difficult to separate from data availability; otherwise, it wouldn't be considered a blockchain.
Downloading only the information needed for computation. Celestia hopes that the nodes executing computations will retrieve only the information they need, without downloading the entire blockchain data for state transitions.
Data integrity. The ability to detect when data has been hidden or corrupted.
Sovereign independence of application states. Similar to the second point, executing nodes do not need to execute information from other unrelated applications unless there is a dependency (for example, an application that calls another application's payment function).

Knowing its goals, we can look at its distribution of roles.

To ensure consensus in the blockchain, there are dedicated consensus nodes.
For data availability, there are dedicated storage nodes.
With the above two, the execution tasks are left to the end users, which are considered execution nodes.

Together, these nodes form the Celestia network as a peer-to-peer mesh. Note that an execution node must connect to at least one storage node in order to perform state transitions for its applications.

Here, we need to mention the verification rules. Generally, they are divided into two types:

Simple Verification Rules: The typical verification method of blockchains, downloading all information M, verifying Root(M) = mRoot. Once verified as true, M and the block header h are distributed, and the M data must be stored for at least a certain time t' (the maximum network delay) to ensure that other nodes can receive the information (including other verification nodes and storage nodes).
Probabilistic Verification Rules: This is the 2D Reed-Solomon approach that Celestia primarily promotes. The basic principle was covered above; the official example that follows illustrates the effect.

If the erasure coding threshold is 1/4 (that is, roughly a quarter of the pieces must be withheld before the block becomes unrecoverable), and a block is divided into 4096 pieces (64*64), then each node only needs to download 15 samples to have about a 99% probability of confirming that the data is available (the detailed derivation is covered in another purely mathematical paper that I haven't read yet). This means that each node only needs to download 15 of the 4096 pieces, about 0.4% of the data segments, to largely infer whether the data is available.
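
The figures in this example can be checked with a short calculation. This is a sketch under stated assumptions: the first function treats each sample as an independent draw with a withheld fraction of exactly 1/4, and the second assumes sampling without replacement with a minimum withheld count of (k+1)^2 = 1089 out of (2k)^2 = 4096 shares for k = 32, a figure taken from the general fraud and data availability proof literature rather than from this article.

```python
def p_detect_simple(withheld_fraction, samples):
    """Chance that at least one of `samples` independent draws hits a withheld share."""
    return 1 - (1 - withheld_fraction) ** samples

def p_detect_exact(total_shares, withheld_shares, samples):
    """Same chance when the samples are distinct shares (drawn without replacement)."""
    p_miss = 1.0
    for i in range(samples):
        p_miss *= (total_shares - withheld_shares - i) / (total_shares - i)
    return 1 - p_miss

total = 64 * 64                                # 4096 shares in the extended square
approx = p_detect_simple(0.25, 15)             # a little under 0.99
exact = p_detect_exact(total, 33 * 33, 15)     # a little over 0.99
download_percent = 15 / total * 100            # share of the block actually fetched
```

With the rounded threshold of 1/4 the result lands just below 99%, and with the slightly larger minimum withheld count it lands just above, which is consistent with the "15 samples for 99%" figure. In both cases the node fetches well under 1% of the shares.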

Of course, the probabilistic verification rules require that the total amount of data downloaded by all nodes must exceed this erasure coding threshold. For example, if the threshold is 50%, the total amount of data downloaded by all nodes must exceed 50% of the original data segments (non-duplicate) for the data availability to be confirmed and ultimately participate in block production.

Now that we have a basic understanding of the structure, we need to see how applications run on top of it.

How Application Nodes Work

First, as mentioned above, applications are executed by the end users, who are not only users of this network but also its execution nodes. They pass in the corresponding hash and nid (namespace id, the special data structure used to retrieve the information in their application's namespace without fetching unrelated information) and obtain everything needed for their application's state transitions. They then execute the transitions off-chain and upload the resulting data to the chain for other execution nodes to access.

Of course, because Celestia does not verify execution results, there may be transactions that violate application logic. Therefore, a new function called transition is introduced: applications call this function and it returns a state, which is used to confirm the validity of the transaction.

If the transaction is invalid, it is rolled back and the state remains as it was. Only valid transactions return state'.
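
A minimal sketch of such a transition function is shown below. Everything here is hypothetical: the toy token ledger, the transaction tuple format, and the exception name are invented for illustration, and only the idea of "return state' for valid transactions, keep the old state otherwise" comes from the description above.

```python
class InvalidTransaction(Exception):
    """Raised when a transaction violates the application's own rules."""

def transition(state, tx):
    """Return the new state' for a valid transfer, or raise without touching `state`."""
    sender, recipient, amount = tx
    if amount <= 0 or state.get(sender, 0) < amount:
        raise InvalidTransaction(tx)
    new_state = dict(state)   # work on a copy so a failure never leaks partial changes
    new_state[sender] -= amount
    new_state[recipient] = new_state.get(recipient, 0) + amount
    return new_state

def apply_all(state, txs):
    """Replay ordered transactions; invalid ones are skipped and the state is kept."""
    for tx in txs:
        try:
            state = transition(state, tx)
        except InvalidTransaction:
            pass  # rollback: state stays as it was before this transaction
    return state

genesis = {"alice": 10}
txs = [("alice", "bob", 4),     # valid
       ("bob", "carol", 9),     # invalid: bob only holds 4
       ("bob", "carol", 3)]     # valid
final_state = apply_all(genesis, txs)
```

Since every execution node replays the same ordered data through the same function, they all arrive at the same state even though the chain itself never checks the transactions.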

Of course, this raises another question: application upgrades. These are handled much as in SmartWeave. If a transaction uses logic that differs from the existing application and therefore cannot proceed under it, the application executed by that node is treated as a new application, so users of the original application are not affected. In other words, no hard fork is needed: an application can upgrade itself simply by changing its local execution logic and registering as a new application, without affecting other applications.

Another issue we need to address is: how to handle cross-application calls?

Cross-Application Calls

Imagine we use a contract for domain name registration: we pay and then receive the domain. Since there are many payment contracts available on the market, the domain registration contract calls a third-party payment contract, and the two complete this business process together.

At this point, according to Celestia's approach, a problem arises: Due to the independent sovereignty of applications, the states of different applications do not interfere with each other, while our domain registration application will clearly interfere with other applications. What should we do?

Precondition Calls: Before contract A can complete, contract B must complete first (which amounts to modifying the state of contract B). Contract B can expose a dedicated function for other contracts to call. The domain registration contract above falls into this category. In this case, Celestia stipulates that a node executing contract A must download not only the data required by A but also the data of B, and must execute B first and then A, because the execution of A depends on the state of B. Nodes executing contract B do not need to download data from A, as the execution of B does not depend on A.
Postcondition Calls: After contract A completes, contract B must be modified. For example, in a mail subscription service, after a user subscribes, other contracts need to send emails according to the subscription service's status. In this case, a node executing contract B needs to download contract A's data, execute contract A, and then execute contract B. Such calls should be rare in Celestia's vision, as directly modifying the state of another application contradicts the principle of independent sovereignty and inevitably requires downloading the data of both applications.
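
The download and execution order implied by these two cases can be sketched as a dependency walk. The application names and the `depends_on` mapping are hypothetical, and this is only one way to model the rule "fetch and execute whatever your state depends on first".

```python
def namespaces_to_download(app, depends_on):
    """Return the namespaces a node must fetch, in execution order (dependencies first)."""
    order, seen = [], set()

    def visit(name):
        if name in seen:          # already scheduled (also guards against cycles)
            return
        seen.add(name)
        for dep in depends_on.get(name, []):
            visit(dep)
        order.append(name)

    visit(app)
    return order

# Hypothetical setup: the domain registry's state depends on the payment contract.
depends_on = {"domain_registry": ["payments"]}

registry_plan = namespaces_to_download("domain_registry", depends_on)
payments_plan = namespaces_to_download("payments", depends_on)
```

Here `registry_plan` is `["payments", "domain_registry"]`, while `payments_plan` is just `["payments"]`: nodes running the payment contract never need the registry's data, which is the asymmetry described for precondition calls.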

Summary

That covers the main body of Celestia. We can see that it is highly similar to SmartWeave. Put differently, it is a structure distinct from the common blockchain designs: a completely composable one. It only guarantees data availability, does not handle execution, and does not verify execution, which makes it plug-and-play like a USB drive that can go wherever it is needed (unlike relay chains, which also handle state transitions).

It is inherently cross-chain compatible and can even support various cross-chain atomic interactions (because the special Merkle tree itself stores JSON information, with no requirements on the content). One advantage over Arweave is that far less data is needed to verify validity, so it still has great potential. The main open question for now is the token economy, and it is unclear how that will develop.

As for whether Celestia + Optimint or Arweave + KYVE will ultimately prevail, it is hard to say. Both are projects I really like, and both represent a promising approach.

Finally, let’s look at a comparison chart to see how significant the improvements of Celestia are.