What does data availability in Layer 2 actually refer to?

ZKCross
2022-03-31 17:09:23
Collection
Data availability raises the following questions: How do nodes ensure that all the data included when creating a new block is truly broadcasted to the network? How do peer nodes in the blockchain network determine that all data related to the newly proposed block is indeed available?

Author: ZKCross

Original Title: 《Data Availability in Layer-2 Blockchains

Compiled by: Linqi, Chain Catcher

Blockchain Data Availability

The latest Ethereum sharding roadmap prioritizes data sharding over execution sharding, significantly increasing Ethereum's data throughput. Additionally, modular blockchains have delved into Rollups, volitions, validity, and data availability solutions. These are the latest discussions and trends regarding blockchain data availability.

Anyone involved in the blockchain and cryptocurrency space may be curious about what blockchain data availability means in this context.

The term "data availability in blockchain" refers to a specific problem faced by many blockchain scaling solutions. It examines how blockchain nodes generate new blocks and whether all the data contained in these new blocks is broadcasted to the network. The difficulty lies in the fact that if block producers do not publish all the data contained in a block, no one can discover whether malicious transactions are hidden within that block. To fully understand how data availability works on a blockchain, it is important to understand the composition of blocks in a blockchain and the functions of blockchain nodes.

Introduction to Blockchain Nodes

The data structure of a blockchain consists of a chain of blocks, where each block is composed of two parts: the block header and the list of transaction data.

  • The block header is the metadata of the block, containing some basic information about the block, such as the Merkle root of the transactions.
  • The transaction part of the block contains the actual transactions.

There are two types of nodes in a blockchain network: full nodes (validator nodes) and light clients. Blockchains must support data availability, especially for rollups and layer 2 chains.

Data Availability Issues

Data availability issues are part of the blockchain trilemma: security, scalability, and decentralization.

image

The trilemma refers to the widely held belief that at any given time, a decentralized network can only provide two out of the three benefits: decentralization, security, and scalability. Scalability, security, and decentralization are all interrelated. Before a transaction is finalized, the network must reach consensus on its legitimacy. If the system is large, the consensus process may take some time.

Attacks on Data Availability

A blockchain data availability attack occurs when rogue nodes broadcast the block header but withhold parts of the block that contain invalid transactions. While honest full nodes that can download and store the entire blockchain know that certain data is unavailable, they lack a formal mechanism to prove this to light nodes with limited resources that cannot access the entire blockchain data. Therefore, both sidechain and sharded blockchain strategies are susceptible to data availability attacks. Rogue nodes on a sidechain network (or shard) submit the hash of the block to a trusted blockchain without transferring the block data to other nodes in this attack.

Data availability attacks are a well-known issue in the blockchain space, initially raised in the context of blockchain light clients. Vitalik popularized it afterward. Light nodes only store block headers and validate the PoW requirements. They rely on full nodes to validate blocks and provide fraud proofs for incorrect blocks.

In the case of light nodes, data availability attacks are not fatal. Typically, miners will also operate on full nodes. If a block is unavailable, they will ignore it and mine simultaneously (i.e., on its parent block). As long as the block is inaccessible, any honest miner will not build on that block. It will ultimately fall off the longest chain. At this point, light nodes will automatically ignore such blocks. This attack becomes more critical when it involves secondary blockchains and sharding.

The key difference is that, apart from the blockchain (or shard), most honest miners/full nodes may be missing. Therefore, participants in a sidechain (or shard) lack a unified, systematic approach to decide whether to merge the missing blocks into their distributed ledger.

Data Availability Issues

The data availability issue in blockchain raises the following question: How do nodes determine that all data contained in the creation of a new block is truly broadcasted to the network? How do peer nodes in the blockchain network ascertain that all data related to a newly proposed block is indeed available?

For example, consider Bob as the operator of a ZK-Rollup (ZKR). He verifies ZK proofs on Bitcoin. Although he does not transmit all transaction data to Bitcoin, even if his proof confirms the validity of all state changes in the rollup, users of the rollup may not know their current balance. Moreover, due to the ZK nature, it cannot gain insight into existing conditions.

Mitigating Data Availability Attacks

In modular blockchains, it is crucial to ensure data availability in a separate manner due to the decoupling of validation and data availability. Rollups provide transaction proofs and compute new states off-chain but still need to publish certain transaction data on-chain to prevent data availability issues. ZK-rollups primarily publish three key data points to the main chain:

  • A cryptographic commitment of the new state (root hash)
  • Cryptographic proofs (such as ZK-SNARK) that confirm the new state is the result of applying valid transactions to the initial state.
  • A small piece of data for each transaction in the batch provided in the form of calldata.

The zero-knowledge property of cryptographic proofs confirms that transactions and state changes are legitimate but does not provide information about the transactions themselves. Full nodes no longer need transaction data to validate transactions and actively recompute the new state root. However, suppose insufficient transaction data is broadcasted to the main chain. In that case, nodes on L1 cannot detect the current state of the rollup, which is often necessary only in specific cases (but still important).

Data Availability Layer of the Blockchain

The data layer of the blockchain is used for both the data structure of the blockchain and physical storage. The blockchain ledger is designed using a list of interconnected blocks or a Merkle tree, which is encrypted using asymmetric encryption methods. The data layer consists of the following parts:

Block: A block is a data structure used to group transactions and distribute them among network nodes. Mining is the process of generating blocks. Each block has a block header, which is metadata that can verify the legitimacy of the block.

Merkle Tree and Transactions: In the blockchain, transactions are maintained as part of the Merkle Tree. The Merkle Tree summarizes all transactions in a block by generating a digital fingerprint of the complete set of transactions, allowing users to determine whether a specific transaction is included. Each leaf node contains the transaction data, while each non-leaf node contains the hash of its previous hash value. The Merkle Tree is binary and requires an odd number of leaf nodes. If there is an odd number of transactions, the final hash is duplicated to generate an even number of leaf nodes.

Network Layer: The blockchain uses a distributed network that allows anyone to download all information on the blockchain and interact with it. Peer-to-peer (P2P) networks distribute/broadcast transactions across nodes. Blockchain platforms mainly utilize three types of networks: centralized, decentralized, and distributed networks.

Physical Layer: This layer includes servers, edge nodes, and Internet of Things (IoT) devices that operate as nodes on the blockchain network. These are typically connected via a peer-to-peer network. Nodes can be any active electronic devices, such as computers, phones, or printers, as long as they are connected to the network and have an IP address.

Virtual Layer: The virtual layer sits above the hardware and is used to allocate hardware and resources to virtual machines.

Consensus Layer: This layer is responsible for implementing network rules that specify how nodes within the network should react to establish consensus on broadcast transactions.

Incentive Layer: This layer involves activities where nodes in the distributed network are incentivized for their efforts to reach consensus. Whether this layer is utilized depends on the consensus mechanism used by the blockchain protocol.

Contract Layer: Composed of services and optional components that enable the blockchain platform to integrate with other technologies, such as data feeds, smart contracts, oracles, DAOs, and state channels.

API Layer: Provides application interfaces above the blockchain and offers mechanisms for third-party applications to communicate with the ledger and smart contracts.

Application Layer: This layer features the ability to create applications on top of the blockchain.

Traditional Data Availability Solutions

Traditional data availability solutions aim to enhance scalability by altering the underlying data transmission protocols, block data structures, consensus algorithms, and incentive mechanisms of the blockchain. These solutions include Layer 1 scaling solutions such as Segregated Witness, DAG, sharding, and consensus. However, Layer 2 solutions aim to improve scalability through off-chain methods at the application layer.

Segregated Witness: A protocol enhancement deployed to prevent transaction malleability and increase block capacity.

DAG: Transactions are connected in a DAG, meaning one transaction validates the next. Each node consists of many transaction levels. When a transaction is registered in a node, it must first validate two other transactions. These two transactions are selected using a mathematical process. Then, the node must verify that the two transactions do not conflict. To validate a transaction, a node must solve a cryptographic puzzle similar to that on the Bitcoin network (PoW).

Sharding: A blockchain can be sharded into multiple chains to improve throughput. Each shard has its own block producers and can communicate to trade tokens. Sharding means that not every block producer handles every transaction but divides the network's processing power to only handle specific transactions. Therefore, non-sharded blockchains typically have one or two fully functional full nodes and one or more light clients. It is worth noting that sharding aims to distribute network resources across different nodes.

Consensus: The consensus layer is crucial for the existence of the blockchain. Any block relies on the consensus layer to function correctly. This layer is responsible for organizing the validation of blocks and ensuring their progress is consistent.

ZKRollups Solutions

Currently, there are some Ethereum Layer 2 solutions, such as optimistic rollups, ZK rollups, and Validiums. These solutions shift execution operations off-chain while ensuring the availability of application verification and data on-chain. Although off-chain execution architecture improves throughput, it is still limited by the amount of data the main chain can handle. While execution is off-chain, the verification or dispute resolution process must occur on-chain. Transaction data is submitted on Ethereum as calldata to ensure data is available for future reconstruction. This is extremely important.

For optimistic rollups, operators may submit invalid transactions and then suppress parts of the block. In this case, other full nodes in the system will not be able to verify whether the submitted claims are correct. Due to the lack of data, they will not be able to provide any fraud proofs to indicate that the claims are indeed invalid.

In the case of zero-knowledge-based roll-ups, the integrity of ZKP ensures that accepted transactions are valid. However, even with such guarantees, not revealing the data supporting transactions can have serious negative consequences. This may prevent other validators from calculating the current state of the system, excluding users from the system, and freezing their balances.

To improve throughput, it is necessary to move execution operations off-chain, but a scalable data hosting layer is also needed to ensure data availability. With ecosystems like ZKcross, reliable data hosting and sorting components are provided using zero-knowledge Rollup. The execution layer will contain several off-chain ZKRollups scaling solutions that will be merged. We are currently developing the necessary technology with ZKcross to achieve this.

On-chain Data

Blockchains generate a large amount of data from hash rates and other related activities.

What do we mean when we talk about on-chain data? In short, it refers to all data that is natively stored on the blockchain.

On-chain data refers to all transactions that have occurred on a specific blockchain network. In other words, all data that has been written into blockchain blocks. Public chain information is characterized by public access. Based on the data, there are three main classifications.

  • Specifications for each block (timestamp, gas fees, miner, block size, etc.)
  • Each transaction is very detailed (the "sender" and "receiver" addresses, the amount transferred in the transaction, etc.)
  • Calls and uses of smart contracts

On-chain data provides information about the overall health of the blockchain: network security, financial integrity, transparency, and utilization. The data at this layer is raw and detailed, requiring minimal modification. Any blockchain search engine can access it. It serves as the foundational "fact sheet" on the network, relevant to all market participants.

Off-chain Data

For Bitcoin users, off-chain and on-chain transactions each have their pros and cons. Scalability is a limitation of blockchain technology that can be addressed through off-chain solutions. The confirmation time for on-chain transactions may vary due to network congestion. Off-chain transactions are executed instantly, and transaction costs are cheaper, with certain transactions incurring no fees before being on-chain. Off-chain transactions also offer greater privacy, as transaction details are separated from the main chain and not publicly disclosed.

Some blockchain tools, particularly smart contracts, rely on blockchain-based triggers and real-world data to verify conditions and execute contracts. For example, a smart legal contract for transferring land ownership. The funds required to purchase the property are held in escrow by the smart contract, which needs to verify real-world property facts, similar to an actual deed transfer. However, if the blockchain cannot retrieve data, such as a third-party confirmed deed transfer, how can the standards of the smart contract be met to release the seller's funds? For off-chain data; this data provides real-world information to smart contracts for successfully executing contracts and transactions. As blockchain and cryptocurrency continue to evolve, off-chain solutions have the potential to become a permanent solution, but only time will tell.

With protocols like ZKCross that support cross-chain interoperability with multi-chain ZKRollups and cross-chain layer technologies, the availability of on-chain and off-chain data will be enhanced. ZKcross creates a ubiquitous layer acting as a cross-chain Layer 2, maintaining tracking and synchronization of global state changes across multiple blockchains. By validating computations and using zk-snark proofs for multi-chain rollups, it addresses the reliability issues of cross-chain third parties and the problem of data availability attacks.

Conclusion

We are on the brink of a massive paradigm shift that will permanently change the Bitcoin ecosystem. ZK-Rollups will scale Ethereum in the most efficient way. This paradigm shift will be so profound that it effectively marks the death of all other L1 smart contract chains, including Ethereum as we know it today. There are several debates about scalability. Thousands of hours of research, implementation, successes, and failures have been invested in creating ZK-Rollups.

Of course, the effort is far from over. While the road ahead is winding, the goal has never been clearer. The efficiency of the current monolithic blockchain architecture remains astonishingly low, and under the constrained trilemma, increasing throughput means sacrificing security and/or decentralization. A special modular design that separates execution into different execution layers, such as ZKrollups, has increased the efficiency of the blockchain industry by 100 to 10,000 times. There is no doubt that this is the only way to scale it to global universality and increase it to millions of TPS. Furthermore, it is expected that within a few years, over 90% of blockchain activity will occur on zkRollups without significantly impacting security or decentralization.

ChainCatcher reminds readers to view blockchain rationally, enhance risk awareness, and be cautious of various virtual token issuances and speculations. All content on this site is solely market information or related party opinions, and does not constitute any form of investment advice. If you find sensitive information in the content, please click "Report", and we will handle it promptly.
ChainCatcher Building the Web3 world with innovators