IOSG Ventures: From Storage to Computing, the Revival of Decentralized Storage
Written by: Leo, IOSG Ventures
Summary:
The Arweave protocol guarantees permanent storage at the technical (protocol) level, which makes it better suited to preserving high-value digital assets such as NFT metadata.
Beyond data storage, computation is also essential. With the introduction of smart contracts and programmability, the development of decentralized data storage networks has entered a new phase of "more than just storage."
To achieve data storage redundancy, Filecoin relies on economic incentives, while Arweave leverages protocol design.
FVM brings storage finance to Filecoin. By commoditizing storage space and time, it lets users lock in storage costs in advance, while storage providers can recoup funds early and plan inventory, hardware, and operations based on future demand.
Data computation, transmission, and storage are the mainstream directions of computer network development, and Web3 has seen clear progress in decentralized data storage protocols. On March 14, 2023, Filecoin officially launched the EVM-compatible Filecoin Virtual Machine (FVM) on mainnet at epoch height 2,683,348, bringing smart contracts and programmability to the Filecoin network. This marks a new phase in which decentralized data storage protocols are about more than "just storage."
There are many decentralized data storage protocols, the most notable being Filecoin and Arweave. In this article, we go through the new features that the FVM release brings to Filecoin one by one and compare each with Arweave's approach.
Perpetual Storage
Permanent storage holds special significance and demand in Web3, as high-value digital assets like NFT metadata need to be preserved indefinitely.
After the FVM release, Filecoin has emphasized permanent storage as a feature. In our view, Filecoin achieves long-term preservation through economic means, at least in theory, with few changes at the protocol design level. In Filecoin's current design, storage orders are matched off-chain between storage providers and storage demanders and then recorded on-chain. An order specifies the data size, storage duration, price, and collateral. Before FVM, a demander who wanted the data to remain stored after the order expired had to submit a new order manually; with FVM, orders can be renewed automatically on-chain.
Lighthouse is a project dedicated to permanent file storage on Filecoin: users pay once and their files are "permanently" stored. Lighthouse funds ongoing storage through a smart contract-based endowment pool. When a user creates an order and pays, part of the payment goes to the storage provider and the rest enters the endowment pool. When the order expires, the pool's smart contract automatically renews it and pays with the pool's funds, thereby achieving "perpetual storage." The design is viable on the assumption that the pool grows its assets through staking, farming, and similar strategies, and that over time this growth is enough to cover storage costs. This mirrors the assumption in the Arweave yellow paper that storage prices will keep falling, so that the growth in value of the amount paid by storage demanders will cover the cost of permanent storage.
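The endowment mechanism can be illustrated with a small simulation. This is a minimal sketch under our own assumptions, not Lighthouse's contract logic: the fee split, yield rate, and cost-decline rate are hypothetical parameters. Each simulated year, the pool pays one renewal, then earns yield, and the renewal price falls.

```python
from dataclasses import dataclass


@dataclass
class EndowmentPool:
    """Hypothetical endowment pool; all parameters are illustrative assumptions."""
    balance: float       # funds held by the pool
    annual_yield: float  # assumed return from staking/farming, e.g. 0.05
    renewal_cost: float  # cost of renewing the storage order for one more year
    cost_decline: float  # assumed yearly drop in storage price, e.g. 0.10

    def step_year(self) -> bool:
        """Pay this year's renewal, then accrue yield.

        Returns False if the pool cannot afford the renewal."""
        if self.balance < self.renewal_cost:
            return False
        self.balance -= self.renewal_cost            # auto-renew the expiring order
        self.balance *= 1 + self.annual_yield        # pool assets appreciate
        self.renewal_cost *= 1 - self.cost_decline   # storage gets cheaper
        return True


def years_funded(payment: float, provider_share: float, first_renewal_cost: float,
                 annual_yield: float, cost_decline: float, horizon: int = 200) -> int:
    """Split a one-off payment and count how many yearly renewals the pool covers."""
    pool = EndowmentPool(
        balance=payment * (1 - provider_share),  # the rest paid the first term
        annual_yield=annual_yield,
        renewal_cost=first_renewal_cost,
        cost_decline=cost_decline,
    )
    for year in range(horizon):
        if not pool.step_year():
            return year
    return horizon
```

For example, `years_funded(10, 0.2, 1.0, 0.05, 0.10)` covers the whole 200-year horizon, because the pool's 8 units exceed the discounted cost of all future renewals (7 units under these assumptions), whereas with zero yield and no price decline the same pool runs dry after 8 renewals.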
"In the past 50 years, storage rates have decreased at an average annual rate of 30.57%."
Arweave Yellow Paper: The cost of saving 1GB of data for 1 hour since 1980 (log scale)
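The quoted decline rate can be turned into a rough cost for storing data forever. As an illustrative simplification of our own (not the yellow paper's actual pricing model, and ignoring interest and discounting), let $c_0$ be the cost of storing 1 GB for the first year and $d$ an assumed constant annual decline rate in storage cost. The total cost of permanent storage is then a geometric series:

```latex
C_{\text{perm}} = \sum_{t=0}^{\infty} c_0 (1-d)^t = \frac{c_0}{d}, \qquad 0 < d \le 1
```

With $d = 0.3057$ this gives $C_{\text{perm}} \approx 3.27\,c_0$, a finite one-off price; a slower decline, say $d = 0.01$, would raise it to $100\,c_0$.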
Arweave builds a structure called the Blockweave on top of the conventional blockchain data structure and achieves permanent data storage fundamentally through protocol design. In the Blockweave, each block (except the latest confirmed block and the candidate block being mined) is linked to three other blocks: the previous block, the next block, and its recall block. The recall block of a block at a given height can be any historical block below that height. When miners mine a new block, its recall block is selected pseudorandomly from the height and hash of the previous block. The recall block plays a central role in Arweave's consensus mechanism, Succinct Proofs of Random Access (SPoRA). Miners do not need to store every historical block to participate in mining, but they must hold the randomly selected recall block locally to compete for the next block. The recall block thus acts as a random spot check that a miner has kept the contents of some historical block, which is how historical data ends up being stored permanently.
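The recall-block rule can be sketched as follows. This is a simplified illustration, not Arweave's actual derivation: we assume the recall height is obtained by hashing the previous block's hash and height with SHA-256 and reducing modulo the number of eligible blocks. The point is that the choice is deterministic (every node agrees on it) yet unpredictable before the previous block exists.

```python
import hashlib


def recall_height(prev_hash: bytes, prev_height: int) -> int:
    """Pick the recall block height for the next block (illustrative only).

    The new block sits at prev_height + 1, so any historical block at
    heights 0..prev_height is eligible. Hashing the previous block's hash
    and height gives a deterministic but unpredictable index.
    """
    seed = hashlib.sha256(prev_hash + prev_height.to_bytes(8, "big")).digest()
    return int.from_bytes(seed, "big") % (prev_height + 1)


def can_mine(local_heights: set[int], prev_hash: bytes, prev_height: int) -> bool:
    """A miner may compete for the next block only if it stores the recall block."""
    return recall_height(prev_hash, prev_height) in local_heights
```

A miner that keeps, say, half of all historical blocks will find the recall block in its local store on roughly half of the blocks, which is the random spot check described above.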
Because Arweave's permanent storage is guaranteed technically by its protocol design, it is a more robust solution than Filecoin's permanent storage. This is also the main reason why Web2 tech giants such as Meta (Instagram) and Web3 applications such as Mirror have chosen Arweave to store NFTs and content.
Decentralized Computation
Preserving data matters, but using it matters even more. The vision of Filecoin and Arweave goes far beyond "decentralized cloud storage" (although that is how most storage demanders currently use them): both aim to build a blockchain protocol that combines low-cost storage with high-throughput computation. Beyond data storage, Web3 Dapps also need computation.
Filecoin and IPFS distribute content-addressed datasets across storage providers around the world to increase redundancy and resilience. This distribution brings advantages in cost, availability, and reliability, but it also means that different parts of a single dataset end up with geographically distant storage providers. A dataset scattered this widely is poorly suited to running computations or querying indexes over the data, yet pulling the pieces back to a central location for computation is expensive, wasteful, slow, and contrary to the principles of decentralized storage. The EVM-compatible FVM proposes a solution that combines edge computation with on-chain coordination of execution. FVM contracts can act as brokers for computing resources: they incentivize execution, distribute workloads among available storage providers, and reward providers that prove their computation results are valid. Storage providers register with the decentralized computing network through FVM contracts, and computing clients publish tasks to those contracts. The contract's rules assign each task to a registered storage provider; once the computation is done, the provider publishes a proof to claim its reward.
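The register / publish / assign / prove / reward flow can be sketched as an off-chain model of such a coordinating contract. Everything here is hypothetical: the class and method names are ours, not an FVM API, tasks are routed to a provider that already stores the target data (so the computation runs at the edge), and the "proof" is a toy SHA-256 commitment standing in for a real proof of correct execution.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    task_id: int
    data_cid: str                      # content identifier of the input dataset
    reward: int                        # escrowed payment, e.g. in attoFIL
    assigned_to: Optional[str] = None
    done: bool = False


@dataclass
class ComputeMarket:
    """Hypothetical model of a compute-brokering contract (not a real FVM API)."""
    providers: dict[str, set[str]] = field(default_factory=dict)  # provider -> CIDs held
    tasks: dict[int, Task] = field(default_factory=dict)
    balances: dict[str, int] = field(default_factory=dict)

    def register(self, provider: str, stored_cids: set[str]) -> None:
        self.providers[provider] = set(stored_cids)

    def publish(self, task_id: int, data_cid: str, reward: int) -> str:
        """Client posts a job; assign it to a provider that already holds the data."""
        for provider, cids in sorted(self.providers.items()):
            if data_cid in cids:
                self.tasks[task_id] = Task(task_id, data_cid, reward, provider)
                return provider
        raise LookupError(f"no registered provider stores {data_cid}")

    def submit(self, task_id: int, provider: str, result: bytes, proof: str) -> None:
        """Provider submits a result; the reward is paid only if the proof checks."""
        task = self.tasks[task_id]
        if task.done or task.assigned_to != provider:
            raise PermissionError("task is not open for this provider")
        # Toy check: a real system would verify a proof of correct execution,
        # not merely a hash commitment to the claimed result.
        if proof != hashlib.sha256(result).hexdigest():
            raise ValueError("invalid proof")
        task.done = True
        self.balances[provider] = self.balances.get(provider, 0) + task.reward
```

Routing by data location is the design choice that matters here: moving the computation to the provider avoids shipping the dataset to a central location.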
Decentralized computation on Arweave is implemented through the SmartWeave smart contract protocol, which can operate directly on rich data. What sets SmartWeave apart from other blockchain smart contract protocols is "lazy evaluation," which shifts the burden of executing computations from network nodes to the contract's users. The benefits are clear: because storage and computation are decoupled, nodes do not need to maintain an ever-growing global state. A contract's latest state is computed and verified only when a user actually needs it, rather than by every node participating in consensus. Delegating computation to users also improves the blockchain's scalability. Building on the initial version of SmartWeave, Warp developed the Warp SDK, which improves performance and modularity over the native implementation and supports different execution environments.
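Lazy evaluation amounts to a fold: the chain only stores the contract's initial state and its ordered list of interactions, and whoever wants the current state replays them locally. The sketch below is a simplified Python model, loosely inspired by SmartWeave's pure `handle(state, action)` style; the token contract and its fields are hypothetical.

```python
from functools import reduce
from typing import Any, Callable

State = dict[str, Any]
Interaction = dict[str, Any]


def handle(state: State, action: Interaction) -> State:
    """Hypothetical token contract: a pure function from (state, action) to new state."""
    if action["fn"] == "transfer":
        sender, target, qty = action["caller"], action["target"], action["qty"]
        if state["balances"].get(sender, 0) < qty:
            return state  # invalid interaction: ignored, state unchanged
        balances = dict(state["balances"])
        balances[sender] -= qty
        balances[target] = balances.get(target, 0) + qty
        return {**state, "balances": balances}
    return state


def evaluate(initial: State, interactions: list[Interaction],
             handler: Callable[[State, Interaction], State] = handle) -> State:
    """Lazy evaluation: the reader replays every recorded interaction in order."""
    return reduce(handler, interactions, initial)
```

No node computed these balances in advance; any user who reads the contract reproduces the same result because the handler is deterministic and the interaction order is fixed on-chain.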
Warp recently released its 2023 roadmap, with development goals including:
1) Layer1 Synchronizer: Achieving efficient synchronization between Warp contracts and the underlying Arweave layer;
2) Layer2 Sequencer: instead of being sent directly to the Arweave mainnet (where they may wait 2-3 minutes to be packaged into the next mined block), data transactions are routed to the Warp Sequencer and then settled immediately through the Bundlr network, giving users instant access to their data and near-instant finality;
3) Contract enhancements: Warp contracts aim to provide a fully functional tech stack that lets Web3 Dapps compete with Web2 services;
4) Delegated Resolution Environment and aggregation nodes: the Delegated Resolution Environment lets highly interactive and/or unsafe contracts delegate their evaluation, while aggregation nodes provide monitoring of and insight into contract state.
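The sequencer idea in item 2 can be illustrated with a toy model: the sequencer hands back an ordering and a transaction ID as soon as a transaction arrives, and settles transactions to the base layer later in batches. This is our own simplified sketch, not Warp's implementation; the batch size and receipt fields are hypothetical, and "settled" batches merely stand in for bundles posted to Arweave.

```python
import hashlib
import itertools
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    """Toy sequencer: orders interactions instantly, settles them in batches."""
    batch_size: int = 3
    _counter: itertools.count = field(default_factory=itertools.count)
    pending: list[tuple[int, bytes]] = field(default_factory=list)
    settled_batches: list[list[tuple[int, bytes]]] = field(default_factory=list)

    def submit(self, tx: bytes) -> dict:
        """Assign a sequence number immediately; the caller need not wait for a block."""
        seq = next(self._counter)
        self.pending.append((seq, tx))
        if len(self.pending) >= self.batch_size:
            self.flush()
        return {"sequence": seq, "tx_id": hashlib.sha256(tx).hexdigest()}

    def flush(self) -> None:
        """Hand the pending batch to the (simulated) settlement layer."""
        if self.pending:
            self.settled_batches.append(self.pending)
            self.pending = []
```

The user gets a final position in the contract's interaction order right away, which is what lets readers evaluate the new state without waiting for the next Arweave block.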
Storage Redundancy
Decentralized data storage networks avoid single points of failure, but how can we ensure that each node/storage provider genuinely and effectively preserves the uploaded data of storage demanders? And how can multiple nodes/storage providers separately store uploaded data to achieve storage redundancy and reliability? Filecoin and Arweave adopt different approaches; Filecoin relies on economic incentives, while Arweave leverages protocol design.
Highlights of the FVM release include Replication Workers and Repair Workers. Before FVM, storage demanders who wanted to back up their data across network nodes, to maximize the chance that it survives a storage provider failure, had to tediously match N orders with providers off-chain, execute N on-chain transactions, and spend considerable resources transferring the data N times. After FVM, replication workers act as intermediaries that provide data redundancy for a small fee, saving demanders time and cost. A replication worker automatically matches and creates storage orders on the Filecoin network according to the demander's chosen number of copies, geographic locations, latency requirements, price range, and other conditions. Repair workers act as agents for demanders: they monitor whether stored data has been lost or its orders have expired, automatically copy data that has fallen below the redundancy threshold to additional storage providers according to the demander's settings, and can renew expired or terminated storage orders on the demander's behalf.
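A repair worker's core check can be sketched as follows: count the copies that are still healthy, and if the count is below the demander's target, pick new providers, preferring regions not already covered. This is a hypothetical model of the idea, not an FVM or Filecoin API; the `Deal` fields, epoch numbers, and candidate list are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Deal:
    provider: str
    region: str
    end_epoch: int       # epoch at which the storage order expires
    active: bool = True  # False if the provider failed or the order was terminated


def repair_plan(deals: list[Deal], now: int, target_copies: int,
                candidates: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (provider, region) pairs to place new orders with.

    A copy counts as healthy if its order is active and not yet expired.
    New copies skip providers already holding the data and prefer
    regions that no healthy copy covers, for geographic spread.
    """
    healthy = [d for d in deals if d.active and d.end_epoch > now]
    missing = target_copies - len(healthy)
    if missing <= 0:
        return []
    used_providers = {d.provider for d in healthy}
    used_regions = {d.region for d in healthy}
    pool = [c for c in candidates if c[0] not in used_providers]
    # Stable sort: uncovered regions first (False < True), original order otherwise.
    pool.sort(key=lambda c: c[1] in used_regions)
    return pool[:missing]
```

Run periodically, this loop is what turns a one-time replication into ongoing redundancy: copies lost to expiry or provider failure are detected and replaced without the demander's involvement.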
Arweave achieves storage redundancy naturally through protocol design. It uses the recall block as an input to the Succinct Proofs of Random Access (SPoRA) mining algorithm, which ensures that miners who mine new blocks really do hold all the data of the recall block. SPoRA encourages miners to store as many historical blocks and as much Blockweave data as their capacity allows. If a miner's capacity is insufficient to store all historical blocks and the complete Blockweave, the miner will prioritize blocks that few other miners store: when a widely stored block is selected as the recall block, many miners compete to mine the next block, whereas when a rarely stored block is selected, competition is much lighter. Because recall blocks are chosen at random, every historical block is equally likely to be selected (a discrete uniform distribution). Therefore, under limited storage capacity, a rational miner should prioritize the least-replicated blocks to maximize its chance of mining new blocks and earning block rewards. Through this combination of protocol design and economic incentives, Arweave ensures that the Blockweave and all historical blocks are backed up as widely as the network's total storage capacity allows, guaranteeing the reliability and data redundancy of its decentralized storage network.
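The miner's reasoning can be made concrete with a simplified expected-reward model of our own: every block is equally likely to be the recall block, and all miners holding that block have equal odds of mining it. Under these assumptions a stored block that $k$ other miners also hold contributes $\frac{1}{N}\cdot\frac{1}{k+1}$ to the miner's chance of winning, so filling limited storage with the least-replicated blocks first is optimal.

```python
def expected_share(copies_elsewhere: list[int], stored: set[int]) -> float:
    """Chance of winning the next block under simplifying assumptions:
    the recall block is uniform over all N blocks, and every miner that
    holds it has equal odds of mining the next block."""
    n = len(copies_elsewhere)
    return sum(1 / (copies_elsewhere[b] + 1) for b in stored) / n


def choose_blocks(copies_elsewhere: list[int], capacity: int) -> set[int]:
    """Greedy: fill limited storage with the least-replicated blocks first."""
    order = sorted(range(len(copies_elsewhere)), key=lambda b: copies_elsewhere[b])
    return set(order[:capacity])
```

Because each block's contribution is independent and additive in this model, the greedy choice is exactly optimal; as miners act on it, rarely held blocks gain copies, which spreads replication across the whole Blockweave.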
Data Retrieval
Data is preserved, but how to efficiently, accurately, and quickly retrieve the data is another issue.
In Filecoin, the data retrieval service is a separate economic incentive system. Retrieval Providers are responsible for providing storage demanders with quick access to their data. Retrieval providers focus on fast data access rather than long-term storage. Most storage providers are also retrieval providers. Demanders pay retrieval providers to access data. Projects like retrieval.market and Saturn Network in the Filecoin ecosystem have already implemented rapid data retrieval and content distribution.
Besides the permanent storage and redundancy benefits described above, Arweave's SPoRA consensus mechanism also improves data retrieval and access speed. Arweave's earlier consensus mechanism, Proof of Access (PoA), incentivized miners to store as much data as possible but did not reward them for retrieving stored data quickly. In practice, during the PoA period miners pooled their storage into storage pools that held historical blocks; when a recall block was selected, the pool sent its contents to miners on request. This undermined the network's decentralization: Arweave network statistics showed that while total hash power kept growing, the number of nodes was declining, indirect evidence that storage pools existed. To address this and encourage miners to store data locally, Arweave upgraded from PoA to SPoRA. After the upgrade, miners that do not store historical block data locally must request recall data from a storage pool many times, which sharply increases the cost and time of data transfer, whereas miners that store the data locally have a much better chance of mining new blocks. This design effectively eliminated storage pools, and because miners around the world now store historical block data locally, storage demanders also enjoy faster data retrieval and access.
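Why SPoRA hurts pool-based miners can be shown with a deliberately crude throughput model (our simplification, with made-up numbers): under PoA the recall block is fetched once per block, after which hashing is unconstrained, while under SPoRA every mining attempt needs a fresh random chunk, so the attempt rate is capped by how fast chunks can be read.

```python
def mining_attempt_rate(hash_rate: float, chunk_fetch_rate: float, consensus: str) -> float:
    """Mining attempts per second in a simplified model.

    hash_rate:        attempts/s the miner's hardware could compute
    chunk_fetch_rate: random data chunks/s the miner can obtain
                      (fast from local disk, slow from a remote pool)
    """
    if consensus == "PoA":
        # One recall-block fetch per block; afterwards hashing is unconstrained.
        return hash_rate
    if consensus == "SPoRA":
        # Every attempt needs a fresh random chunk, so data access is the bottleneck.
        return min(hash_rate, chunk_fetch_rate)
    raise ValueError(f"unknown consensus: {consensus}")
```

With a hypothetical 10,000 hashes/s, a miner reading 5,000 chunks/s locally makes 5,000 attempts/s under SPoRA, while one fetching 50 chunks/s from a remote pool makes only 50; under PoA both would hash at the full 10,000.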
Financialization
With the release of FVM, numerous Web3 applications, including DeFi, can be introduced on Filecoin, such as staking protocols, insurance protocols, and storage derivatives. Storage providers in Filecoin need to pledge a certain amount of FIL as collateral to provide storage services. In the past, storage providers either raised funds to purchase FIL or relied on off-chain lending contracts to borrow FIL. However, with the establishment of staking protocols built on FVM, FIL token holders can deposit idle FIL into the protocol and set rules and terms, allowing storage providers of any scale to obtain FIL on-chain according to these rules and terms to raise sufficient collateral to provide storage services. Storage derivatives represent another interesting application scenario, as dynamic storage costs pose budgeting challenges for both storage demanders and providers. By commoditizing storage space and time, storage demanders can lock in storage costs in advance, while storage providers can recoup funds early and plan inventory, hardware, operations, and finances based on future demand.
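A storage forward is one simple way to commoditize storage space and time. The sketch below is a hypothetical instrument of our own, not an existing Filecoin product: the demander and provider agree today on a price per TiB-month, the provider can receive the notional up front, and the demander's gain or loss depends on where the spot storage price ends up.

```python
from dataclasses import dataclass


@dataclass
class StorageForward:
    """Hypothetical forward contract on storage capacity."""
    price_per_tib_month: float  # locked-in price K, agreed in advance
    tib: float                  # capacity covered
    months: int                 # duration covered

    def notional(self) -> float:
        """Total the demander commits to pay (and the provider can collect early)."""
        return self.price_per_tib_month * self.tib * self.months

    def client_savings(self, spot_price_per_tib_month: float) -> float:
        """Demander's gain versus buying at spot: positive if spot ended above K."""
        return (spot_price_per_tib_month - self.price_per_tib_month) * self.tib * self.months
```

Whatever the spot price does, the demander's storage budget is fixed at the notional, and the provider knows its future revenue and can plan hardware purchases against it.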
Project Positioning and Current Status
At the time of writing, Filecoin has 3,678 nodes providing approximately 19.544 EiB of storage capacity, while Arweave has 112 nodes that actually store 125.62 TiB of data. In terms of scale, the Filecoin network is larger, but although both are decentralized data storage protocols, Filecoin and Arweave are positioned differently and cannot be compared simply by node count or network size.
Protocol Labs positions Filecoin as a storage marketplace and incentive layer. It is building a comprehensive storage market, retrieval market, and financial products around Filecoin and delivers rich functionality (such as permanent storage and storage replication and repair) through economic incentive design, aiming to become the largest and most important decentralized protocol for data storage, distribution, and computation. Arweave's core positioning is to preserve data permanently and to run computation on that data through its native smart contract protocol, SmartWeave. Every mechanism serves this primary goal, and as the features discussed above show, Arweave's design is intricate and unified.
Outlook
Compared with the rapid progress of the Ethereum ecosystem and the Ethereum Virtual Machine, decentralized data storage networks have developed relatively quietly in recent years. The Filecoin and Arweave ecosystems have many excellent projects and entrepreneurs, but Web3 Dapps have not yet widely adopted Filecoin or Arweave for storage, and many still rely on Web2 storage solutions. Running computation on a blockchain built to solve storage is a novel path: both FVM and SmartWeave could unlock previously impossible decentralized applications for developers. For developers and users, choosing a decentralized storage protocol is not a binary decision; it should follow the storage needs of the application and its content. Although Filecoin and Arweave overlap in positioning, each can keep building on its own strengths to meet the evolving storage needs of decentralized networks, advancing from "decentralized cloud storage" toward decentralized servers.