Overview of Ceramic Core Technology: Why can dynamic storage better unlock data value?
Author: Wayne, Dongxuan, Linda
Source: BixinVentures
From knotted cords and stone carvings to bamboo slips and paper, human civilization has spent most of its history solving the problem of data storage media. Later, technologies such as floppy disks, hard drives, and cloud servers addressed issues of capacity, security, speed, and efficiency. The rise of decentralized storage solutions like Filecoin and Arweave has brought an opportunity for data ownership: individuals, organizations, institutions, and enterprises can truly own their data rather than having it monopolized by giants, which embodies the spirit of Web3.
However, once ownership is established, a natural question arises: how do owners exercise this right and obtain greater value? We summarize these two questions as the "data utility" issue.
As a decentralized storage network for composable data, Ceramic plays a key role in the evolution from "rights" to "utility." This is an important reason why we have been continuously monitoring and researching the development of the Ceramic network and its ecosystem and application construction.
Compared with a few months ago, when it was relatively quiet, Ceramic is now seeing numerous developer projects launch and has gained significant developer adoption at hackathons such as ETHDenver, HackFS, DecentralHacks, and DAO Global Hack. According to data disclosed by Ceramic, as of the end of March there were over 50 projects built on Ceramic, covering multiple fields including NFTs, DAOs, and GameFi. Additionally, at the beginning of March, Ceramic secured a $30 million investment led by Multicoin and USV.
There are already many articles discussing Ceramic's potential in data composability, so we will not elaborate on that here. This article aims to detail the technical possibilities of Ceramic in enhancing data utility from the perspectives of dynamic data optimization, flexible consensus mechanisms, network security, and privacy. We will also provide key information regarding network nodes and application development. In our view, the three key values established by Ceramic are:
● Decentralized storage horizontal scaling solution
● Community-driven data model marketplace
● Open API network sharing resources
1. Dynamic Data Storage Optimization
Decentralized storage keeps data on nodes rather than centralized servers, putting data usage rights more firmly in the hands of data creators. Because personal behavior is continuously updated and data is used iteratively, personal data contains a large amount of dynamic data (data that changes over time in system applications). However, IPFS, Sia, and even Arweave currently target mainly static data (data that is not frequently called or updated); their capabilities for handling dynamic data are limited, and doing so incurs higher costs.
Taking IPFS as an example, it achieves content-based rather than address-based retrieval by generating a unique hash value for each piece of content. When a user needs to retrieve a file, they can ask the IPFS network which nodes hold the content matching that hash. However, there are several obvious issues with using hash values to tag data:
- Each update requires network verification and the generation of a new hash value, and repeatedly regenerating the hash increases the time spent on storage;
- Verifying the hash value during retrieval may not be faster than an ordinary database call, because transmission can actually be slower when the nodes holding the data are far away;
- After the hash value is resolved to data, the pieces often take additional time to reassemble. This does not significantly affect the speed and price of storing and retrieving static data, but for dynamic data, repeating the operation on every update both reduces the speed and capacity of the storage network and incurs high costs.
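The first issue in the list above follows directly from how content addressing works. The following is a minimal sketch, assuming a plain SHA-256 digest as a stand-in for a real IPFS content identifier (actual CIDs use a different encoding, and `content_address` is a hypothetical helper name):

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in for an IPFS CID."""
    return hashlib.sha256(data).hexdigest()


# Two versions of the same record that differ by a single character.
v1 = b'{"name": "alice", "level": 1}'
v2 = b'{"name": "alice", "level": 2}'

addr_v1 = content_address(v1)
addr_v2 = content_address(v2)

# Identical content always maps to the same address...
same_content_same_address = content_address(v1) == addr_v1
# ...but any edit, however small, yields a different address.
edit_changes_address = addr_v1 != addr_v2
```

Because the address is derived from the content, anything that referenced the old address has to be told about the new one after every edit, which is the overhead that becomes costly for frequently updated data.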
Ceramic optimizes dynamic data storage through the "Streams control mechanism," significantly reducing both storage time and costs. It is important to note that this optimization is not an independent implementation separate from decentralized storage technologies like IPFS but is built on top of IPFS.
The "Streams control mechanism" refers to the mechanism that utilizes personal identity accounts to aggregate the data generated by individual actions into data streams for control.
Much as shared documents let us view the version from a specific point in time, each Stream has its own fixed StreamID (similar to the link created for a shared document). Each time dynamic data is modified, the StreamID does not change; instead, the update is recorded in the Stream's log. The log records the parameters of events related to the Stream in the Ceramic network, with each modification representing a specific version. Thus, for frequently updated data, only a new log entry is needed after the content changes, avoiding the need to issue a new identifier for every update and thereby reducing the time spent on storage.
Diagram of Streams mechanism
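The idea of a fixed identifier with a growing log can be sketched as follows. This is an illustration under stated assumptions, not Ceramic's implementation: the `Stream` class, the `digest` helper, and the entry layout are hypothetical, and entries are hashed here only so that each one can link to its predecessor.

```python
import hashlib
import json
from dataclasses import dataclass, field


def digest(obj) -> str:
    """Hash a JSON-serializable object deterministically (stand-in for a CID)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Stream:
    """Hypothetical stream: a stable ID plus an append-only log of entries."""
    stream_id: str
    log: list = field(default_factory=list)

    @classmethod
    def create(cls, content: dict) -> "Stream":
        genesis = {"prev": None, "content": content}
        # The ID is derived once, from the first entry, and never changes.
        stream = cls(stream_id=digest(genesis))
        stream.log.append(genesis)
        return stream

    def update(self, content: dict) -> None:
        # Each update links to the previous entry instead of replacing it.
        self.log.append({"prev": digest(self.log[-1]), "content": content})

    def state(self, version: int = -1) -> dict:
        """Return the content at a given version (latest by default)."""
        return self.log[version]["content"]


doc = Stream.create({"level": 1})
original_id = doc.stream_id
doc.update({"level": 2})
doc.update({"level": 3})
```

After two updates, `doc.stream_id` is still `original_id`, `doc.state()` returns the latest content, and `doc.state(0)` returns the first version, which mirrors the shared-document analogy.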
Since dynamic data itself tends to decrease in usage or transform into static data over time, and users also generate static data, these static data or infrequently updated dynamic data can ultimately be entrusted to decentralized storage solutions like IPFS or Arweave, or traditional cloud service providers like AWS.
Moreover, the StreamID makes data composability easier. If different programs adopt different data formats, calls between them are inevitably incompatible, which prevents much data from being combined and also leads to insufficient access permissions during calls. For example, measuring personal credit may require contribution data from a DAO and borrowing data from DeFi. Even if there are ways to call this data, it cannot be turned directly into credit data.
In Ceramic, applications connect to the decentralized data network through APIs to store, modify, and retrieve data. All data existing on the network can be easily reused or repurposed in other applications.
Integration with Other Decentralized Storage Solutions
Ceramic has a built-in caching mechanism that stores submissions for the short term to improve storage speed. Whenever a Ceramic node writes to or queries a Stream, all submissions of that Stream are first synchronized from the network and automatically loaded into the node's memory cache. As a result, the most popular Streams are replicated the most, which provides a certain degree of data persistence and availability. However, to conserve disk space and node resources, the memory cache is limited to 500 Streams by default (the limit can be configured to any number; in theory, a smaller limit makes the node faster, while a larger one increases resource usage).
Once this number is reached, the oldest Stream will be evicted from the node's cache to make room for newer Streams.
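The eviction behavior described above can be sketched with a small bounded cache. This assumes that "oldest" means least recently written or queried; the `StreamCache` class and its method names are hypothetical and are not part of any Ceramic API.

```python
from collections import OrderedDict


class StreamCache:
    """In-memory cache that evicts the least recently used stream when full."""

    def __init__(self, limit: int = 500):
        self.limit = limit
        self._items = OrderedDict()

    def load(self, stream_id: str, state: dict) -> None:
        """Insert or refresh a stream, evicting the oldest entry if needed."""
        if stream_id in self._items:
            self._items.move_to_end(stream_id)
        self._items[stream_id] = state
        if len(self._items) > self.limit:
            self._items.popitem(last=False)  # drop the oldest entry

    def get(self, stream_id: str):
        """Return the cached state, or None if it was evicted or never loaded."""
        if stream_id not in self._items:
            return None
        self._items.move_to_end(stream_id)  # a read counts as recent use
        return self._items[stream_id]


cache = StreamCache(limit=2)
cache.load("stream-a", {"v": 1})
cache.load("stream-b", {"v": 1})
cache.load("stream-c", {"v": 1})  # exceeds the limit, so stream-a is evicted
```

With a limit of two, loading a third stream pushes `stream-a` out, so `cache.get("stream-a")` returns `None` while the two newer streams remain available.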
If a node happens to shut down or restart, the cache will be cleared. Certain dynamic data that is not frequently used or does not adopt additional data persistence measures (such as being stored on a local hard drive or in the cloud) will be permanently lost if not sufficiently replicated among other nodes.
Therefore, relying solely on caching is not a reliable source of data availability. Ceramic's solution is to bind nodes to a series of storage protocols like IPFS and Arweave to prevent data loss.
2. Flexible Consensus Mechanism
As mentioned earlier, the data generated by individual actions forms Streams, and Streams confirm user identity and events through the immutable identifier StreamID, allowing the continuously updated logs of the same Stream to be aggregated. An important technical mechanism in this process is "StreamType."
During the update process, consensus operations are conducted through the mandatory StreamType of each Stream. Ceramic defines and handles the data structure of Streams through StreamType, which includes everything that can be stored in its submissions, state transition functions, authentication requirements, and conflict resolution strategies.
It is worth noting that when storing data, Ceramic nodes will only validate Streams related to themselves and will not process or validate unrelated Streams, significantly reducing the workload on nodes and laying a good foundation for high-speed operation and network scalability.
Ceramic itself has predefined StreamTypes for developers to choose from, and developers can also choose to code their own based on documentation. Currently, there are mainly two different types of StreamTypes:
- Tile Document
Tile Documents are often used for identity metadata (profiles, social graphs, reputation scores, linked social accounts, etc.), user-generated content (blog posts, social media, etc.), data collections formed by indexing other StreamIDs, database replacements for user tables, DID files, verifiable claims, and more. When nodes update data using the Tile Document StreamType, an update is valid only if it is signed by the DID of the user who controls the document, ensuring data security.
- CAIP-10 Link
CAIP-10 Link is a StreamType that links blockchain addresses to DIDs with a cryptographically verifiable proof. A DID can have an unlimited number of CAIP-10 links, binding it to many different addresses across various blockchain networks, ensuring that all relevant data can be aggregated under the same DID identity.
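To make the one-DID-to-many-addresses relationship concrete, here is a minimal sketch. It assumes the CAIP-10 account ID layout `namespace:reference:address` (for example, `eip155:1:<address>` for an Ethereum mainnet account); the `LinkRegistry` class, the example DID, and the addresses are hypothetical, and the cryptographic proof that a real link carries is omitted.

```python
from collections import defaultdict


def parse_account_id(account_id: str) -> dict:
    """Split a CAIP-10 style account ID into its three parts."""
    namespace, reference, address = account_id.split(":")
    return {"namespace": namespace, "reference": reference, "address": address}


class LinkRegistry:
    """Hypothetical index from a DID to every blockchain account linked to it."""

    def __init__(self):
        self._links = defaultdict(set)

    def link(self, did: str, account_id: str) -> None:
        parse_account_id(account_id)  # raises ValueError if malformed
        self._links[did].add(account_id)

    def accounts(self, did: str) -> set:
        """Return a copy of the set of accounts linked to the DID."""
        return set(self._links[did])


registry = LinkRegistry()
registry.link("did:example:alice", "eip155:1:0x0000000000000000000000000000000000000001")
registry.link("did:example:alice", "eip155:137:0x0000000000000000000000000000000000000001")
```

Both accounts, on two different chains, now resolve to the same DID, which is what lets data from several networks be aggregated under one identity.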
In addition, developers can also customize StreamTypes and deploy them to their own Ceramic nodes.
Consensus Conflicts
Consensus conflicts refer to scenarios where updates in Streams may occur simultaneously across different devices or programs, leading to questions of which update came first.
When updating data, the StreamType uses the json-patch method to describe specific changes to the JSON document, such as additions, deletions, and modifications, and then submits the resulting content. Each submission (a commit, in Ceramic's terminology) is an individual IPFS record, and a Stream is made up of one or more of them. As submissions accumulate, Ceramic applies them in order and updates the StreamState once each update is complete.
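A JSON Patch describes an update as a list of operations rather than as a full copy of the document. The sketch below implements only a small subset (the `add`, `remove`, and `replace` operations on nested objects) to show the shape of such an update; `apply_patch` is a hypothetical helper written for this illustration, not a library function, and the sample profile is invented.

```python
import copy


def apply_patch(document: dict, patch: list) -> dict:
    """Apply a small subset of JSON Patch (add, remove, replace) to nested dicts.

    Array indices and JSON Pointer escapes (~0, ~1) are not handled.
    """
    result = copy.deepcopy(document)  # leave the original version untouched
    for op in patch:
        *parents, key = op["path"].lstrip("/").split("/")
        target = result
        for part in parents:
            target = target[part]
        if op["op"] == "add":
            target[key] = op["value"]
        elif op["op"] == "replace":
            if key not in target:
                raise KeyError(key)  # replace requires an existing member
            target[key] = op["value"]
        elif op["op"] == "remove":
            del target[key]
        else:
            raise ValueError(f"unsupported op: {op['op']}")
    return result


profile = {"name": "alice", "stats": {"level": 1}}
patch = [
    {"op": "replace", "path": "/stats/level", "value": 2},
    {"op": "add", "path": "/stats/title", "value": "slayer"},
    {"op": "remove", "path": "/name"},
]
updated = apply_patch(profile, patch)
```

Only the three operations travel with the update; the previous version (`profile`) is left intact, which fits the versioned log model described earlier.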
During the submission process, the Stream controller allows updates to the Stream by creating new signed submissions. At this point, two conflicting logs may appear within the same Stream. For example, if a user is playing games on two different chain gaming platforms, where one game wins a monster battle and the other game levels up the character simultaneously, or if a user operates simultaneously on different devices like a phone and a computer, it can lead to conflicts in Streams verification.
Most StreamTypes rely on the "Earliest Anchor Wins" strategy to resolve conflicts between Stream logs. Nodes periodically anchor StreamIDs and related commit information to a blockchain (currently Ethereum). This immutable proof of publication provides a trustless timestamp for when updates occurred. Conflicts between Stream logs are resolved in favor of the branch that was anchored earlier. If one branch is anchored while another is not, the anchored branch is preferred.
Currently, the Ceramic team anchors nodes twice daily. When unanchored, the "Longest update chain" consensus method can be used for verification, retaining the log that is longer in case of conflicts.
This ensures that the most active history with the most updates is preserved. If there are conflicting unanchored branches of the same length, the system will randomly choose one as the winning log to ensure all nodes reach consensus on the same log. This may lead to data loss in rare cases when write operations conflict within seconds. This indicates that Ceramic is currently not well-suited for applications that rely on allowing multiple end-users to update a single Stream simultaneously.
According to official documentation, as long as the update interval exceeds approximately 30 seconds, there should be sufficient time to share updates across the entire Ceramic network and prevent such conflicts.
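The resolution rules described above (earliest anchor wins, an anchored branch beats an unanchored one, the longer log wins, and ties are broken arbitrarily) can be sketched as a single function. The `Branch` type and its fields are hypothetical. Two points are assumptions of this sketch rather than statements from the source: branches with identical anchor times fall back to the length rule, and the tie-break is made deterministic (lowest tip identifier) so that every node would arrive at the same winner.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Branch:
    """Hypothetical view of one conflicting log branch."""
    tip: str                           # identifier of the branch's latest entry
    length: int                        # number of entries in the branch
    anchored_at: Optional[int] = None  # blockchain timestamp, None if unanchored


def resolve(a: Branch, b: Branch) -> Branch:
    """Pick the winning branch using the rules described in the text."""
    if a.anchored_at is not None and b.anchored_at is not None:
        # 1. Both anchored at different times: the earlier anchor wins.
        if a.anchored_at != b.anchored_at:
            return a if a.anchored_at < b.anchored_at else b
    elif a.anchored_at is not None:
        # 2. An anchored branch beats an unanchored one.
        return a
    elif b.anchored_at is not None:
        return b
    # 3. Otherwise the longer log wins.
    if a.length != b.length:
        return a if a.length > b.length else b
    # 4. Tie: choose by an arbitrary but deterministic key so all nodes agree.
    return min(a, b, key=lambda branch: branch.tip)


winner = resolve(Branch("tip-x", length=5), Branch("tip-y", length=1, anchored_at=100))
```

Here `winner` is the shorter branch, because it is the only one that has been anchored. Rule 4 is where data loss can occur: the losing branch's updates are simply discarded.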
In the future, StreamTypes will be able to adopt different consensus mechanisms for handling simultaneous updates. For example, Ceramic is currently researching CRDT technology for processing and resolving concurrent updates, so these conflict issues are likely to be resolved later.
3. Security and Privacy Design
The Ceramic Protocol not only specifies the "Streams control mechanism" of the Ceramic network but also has dedicated designs for security and privacy.
- Security
The security attributes of the protocol are built from cryptographic signatures, publication proofs (blockchain anchoring), and hash-linked data (data is ultimately stored under hash values), which together allow for the construction of verifiable data structures. Additionally, Ceramic relies on libp2p (the peer-to-peer networking layer of the IPFS stack) to disseminate information about Stream updates, facilitating data verification and gossip-based discovery within the network.
The main security consideration for Ceramic nodes is that any new hints for a Stream may be false or invalid, thus requiring verification. Ceramic has its own solutions and verification methods for security issues such as DoS attacks, false log attacks, and CAIP10Link clock synchronization.
DoS attack: malicious nodes send a large number of messages to spam the network. Ceramic plans to address this with an automatic reputation system that limits the number of messages a single node can send and disconnects from spam nodes. Ceramic does not yet use a reputation system, but one will be implemented gradually in the future.
False log attack: malicious nodes can spam nodes tied to specific Streams by sending incorrect submission logs. Ceramic's first method is to refuse spam by stopping nodes from accepting hints from proven unreliable nodes. The second method is to construct a StreamType containing a recursive zero-knowledge proof to prove that the logs are indeed correctly associated with the given StreamID, rather than randomly sent by unreliable nodes.
CAIP10Link clock synchronization: refers to the same address being bound repeatedly through CAIP-10 Links, or an address being pointed back to a previously linked DID. To address this issue, in addition to relying on local system time to avoid duplicates, Ceramic has proposed a solution using random numbers (nonces) or pointers.
Although some of the above methods are still part of the plan, it is evident that Ceramic is prepared for false data.
- Privacy
Because the API is open, private data could be exposed directly during data retrieval. To mitigate this, Ceramic categorizes data into Confidential streams and Private streams. The former encrypts the content of each update of the Stream using symmetric keys.
Whenever Ceramic nodes synchronize Streams, they can only read the Stream content if they possess the symmetric key for that Stream. The metadata of the Stream, such as which DID signed the update, in what order, and when it was anchored, remains public. However, Private Streams use the Textile ThreadsDB method to allow only certain nodes to read the metadata and use the Stream without seeing the specific content, while other nodes cannot see any content within the Stream.
Example of Ceramic metadata source
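The split between encrypted content and public metadata in Confidential streams can be illustrated as follows. This is a toy sketch under loud assumptions: the cipher is a simple SHA-256 based keystream written only so the example runs with the standard library, it is not secure and is not what Ceramic uses, and the record layout, function names, and example DID are all hypothetical.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR data with a SHA-256 based keystream.

    For illustration only; real systems should use an audited AEAD cipher.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(x ^ y for x, y in zip(chunk, block))
    return bytes(out)


def make_update(key: bytes, signer: str, content: bytes) -> dict:
    """Build an update whose content is encrypted but whose metadata is public."""
    nonce = secrets.token_bytes(16)
    return {
        "signer": signer,      # public metadata: who signed the update
        "nonce": nonce.hex(),  # public, needed for decryption
        "ciphertext": keystream_xor(key, nonce, content).hex(),
    }


def read_update(key: bytes, update: dict) -> bytes:
    """Decrypt an update's content; only holders of the key get the plaintext."""
    nonce = bytes.fromhex(update["nonce"])
    return keystream_xor(key, nonce, bytes.fromhex(update["ciphertext"]))


key = secrets.token_bytes(32)
update = make_update(key, "did:example:alice", b"level=2")
```

Any node that syncs `update` can see who signed it, but only a node holding `key` can recover the content with `read_update(key, update)`.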
Through the design of security and privacy, Ceramic regulates the use of data within the network while sharing network resources. For data calls without user permission, programs can only retrieve metadata from the API and cannot directly access all user data, thus protecting user data rights and enhancing data security.
4. Key Values
● Decentralized storage horizontal scaling solution
Ceramic is built on a scalable decentralized data network that provides the most basic data storage functionality. The Ceramic network consists of a group of permissionless nodes that work together to quickly validate Streams related to themselves during data updates.
This architecture allows the system to scale horizontally almost without limit under Ceramic's consensus mechanism. In the official example, accounts 1 to 1,000,000 are replicated across one group of Ceramic nodes, while accounts 1,000,001 to 2,000,000 are replicated across another group. The Streams of the first million accounts generally do not affect node validation for the next million accounts unless an operation involves accounts or data held by the other group of nodes.
So theoretically, if needed, the network can shard down to each individual user without compromising composability. At the same time, to ensure the verifiability and composability of states between user shards, Ceramic nodes are responsible for creating a Merkle tree of anchors containing StreamID and Commit information. The Merkle root of this aggregated Merkle tree of all user transactions will be uploaded to the selected anchoring blockchain, allowing any account to verify the integrity of any other person's Streams at any time.
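The anchoring step described above relies on a standard Merkle tree: many entries are hashed pairwise up to a single root, and only the root needs to be published on-chain. The following is a generic sketch, assuming SHA-256 and a simple duplicate-last-node rule for odd levels; Ceramic's actual tree format may differ, and the leaf strings are invented placeholders for StreamID and commit information.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> bytes:
    """Compute the root of a binary Merkle tree over non-empty byte-string leaves."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Return the sibling hashes needed to recompute the root from one leaf."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is left)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Check that a leaf belongs to the tree with the given root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


anchors = [b"stream-1:commit-a", b"stream-2:commit-b", b"stream-3:commit-c"]
root = merkle_root(anchors)
proof = merkle_proof(anchors, 2)
```

Anyone holding `root` (the value that would be published on-chain) can check a single entry with `verify(anchors[2], proof, root)` without needing the other entries, which is what keeps verification cheap even when users are spread across shards.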
Ceramic is also actively developing on-chain, and the Ceramic protocol currently supports accounts and information from seven public chains.
Ceramic supports public chains
● Community-driven data model marketplace
The Ceramic data model marketplace is, more specifically, a marketplace for different data components. For example, a person's name, address, and phone number together form a data model. If you want your data to be used for package delivery, you need to use this data model, and Ceramic has packaged it as a component that can be called directly when needed. The marketplace is co-built by the 3Box Labs team and the community but is primarily community-driven, with the aim of solving data composability across applications.
Ceramic allows any developer to easily define, share, and reuse their models with other developers in the ecosystem. As of April 6, 2022, seven contributors in the community had contributed seven data models.
Data Model on Github Source: Github
By adopting the same underlying data model, applications can use data in the same format for local interoperability. This makes building applications on Ceramic like browsing a data model marketplace; developers simply need to insert the selected data model into their applications, and they will automatically access all data stored in those models on the network. Developers do not need to worry about isolated users and data making it difficult to launch applications, greatly improving development efficiency.
● Open API network sharing resources
Charging for API calls is a typical business model, but Ceramic's API is currently provided free to developers, and its API is standardized and universal, allowing developers to access shared resources through the API on the storage network.
The data created by users after using a certain application is directly retrieved through that application's API when needed. Many applications in Ceramic use APIs to achieve data aggregation functionality, and user actions on the application will bind to relevant APIs. For example, after binding a Twitter account, users can call their related Twitter information. Ceramic can generate all user data at once through API calls, obtaining real and comprehensive user data.
API example source
These three services make Ceramic's decentralized storage network more open and shared, where data can achieve powerful composability whether through APIs or data models, enhancing data value and increasing data application scenarios, thus bringing more utility to the data stored on it.
5. Nodes
Ceramic node operators are responsible for hosting the Ceramic network. Ceramic clients need to connect to nodes to access the network, so application developers need to start building by launching their own nodes or connecting their clients to one of the community members' hosted nodes.
Currently, community members have built three nodes specifically for development and testing. However, since these nodes regularly clean out data, developers who want a stable and secure storage environment still need to set up their own mainnet nodes. Most current Ceramic nodes are built and operated independently by developers, with projects like GeoWeb, MetaGame, and Boardroom having obtained network access through self-built nodes. As these nodes are established, the Ceramic network itself also expands.
Temporary nodes operated by the Ceramic community
Although using Ceramic is free, developers need to pay when binding to underlying storage providers. Comparable storage networks on the market, such as Filecoin (the incentive layer for IPFS), Arweave, or Aleph, reward node operators with tokens. Ceramic currently has no token incentive program (the community has mentioned that there may be token incentives) but is exploring commercial node models. No specific plan has been launched yet, which means Ceramic currently lacks incentives for developers.
6. Application Examples
IDX is an original Ceramic product, developed and operated by the Ceramic team. It is a multi-platform identity protocol that utilizes Ceramic's DID infrastructure (such as the Tile Document StreamType) as a decentralized alternative to centralized user tables. IDX allows users to establish a unified digital identity composed of all their data while enabling developers to break down silos and freely share user data across applications. IDX has been adopted by 900 projects, with a total of 35,000 identity registrations and 250,000 related records generated.
RabbitHole is a Web3 application that allows users to earn cryptocurrency by learning and using the latest Web3 applications. RabbitHole uses IDX to link multiple Web3 wallets and Web2 accounts to a unified DID. After calculating the total reputation score, it stores this verifiable credential in the user's identity for use in any Web3 application.
BoardRoom is a DAO discovery and governance platform, home to communities shaping the ownership economy. BoardRoom stores proposals, comments, votes, and other user-generated content for its governance applications on Ceramic. Since switching to Ceramic, the team has been able to increase engagement, enhance trust in governance, and remove its backend components.
Geo Web is a set of open protocols and property systems for anchoring digital content to physical locations. GeoWeb requires a simple and trustless way to store editable NFT data that can only be updated by the current asset owner. Ceramic's Streams and NFT DID methods are well-suited to solve their problems.
MetaGame is a large online collaborative game. MetaGame uses Ceramic's identity protocol IDX to store profile data for Ethereum users, which can be used, added to, or extended by any application in the Web3 metaverse.
In addition to these applications, there are many others in the fields of DeFi, DAO, NFT, and GameFi on the Ceramic network. They primarily solve data usage issues through data composability and an open and shared storage network, enhancing the value of data after combination, thereby improving user data utility and allowing applications themselves to gain corresponding value.
Conclusion
Ceramic's "Streams" control mechanism improves the storage efficiency of both dynamic and static data, and the horizontal scalability it brings greatly increases the capacity and data processing volume of Ceramic's storage network. The unification of underlying data models and open APIs allows data to be aggregated and used across different applications, genuinely enhancing data usability and, by improving efficiency, laying the foundation for greater data utility.
However, Ceramic currently also faces some issues that should be addressed in the future development of the protocol: for example, it currently lacks incentive mechanisms for node operators and developers; the limitations of memory caching also force projects using Ceramic to run their own nodes, increasing the development burden.