User data privacy protection: What is the difference between Web 2.0 and Web 3.0?
Author: Office of the Chief Economist of Wanxiang Blockchain, Wang Puyu
Reviewed by: Chief Economist of Wanxiang Blockchain, Zou Chuanwei
Through Web 1.0, typified by "portal websites," and Web 2.0, typified by "social platforms," successive waves of internet business and technology have accustomed users to getting resources and data online for free. When a platform introduces a paid product, users still tend to look for a free alternative. Why does the internet market offer products and services for free?
Are there really enthusiastic "philanthropists" in the market handing users a free "lunch" online? In practice, platforms attract users with free products or services, collect large amounts of identity data and behavioral data (much of it unrelated to the services they provide), and then monetize that data through data transfers, precise user profiling, advertising, and other channels (Figure 1). This model earns platforms far more than a "resource subscription model" would.
Figure 1: User Data Circulation (Source: Author's own drawing)
Against this backdrop of trading personal data for free online resources, a group of like-minded people has begun to push back against internet platforms. Among them are Tim Berners-Lee, the father of the World Wide Web, and Gavin Wood, co-founder of Ethereum, who have proposed a new generation of the internet, Web 3.0, aimed at protecting user privacy.
Wanxiang Blockchain Research Report No. 230, "The Prototype of Web3 Architecture and Its Middleware," describes three kinds of Web 3.0: the privacy-focused Semantic Web 3.0; the public-chain Web 3.0, which emphasizes privacy and data ownership and control; and the metaverse Web 3.0, described as a spatial web.
I. Introduction to Data Privacy Protection
What exactly does data privacy protection refer to? In Wanxiang Blockchain Research Report No. 173, "Data Privacy Issues from the Perspective of User Profiling," we divide the data used for user profiling into two types: user identity data, and behavioral data linked to that identity, such as times, locations, and events.
The prevailing market view is that user privacy can be effectively protected as long as the link between identity data and behavioral data is severed. Two kinds of solutions have emerged along these lines. The first consists of privacy protections that address problems arising in Web 2.0, whose main participants are mobile phone manufacturers and internet platforms. The second consists of Web 3.0 privacy solutions that apply various distributed and privacy-enhancing technologies. We compare the two approaches in detail below.
(1) User Data Privacy Protection in Web 2.0
In the Web 2.0 era, under growing regulatory scrutiny of internet data from governments around the world, internet platforms have come to treat user data privacy management as increasingly important. In Wanxiang Blockchain Research Report No. 215, "2021 Industry Review: Regulatory Edition," we detailed the personal data protection laws issued by the EU and China and their key provisions. These laws set out detailed rules on how platforms handle personal information and on individuals' rights and platforms' obligations and responsibilities, and as a rule they require platforms to obtain an individual's consent before processing personal information.
The two types of participants, mobile phone manufacturers and internet platforms, have proposed two very different privacy solutions from different starting points.
1. User Privacy Solutions Proposed by Mobile Manufacturers
The IMEI (International Mobile Equipment Identity) works like an ID card for a mobile phone: every device has its own unique IMEI. As Figure 2 shows, even if a user never enters an account name and password, an internet platform can link activity to a specific user through the device's IMEI.
To stop internet platforms from over-collecting user behavioral data, mobile phone manufacturers developed OAID (Open Anonymous Device Identifier) technology. When an internet platform asks for a device identifier, the device supplies a virtual ID instead of the IMEI. Because this ID is not tied to the hardware and the user can reset it to a new random value, the platform cannot use it to pin the activity to a specific user identity.
Figure 2: Application Data Collection Process (Source: Author's own drawing)
This method protects personal data privacy when users do not enter their account name and password. However, once users log into their personal accounts, they effectively disclose their identity data to the application platform, rendering the OAID technology ineffective.
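The mechanism above, and why logging in defeats it, can be sketched as a toy model. This is a hypothetical illustration, assuming the virtual ID is random and changes when the user resets it; `Device`, `Platform`, and their methods are invented names, not part of any real Android or OAID SDK, and the IMEI is a placeholder.

```python
import secrets


class Device:
    """Hypothetical handset that gives apps a resettable virtual ID, never its IMEI."""

    def __init__(self, imei: str) -> None:
        self._imei = imei  # kept private: apps never see it
        self._virtual_id = secrets.token_hex(16)

    def device_id_for_apps(self) -> str:
        return self._virtual_id

    def reset_virtual_id(self) -> None:
        # A user-triggered reset issues a fresh random ID unrelated to the old one.
        self._virtual_id = secrets.token_hex(16)


class Platform:
    """Hypothetical app backend that logs user events by device ID."""

    def __init__(self) -> None:
        self.events: dict[str, list[str]] = {}  # virtual ID -> events
        self.accounts: dict[str, str] = {}      # virtual ID -> account name

    def record(self, device: Device, event: str) -> None:
        self.events.setdefault(device.device_id_for_apps(), []).append(event)

    def login(self, device: Device, account: str) -> None:
        # Logging in ties the current virtual ID to a real account.
        self.accounts[device.device_id_for_apps()] = account

    def profile(self, account: str) -> list[str]:
        # Every event logged under an ID linked to this account.
        return [
            event
            for vid, events in self.events.items()
            if self.accounts.get(vid) == account
            for event in events
        ]


phone = Device(imei="000000000000000")  # placeholder IMEI
app = Platform()
app.record(phone, "viewed shoes")
phone.reset_virtual_id()
app.record(phone, "viewed flights")
print(len(app.events), app.profile("alice"))  # 2 []
app.login(phone, "alice")
app.record(phone, "booked a flight")
print(app.profile("alice"))  # ['viewed flights', 'booked a flight']
```

Before login, the two virtual IDs look like two unrelated devices. After login, every event under the current ID, including the earlier "viewed flights", is attached to the account.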
Because early mobile phone manufacturers paid little attention to data management, newly downloaded apps had all data permissions enabled by default, which let internet platforms indiscriminately collect large amounts of user data unrelated to their business. In response, manufacturers added prominent alerts when apps use data and let users decide which data to authorize. Take photo album access as an example: as Figure 3 shows, Apple divides photo access into three modes (all photos, selected photos, and none). This gives users finer-grained control over their photos. The "selected photos" mode in particular restricts an app to the photos the user chooses to share, limiting the scope of collection on the device itself.
Figure 3: Apple Phone Personal Data Management Authorization (Source: Apple Phone Screenshot)
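The three authorization modes amount to a filter applied on the device before any photo reaches the app. A minimal sketch; the names are hypothetical and are not Apple's PhotoKit API.

```python
from enum import Enum


class PhotoAccess(Enum):
    """Hypothetical stand-in for the three modes shown in Figure 3."""
    ALL = "all photos"
    SELECTED = "selected photos"
    NONE = "none"


def photos_visible_to_app(library: list[str], access: PhotoAccess,
                          selected: frozenset[str] = frozenset()) -> list[str]:
    """Return only the photos an app may read under the user's chosen mode."""
    if access is PhotoAccess.ALL:
        return list(library)
    if access is PhotoAccess.SELECTED:
        # The app sees the user's picks and nothing else in the library.
        return [photo for photo in library if photo in selected]
    return []


library = ["beach.jpg", "passport.jpg", "receipt.png"]
print(photos_visible_to_app(library, PhotoAccess.SELECTED, frozenset({"beach.jpg"})))
# ['beach.jpg']
```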
2. User Privacy Solutions Proposed by Platforms
Whereas mobile phone manufacturers protect privacy proactively, internet platforms have adopted privacy protections only passively, under pressure from data protection laws. The reason is easy to see: their 2021 financial reports show that advertising is a major source of revenue for many internet platforms.
As Figure 4 shows, in 2021 advertising accounted for over 80% of total revenue at Pinduoduo and Weibo, and for over 50% at Kuaishou and Baidu. Advertising derives its value from data, so if a platform began actively protecting user data privacy, it would be giving up its existing business model, which runs against its economic interests.
Figure 4: Advertising Revenue Proportion of Various Internet Companies in 2021 (Source: Company Financial Reports)
The privacy protections that internet platforms offer serve merely to comply with regulatory requirements. After the Personal Information Protection Law of the People's Republic of China took effect, major platforms added or revised their privacy settings screens (Figure 5), explaining in detail what user data they collect and how and within what scope it is used, and giving users more choice over the use of their personal data.
Figure 5: Application Settings Interface of WeChat, Douyin, and Zhihu
3. Summary
Mobile phone manufacturers' privacy protections align with their economic model, which relies on revenue from selling phones. Because device privacy and security strongly affect sales, it is easy to see why manufacturers have invested in OAID and in data authorization features that strengthen user privacy.
By contrast, internet platforms' privacy protections exist only to meet compliance requirements; they are unrelated to, and even at odds with, the platforms' economic models. This explains why platforms approach user privacy passively.
Current measures mostly explain how data is used rather than actively protecting it. Once their data is collected, users still cannot tell whether it is used for profiling or sold, chiefly because it sits on the platforms' centralized servers, beyond users' control.
II. User Identity Privacy Protection Solutions in Web 3.0
In Web 2.0, user data privacy and security rest on trust in platforms and in government regulation, yet users still lack control over their personal data. Web 3.0 privacy solutions address this by putting control of personal data at the center.
Figure 6: Comparison of User Login Methods (Source: James Beck)
As Figure 6 shows, in Web 1.0 users logged in with a username and password. In Web 2.0, users no longer had to remember account names and passwords or re-enter identity information: they could simply click "Sign in with" or "Continue with" to let an application use their Twitter, Google, or Facebook account information.
Under this model, other applications obtain identity data through the providers' APIs and SDKs and verify the user without a separate account name and password. This improves the user experience, but it also lets internet giants such as Twitter and Google collect even more behavioral data. In both Web 1.0 and Web 2.0, user data is stored on centralized servers beyond users' control.
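Why a "Sign in with" button concentrates behavioral data can be shown with a toy identity provider. The sketch is hypothetical: the class, app names, and profile data are invented, and real providers use protocols such as OAuth 2.0 and OpenID Connect rather than a direct function call.

```python
from collections import defaultdict


class IdentityProvider:
    """Hypothetical 'Sign in with ...' provider; not a real OAuth implementation."""

    def __init__(self) -> None:
        self.profiles = {"alice": {"name": "Alice", "email": "alice@example.com"}}
        # Side effect of every login: which app the user signed into, and when.
        self.login_log: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def sign_in(self, user: str, app: str, when: str, scopes: list[str]) -> dict[str, str]:
        self.login_log[user].append((app, when))
        # The app receives only the fields it asked for...
        return {name: self.profiles[user][name] for name in scopes}


idp = IdentityProvider()
idp.sign_in("alice", "news-app", "08:00", ["name"])
idp.sign_in("alice", "shopping-app", "12:30", ["name", "email"])
idp.sign_in("alice", "dating-app", "23:10", ["name"])
# ...but the provider sees the user's activity across every app.
print([app for app, _ in idp.login_log["alice"]])
# ['news-app', 'shopping-app', 'dating-app']
```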
To solve the control problem, solutions based on the Decentralized Identifier (DID) proposed by the W3C replace plaintext personal identity information with a string identifier and store the behavioral data mapped to that string on servers the user controls. Two identity privacy approaches currently build on the DID concept. The first, represented by the decentralized application (DApp) uPort, uses a proxy contract address as the identity identifier (Figure 7).
Other platforms verify the user's identity by recognizing the proxy contract address, which replaces authorization through centralized applications like Twitter and Google and gives the user control over the data. The second approach generates different identifiers for different times, platforms, and purposes (Figure 8) and stores the data in identity wallets that support the protocol, so the user keeps control over personal data.
Figure 7: One-to-One Mapping of Identity Identifier and DApp (Source: Author's own drawing)
Figure 8: One-to-Many Mapping of Identity Identifier and DApp (Source: Author's own drawing)
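To make "a string replacing plaintext identity" concrete, here is a minimal sketch that derives an identifier from key material and checks it against the DID syntax. The `example` method name is the placeholder used in W3C examples, the regular expression is our approximation of the DID Core grammar, and hashing key material is one common way to build the method-specific part, not uPort's own scheme.

```python
import hashlib
import re
import secrets

# Approximation of the W3C DID Core syntax: did:<method-name>:<method-specific-id>
DID_PATTERN = re.compile(
    r"did:[a-z0-9]+:"
    r"(?:(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})*:)*"
    r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})+"
)


def did_from_public_key(public_key: bytes, method: str = "example") -> str:
    """Derive an identifier from key material; it contains no name, phone or email."""
    digest = hashlib.sha256(public_key).hexdigest()
    return f"did:{method}:{digest[:40]}"


def is_valid_did(candidate: str) -> bool:
    return DID_PATTERN.fullmatch(candidate) is not None


# Random bytes stand in for a real public key in this sketch.
key = secrets.token_bytes(32)
did = did_from_public_key(key)
print(is_valid_did(did))                  # True
print(is_valid_did("alice@example.com"))  # False
```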
(1) One-to-One User Privacy Solution of Identity Identifier and DApp
Take uPort, launched by ConsenSys in 2017, as an example. Like the "Sign in with Twitter" and "Continue with Facebook" features, uPort offers a "Continue with uPort" option (Figure 9) that lets users authenticate on other platforms and log in without a password.
Figure 9: uPort Login Authorization (Source: Application Interface Screenshot)
Unlike centralized platforms, uPort is an independent, user-controlled decentralized identity service that provides two capabilities at the technical level. First, users directly control their identity data in uPort and decide what information to reveal to third-party platforms. Second, identity-related data is stored on servers the user controls, so the user decides what data to store, for how long, and who may read it.
Consider control over data first. Much like the iPhone permission settings described earlier, uPort lets users manage and control their personal data. Unlike a traditional app, however, a DApp is merely a tool that connects users to their data storage. Even if uPort's DApp stopped operating, it could not take users' data with it.
Users could still recover their data with their mnemonic phrase or private key through other tools, such as MetaMask or imToken; the key requirement is that the data be stored at addresses the user controls. During identity verification, any centralized platform that supports uPort, such as Twitter or Facebook, can authorize the user through uPort and complete verification in an encrypted environment. The platform obtains only a proxy contract address, which serves as a decentralized identifier, and learns no other verification details.
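The two capabilities, choosing what to reveal and deciding who may read which data for how long, can be sketched as a small access-control layer in front of user-held data. Everything here (`PersonalDataVault`, the grant structure, the reader identifiers) is hypothetical and is not uPort's actual data model; a real system would also authenticate readers cryptographically.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    reader: str               # e.g. the DID of a third-party platform
    fields: frozenset[str]    # which data items it may read
    expires_at: float         # Unix time after which access lapses


@dataclass
class PersonalDataVault:
    """Hypothetical user-controlled store: the owner decides what, who and how long."""
    data: dict[str, str] = field(default_factory=dict)
    grants: list[Grant] = field(default_factory=list)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def grant(self, reader: str, fields: set[str], expires_at: float) -> None:
        self.grants.append(Grant(reader, frozenset(fields), expires_at))

    def revoke(self, reader: str) -> None:
        self.grants = [g for g in self.grants if g.reader != reader]

    def read(self, reader: str, key: str, now: float) -> str:
        for g in self.grants:
            if g.reader == reader and key in g.fields and now < g.expires_at:
                return self.data[key]
        raise PermissionError(f"{reader} may not read {key!r}")


vault = PersonalDataVault()
vault.put("email", "alice@example.com")
vault.put("birthday", "1990-01-01")
vault.grant("did:example:news-site", {"email"}, expires_at=1_000.0)
print(vault.read("did:example:news-site", "email", now=500.0))  # alice@example.com
```

Reading a field that was never granted, reading after `expires_at`, or reading after `revoke` all raise `PermissionError`; the data itself never has to leave the user's store.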
As noted earlier, the lack of control over personal data in Web 2.0 stems mainly from storing user data on centralized servers. Web 3.0 applications take two approaches to data storage. Take the content creation platform Mirror as an example: the first approach is distributed storage, in which files smaller than 1 MB are stored permanently on Arweave.
The data is split into chunks and replicated across Arweave's distributed nodes, so it does not depend on any single server, and authorship is tied to the user's private key rather than to a platform account. Larger files, such as videos or images, use the second approach and are stored on a centralized server run by Mirror.
This second approach cannot guarantee data privacy and security: if Mirror stops maintaining the server, the data stored there will be lost. Mirror uses two approaches because it currently subsidizes all storage costs; users who want to store files larger than 1 MB on Arweave must bear part of the cost themselves.
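The two-tier storage policy can be written as a simple routing rule. This is a hypothetical sketch of the behavior described above, not Mirror's code; reading "1MB" as 1,000,000 bytes is an assumption.

```python
ONE_MB = 1_000_000  # assumption: the source says "1MB" without defining the unit


def choose_storage(size_bytes: int, user_pays_for_arweave: bool = False) -> str:
    """Route an upload following the two-tier policy described for Mirror."""
    if size_bytes < ONE_MB:
        return "arweave"       # small files: permanent storage, cost subsidized
    if user_pays_for_arweave:
        return "arweave"       # large files: only if the user covers the fee
    return "centralized"       # otherwise: platform server, lost if it shuts down


print(choose_storage(20_000))             # arweave
print(choose_storage(50_000_000))         # centralized
print(choose_storage(50_000_000, True))   # arweave
```

The rule makes the trade-off explicit: durability and user control depend on who pays for storage, not on the content itself.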
The measures above address data privacy by returning control of data to users, but the protection is not absolute. uPort gives users control over their data yet may still fail to keep it private. Each uPort identifier (proxy contract address) is normally unique, so if a large amount of on-chain data is associated with that address, third-party tools (such as Whale Analysis) can still infer the user's identity, and the link between identity and behavioral data is never truly severed. To address this on public chains, Aztec's zk.money replaces Ethereum's account model with a UTXO model built on zero-knowledge proofs, transferring ownership through note-based ("receipt") accounting.
This prevents third parties from tracking specific addresses. Besides zero-knowledge systems of this kind, Ethereum mixers such as Tornado.cash use a smart contract as a black box: funds deposited by many senders are pooled and later withdrawn to new addresses, breaking the link between sender and receiver so that third parties cannot trace specific addresses. Both approaches can effectively prevent the exposure of transaction data that results from on-chain addresses being public.
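Note-based ("receipt") accounting can be illustrated with hash commitments and nullifiers, the building blocks of note-based privacy systems such as Zcash and, as we understand it, Aztec. This is a simplified sketch under strong assumptions: SHA-256 hashes stand in for the real cryptographic constructions, and the zero-knowledge proof is replaced by direct checks, so unlike zk.money this ledger does see the input notes. It shows the bookkeeping, not the privacy proof.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Note:
    """A private 'receipt': known in full only to its owner."""
    value: int
    owner_secret: bytes   # spending key, never published
    salt: bytes           # random blinding factor

    def commitment(self) -> str:
        # Published when the note is created; hides value and owner.
        owner_tag = hashlib.sha256(self.owner_secret).digest()
        payload = self.value.to_bytes(16, "big") + owner_tag + self.salt
        return hashlib.sha256(payload).hexdigest()

    def nullifier(self) -> str:
        # Published when the note is spent; cannot be matched to the
        # commitment without knowing owner_secret and salt.
        return hashlib.sha256(b"nullifier" + self.owner_secret + self.salt).hexdigest()


def new_note(value: int, owner_secret: bytes) -> Note:
    return Note(value, owner_secret, secrets.token_bytes(32))


@dataclass
class NoteLedger:
    """Public state: only commitments and nullifiers, no addresses or balances."""
    commitments: set[str] = field(default_factory=set)
    nullifiers: set[str] = field(default_factory=set)

    def deposit(self, note: Note) -> None:
        self.commitments.add(note.commitment())

    def transfer(self, inputs: list[Note], outputs: list[Note]) -> None:
        # A real system would verify a zero-knowledge proof of these checks,
        # so the ledger would never see the input notes themselves.
        spent = {n.nullifier() for n in inputs}
        if len(spent) != len(inputs):
            raise ValueError("same note used twice")
        if any(n.value < 0 for n in inputs + outputs):
            raise ValueError("negative value")
        if sum(n.value for n in inputs) != sum(n.value for n in outputs):
            raise ValueError("value not conserved")
        for n in inputs:
            if n.commitment() not in self.commitments:
                raise ValueError("unknown note")
            if n.nullifier() in self.nullifiers:
                raise ValueError("note already spent")
        self.nullifiers |= spent
        self.commitments |= {n.commitment() for n in outputs}


alice, bob = secrets.token_bytes(32), secrets.token_bytes(32)
ledger = NoteLedger()
coin = new_note(10, alice)
ledger.deposit(coin)
ledger.transfer([coin], [new_note(7, bob), new_note(3, alice)])  # pay 7, keep 3 as change
try:
    ledger.transfer([coin], [new_note(10, alice)])
except ValueError as err:
    print(err)  # note already spent
```

An observer sees new commitments and a nullifier, but no sender address, receiver address, or balance to follow from one transaction to the next.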
In the real economy, however, where personal identity and behavioral data are vast and varied, zero-knowledge proofs and mixers alone may not be practical for protecting user privacy, and a more comprehensive identity management solution is needed.
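One rough way to see the limits of a mixer is to count how many deposits a withdrawal could plausibly have come from, its anonymity set. The sketch below is a simplified observer model of our own, with invented addresses and amounts; real tracing also uses timing, gas, and address-reuse heuristics, which can shrink the set further.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    address: str
    amount: int   # fixed denomination, in whole tokens
    time: int     # block height or timestamp


def candidate_sources(deposits: list[Deposit], amount: int, withdraw_time: int) -> list[Deposit]:
    # Any earlier deposit of the same denomination could have funded the withdrawal.
    return [d for d in deposits if d.amount == amount and d.time < withdraw_time]


def naive_link_probability(deposits: list[Deposit], amount: int, withdraw_time: int) -> float:
    # Assumes the observer has no other clues and treats all candidates as equally likely.
    candidates = candidate_sources(deposits, amount, withdraw_time)
    if not candidates:
        raise ValueError("no deposit could have funded this withdrawal")
    return 1 / len(candidates)


pool = [
    Deposit("0xA", 1, 10),
    Deposit("0xB", 1, 12),
    Deposit("0xC", 1, 15),
    Deposit("0xD", 10, 11),
]
print(naive_link_probability(pool, 1, withdraw_time=20))  # 0.3333333333333333
print(naive_link_probability(pool, 1, withdraw_time=11))  # 1.0
```

A withdrawal made right after a lone deposit is fully linkable, which is why a mixer's protection depends on how many other users it has, not only on its cryptography.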
(2) One-to-Many User Privacy Solution of Identity Identifier and DApp
In the Wanxiang Blockchain Industry Research Report "DID: A New Identity Identification Technology," we detailed the technical principles of DID. The core idea is that an entity uses different DIDs at different times, on different application platforms, and for different purposes to complete identity verification, while all behavioral data associated with each DID is stored at addresses the user controls.
In Web 2.0 terms, this is like logging into platforms with a different username and password each time, so that the platform cannot link the sessions to a specific user, while all data is kept on servers the user controls. This approach is meant to solve the problems of the one-to-one mapping between identity identifiers and DApps.
These include profiling a unique address from large amounts of associated data, and a single leaked identifier exposing all of a user's data. With a one-to-many mapping between identifiers and DApps, behavioral data is isolated under different identifiers, which tackles data privacy from a different angle.
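One way to realize "different identifiers for different times, platforms, and purposes" is to derive pairwise identifiers from a single secret the user holds, so the wallet can regenerate them while outsiders cannot link them. The sketch assumes HMAC-SHA256 as the derivation function and uses the placeholder `did:example` method; `pairwise_did` and `IdentityWallet` are hypothetical, not a scheme taken from the DID report or any existing wallet.

```python
import hashlib
import hmac
import secrets


def pairwise_did(master_secret: bytes, platform: str, purpose: str, epoch: str) -> str:
    """Derive one identifier per (platform, purpose, time period) from a user-held secret."""
    context = "\x00".join([platform, purpose, epoch]).encode()
    tag = hmac.new(master_secret, context, hashlib.sha256).hexdigest()
    return f"did:example:{tag[:32]}"


class IdentityWallet:
    """Hypothetical wallet: behavioral data is filed under each derived DID separately."""

    def __init__(self) -> None:
        self._master_secret = secrets.token_bytes(32)  # never leaves the wallet
        self.records: dict[str, list[str]] = {}

    def did_for(self, platform: str, purpose: str, epoch: str) -> str:
        return pairwise_did(self._master_secret, platform, purpose, epoch)

    def log(self, did: str, event: str) -> None:
        self.records.setdefault(did, []).append(event)


wallet = IdentityWallet()
shop_id = wallet.did_for("shop.example", "purchases", "2022-Q1")
bank_id = wallet.did_for("bank.example", "loan-application", "2022-Q1")
wallet.log(shop_id, "bought running shoes")
wallet.log(bank_id, "submitted income statement")
print(shop_id != bank_id)  # True: the two platforms see unrelated identifiers
```

Because derivation is deterministic, the wallet can recreate any identifier on demand; the flip side is that the master secret becomes the single thing the user must never lose or leak.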
This is the direction the market is working toward, but many legal, commercial, and technical bottlenecks remain. Among the decentralized identity solutions already on the market, none has truly delivered a one-to-many privacy solution; most remain at the concept stage.
III. Thoughts and Conclusion
The comparison above shows that the biggest difference between user privacy protection in Web 2.0 and in Web 3.0 is whether users have absolute control over their personal data. Morally, the absolute data control that Web 3.0 promotes is appealing to everyone and marks a milestone in personal data management, letting individuals take charge of their own data. From an economic and commercial perspective, however, is such absolute control truly valuable?
The question has two levels: the value to users, and whether the value of personal data ownership is consistent with economic development. Consider the value to users first. Privacy protection is largely a moral value, but commercially, Web 3.0 has no "philanthropists" either.
Every data privacy solution needs someone to pay for it. How many users are willing to cover the cost of developing privacy platforms? And how many can bear the long-term cost, in both time and money, of maintaining their personal data? Consider how few customers carefully read the contract terms when doing business with a bank or a telecom company.
It is therefore questionable whether users will spend the time and money that Web 3.0 asks of them to maintain their personal data. Moreover, will users accustomed to free online resources be willing to pay for resources in Web 3.0?
In response to these issues, some Web 3.0 projects have proposed solutions such as personal data marketplaces. Once users regain control over their personal data, they could authorize advertisers, research institutions, and financial institutions to use it, potentially earning income that was out of reach in Web 2.0.
Users could then use this income to cover the costs of tool development and data maintenance. The idea sounds ideal, but can its business logic withstand market scrutiny? We have doubts, because the answer ultimately depends on how much market value an individual's data really has, a complex question that is beyond the scope of this article.
By contrast, data privacy platforms are very popular among public-chain projects, and their economic model is clear: the project team develops a privacy platform and raises funds through an ICO. As the value of the digital assets rises, the team earns revenue and attracts more people to help maintain and manage the project.
This model, however, faces regulatory compliance problems in many countries and regions. Without it, what economic model would work, and specifically, who would pay to develop privacy platforms? And in a digital economy, how can such projects overcome resistance from centralized platforms? These questions remain open. Beyond them, Web 3.0 privacy protection must still solve the following problems.
(1) How to prevent hackers from abusing privacy technology?
On February 20, 2022, attackers exploited the week-long window of OpenSea's smart contract upgrade to steal a large number of high-value NFTs, then laundered 1,100 ETH through the Ethereum mixer Tornado.cash, leaving OpenSea unable to trace the attackers' addresses.
Valued at the floor prices of the mainstream collections involved, the stolen assets were worth at least $4.166 million. The incident shows that privacy technology, while protecting personal privacy, can also shield wrongdoers. When security incidents occur, users cannot rely on the kind of safeguards available in Web 2.0 to assert their rights.
(2) How to meet regulatory compliance requirements?
When such attacks occur, how can users' legal rights and asset security be protected? KYC (Know Your Customer) checks are essential. But who should be responsible for KYC, and how should it be performed? Distributed solutions call for new answers to these questions. At the same time, KYC itself must not leak personal data; otherwise it would undo everything that other privacy protection technologies achieve.

