Is data infrastructure ready for the era of crypto super apps?
Author: Story, IOSG Ventures
TL;DR
Data Challenge: The block-time race among high-performance public chains has entered the sub-second era. On the consumer side, high concurrency, sharp traffic fluctuations, and multi-chain heterogeneity are pushing data infrastructure toward real-time incremental processing plus dynamic scaling. Traditional batch ETL carries delays of minutes to hours, which cannot meet real-time trading needs; emerging solutions such as The Graph, Nansen, and Pangea introduce stream processing, compressing delays to near real-time.
Paradigm Shift in Data Competition: The last cycle was about making data "understandable"; this cycle is about making it "profitable." Under the Bonding Curve model, a one-minute delay can mean a severalfold difference in entry cost. Tools have iterated from manually set slippage → sniper bots → the GMGN integrated terminal. On-chain trade execution is gradually being commoditized, and the competitive frontier is shifting to the data itself: whoever captures signals faster can help users profit.
Dimensional Expansion of Transaction Data: Memes are essentially the financialization of attention; the key elements are narrative, attention, and onward dissemination. A closed loop of off-chain sentiment × on-chain data: narrative tracking, summarization, and sentiment quantification are becoming core to trading. "Underwater data": fund flows, role profiling, and smart-money/KOL address tagging reveal the hidden games behind anonymous on-chain addresses. The next generation of trading terminals will fuse on-chain and off-chain signals at second-level granularity, sharpening entry and risk-avoidance judgments.
AI-Driven Executable Signals: From information to profit. The new phase of competition is judged on speed, automation, and the ability to generate excess returns. LLMs plus multi-modal AI can automatically extract decision signals and pair them with Copy Trading, take-profit, and stop-loss execution. The risks: hallucinations, short signal lifetimes, execution delays, and weak risk control. Balancing speed against accuracy, with reinforcement learning and simulated backtesting, is key.
Survival Choices for Data Dashboards: Lightweight data-aggregation/dashboard applications lack a moat, and their room to survive is shrinking. Going downward means deepening high-performance underlying pipelines and integrated data research; going upward means extending into the application layer, owning user scenarios directly, and raising data-call activity. The future landscape: become either the "utility" infrastructure of Web3 or the user-facing platform, a crypto Bloomberg.
The moat is shifting towards "executable signals" and "underlying data capabilities," with the closed loop of long-tail assets and transaction data presenting unique opportunities for crypto-native entrepreneurs. Opportunity window in the next 2-3 years:
- Upstream infrastructure: Web2-level processing capabilities + Web3 native demand → Web3 Databricks/AWS.
- Downstream execution platforms: AI Agent + multi-dimensional data + seamless execution → Crypto Bloomberg Terminal.
Thanks to projects like Hubble AI, Space & Time, OKX DEX for their support of this research report!
Introduction: The Triple Resonance of Meme, High-Performance Public Chains, and AI
In the last cycle, the growth of on-chain trading relied mainly on infrastructure iteration. Entering the new cycle, as infrastructure has gradually matured, super applications represented by Pump.fun are becoming the crypto industry's new growth engine. With a unified issuance mechanism and carefully designed liquidity, this asset-issuance model has shaped trading "trenches" marked by fairness and frequent get-rich stories. The replicability of this high-multiple wealth effect is profoundly changing users' profit expectations and trading habits: users need not only faster entry but also the ability to acquire, analyze, and act on multi-dimensional data within seconds, while existing data infrastructure struggles to support that density and immediacy.
This, in turn, raises the bar for trading environments: lower friction, faster confirmation, deeper liquidity. Trading venues are rapidly migrating to high-performance public chains and Layer 2 rollups represented by Solana and Base. These chains' transaction data volumes have grown more than tenfold compared with Ethereum in the previous cycle, posing far harsher performance challenges for existing data providers. With new-generation high-performance chains such as Monad and MegaETH about to launch, demand for on-chain data processing and storage will grow exponentially.
At the same time, the rapid maturation of AI is accelerating the democratization of intelligence. GPT-5's intelligence has reached doctoral level, and multi-modal models like Gemini can readily read candlestick charts. With AI tools, complex trading signals can now be understood and executed by ordinary users. Traders are beginning to lean on AI for trading decisions, and those decisions in turn depend on multi-dimensional, high-efficacy data. AI is evolving from an "auxiliary analysis tool" into the "hub of trading decisions," and its spread further amplifies demands on data freshness, interpretability, and scalable processing.
Under the triple resonance of the Meme trading frenzy, the expansion of high-performance public chains, and the commercialization of AI, the on-chain ecosystem's demand for a new data infrastructure is becoming increasingly urgent.
Addressing the Data Challenge of 100,000 TPS and Millisecond Block Times
With the rise of high-performance public chains and rollups, the scale and velocity of on-chain data have entered a new phase.
With high-concurrency, low-latency architectures now widespread, daily transaction counts easily exceed tens of millions, and raw data sizes run to hundreds of GB. Take Solana: its 30-day average TPS has exceeded 1,200, with daily transactions surpassing 100 million; on August 17 it set an all-time high of 107,664 TPS. By one estimate, Solana's ledger is growing at a rapid 80-95 TB per year, roughly 210-260 GB per day.
▲ Chainspect, 30-day average TPS
▲ Chainspect, 30-day transaction volume
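As a quick sanity check, the cited annual and daily growth rates are mutually consistent (assuming decimal TB/GB units), as a couple of lines of Python confirm:

```python
# Back-of-the-envelope check on the cited ledger growth (decimal units assumed).
for tb_per_year in (80, 95):
    gb_per_day = tb_per_year * 1000 / 365
    print(f"{tb_per_year} TB/year ≈ {gb_per_day:.0f} GB/day")
# 80 TB/year ≈ 219 GB/day; 95 TB/year ≈ 260 GB/day, matching the 210-260 GB range
```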
Not only is throughput rising; block times on emerging chains have entered the millisecond range. BNB Chain's Maxwell upgrade cut block times to 0.8 seconds, while Base's Flashblocks compresses them to 200 milliseconds. In the second half of this year, Solana plans to replace PoH with Alpenglow, cutting block confirmation to roughly 150 milliseconds, while the MegaETH mainnet targets real-time block times of 10 milliseconds. These consensus and engineering breakthroughs dramatically improve trading immediacy, but they place unprecedented demands on block-data synchronization and decoding.
Yet most downstream data infrastructure still relies on batch ETL pipelines, which inevitably introduce delay. Dune, for example, reports that contract-interaction event data on Solana typically lags by about 5 minutes, while protocol-level aggregates can take up to an hour. A transaction confirmed on-chain in 400 milliseconds thus becomes visible in analytics tools only after a delay hundreds of times longer, which is close to unacceptable for real-time trading applications.
▲ Dune, Blockchain Freshness
To address these supply-side challenges, some platforms have shifted to streaming, real-time architectures. The Graph compresses data delay to near real-time with Substreams and Firehose. Nansen achieved severalfold performance gains on Smart Alerts and real-time dashboards by adopting real-time processing technologies such as ClickHouse. Pangea aggregates compute, storage, and bandwidth contributed by community nodes to serve B-side users such as market makers and quants with streaming data at sub-100-millisecond latency.
▲ Chainspect
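To make the batch-versus-streaming contrast concrete, below is a minimal sketch of the push-based approach using Solana's standard `logsSubscribe` WebSocket RPC method and the open-source `websockets` Python library. The public RPC endpoint is shown for illustration only, and the program id to subscribe to is left as a placeholder; this is a toy consumer, not any particular vendor's pipeline:

```python
import asyncio
import json

import websockets  # pip install websockets

# Public mainnet RPC; production systems use dedicated nodes for lower latency.
SOLANA_WS = "wss://api.mainnet-beta.solana.com"

async def stream_program_logs(program_id: str) -> None:
    """Push-based alternative to polling a batch ETL table: Solana's
    logsSubscribe delivers transaction logs as blocks are processed."""
    async with websockets.connect(SOLANA_WS) as ws:
        await ws.send(json.dumps({
            "jsonrpc": "2.0", "id": 1, "method": "logsSubscribe",
            "params": [{"mentions": [program_id]},
                       {"commitment": "processed"}],
        }))
        await ws.recv()  # subscription confirmation
        while True:
            note = json.loads(await ws.recv())
            value = note["params"]["result"]["value"]
            # Hand off to an incremental decoder / stream processor here.
            print(value["signature"], value["logs"][:2])

# asyncio.run(stream_program_logs("<program id of interest>"))
```

The design point is that updates arrive as the chain produces them, so downstream decoding starts at block time rather than at the next batch window.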
Beyond sheer volume, on-chain activity is also highly uneven. Over the past year, Pump.fun's weekly transaction volume has swung nearly 30x between trough and peak. In 2024, the meme-trading platform GMGN suffered six server overloads in four days, forcing it to migrate its underlying database from AWS Aurora to TiDB, an open-source distributed SQL database. After the migration, horizontal scaling and elastic compute improved markedly, lifting business agility by about 30% and easing pressure during trading peaks.
▲ Dune, Pumpfun Weekly Volume
▲ Odaily, TiDB's Web3 service case
The multi-chain ecosystem compounds this complexity. Because log formats, event structures, and transaction fields differ across chains, every new chain requires bespoke parsing logic, a real test of an infrastructure's flexibility and scalability. Some data providers have therefore adopted a "follow the customer" strategy: wherever trading is active, they prioritize supporting that chain, balancing flexibility against scale.
If data processing stays at fixed-interval batch ETL in the high-performance-chain era, it will be plagued by delays, decoding bottlenecks, and query lag, and will fail to meet the demand for real-time, fine-grained, interactive data consumption. On-chain data infrastructure must therefore evolve toward streaming incremental processing and real-time compute, with load-balancing mechanisms to absorb the concurrency spikes of crypto's periodic trading frenzies. This is not just the natural next step on the technical path; it is what keeps real-time queries stable, and it will be the true watershed in the competition among next-generation on-chain data platforms.
Speed is Wealth: The Paradigm Shift in On-Chain Data Competition
The core proposition of on-chain data has shifted from "visualization" to "executability." In the last cycle, Dune was the standard tool for on-chain analysis. It met researchers' and investors' need for "understandable" data, letting people stitch together on-chain narratives with SQL queries and charts.
- GameFi and DeFi players relied on Dune to track fund inflows and outflows, calculate yield-farming returns, and exit in time before market turning points.
- NFT players analyzed transaction volume trends, whale holdings, and distribution characteristics through Dune to predict market heat.
In this cycle, however, meme players are the most active consumer group. They have driven the phenomenal application Pump.fun to roughly $700 million in cumulative revenue, nearly double the total revenue of OpenSea, the leading consumer application of the previous cycle.
In the meme track, time sensitivity is amplified to the extreme. Speed is no longer a luxury; it is the core variable separating profit from loss. In a primary market priced by a Bonding Curve, speed equals cost: token prices climb steeply with buying demand, and even a one-minute delay can multiply the entry cost severalfold. According to Multicoin's research, the most profitable players in this game often pay 10% slippage just to land three blocks ahead of their competitors. The wealth effect and get-rich-quick stories push players toward second-level candlestick charts, same-block execution engines, and one-stop decision panels, competing on information gathering and order speed.
▲ Binance
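To see why delay is so costly on a bonding curve, here is a toy constant-product curve in Python. The virtual-reserve numbers are illustrative, not Pump.fun's actual parameters; the point is only that the same buy costs several times more once earlier buyers have moved the curve:

```python
# Toy constant-product bonding curve (x * y = k), loosely modeled on
# Pump.fun-style virtual reserves; all numbers are illustrative.
VIRTUAL_SOL = 30.0
VIRTUAL_TOKENS = 1_073_000_000.0
K = VIRTUAL_SOL * VIRTUAL_TOKENS

def cost_to_buy(tokens_sold_so_far: float, amount: float) -> float:
    """SOL cost to buy `amount` tokens after `tokens_sold_so_far` are gone."""
    y0 = VIRTUAL_TOKENS - tokens_sold_so_far
    y1 = y0 - amount
    return K / y1 - K / y0  # SOL paid = change in the x reserve

early = cost_to_buy(0, 10_000_000)            # first buyer
late = cost_to_buy(500_000_000, 10_000_000)   # after 500M tokens sold
print(f"early: {early:.2f} SOL, late: {late:.2f} SOL, "
      f"ratio: {late / early:.1f}x")          # ~0.28 vs ~1.00 SOL, ~3.5x
```

Because the curve is convex, a head start of even a minute during a frenzied launch routinely translates into a severalfold cost advantage, exactly the dynamic described above.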
In the era of manual trading on Uniswap, users set slippage and gas themselves and could not even see prices on the front end; trading felt like a lottery. In the BananaGun era, sniper bots with automatic sniping and slippage handling put retail players on the same starting line as the "scientists." The PepeBoost era followed, with bots pushing pool-opening information in real time alongside front-row holder data. Now we are in the GMGN era: an integrated terminal combining candlestick data, multi-dimensional analytics, and trade execution, the "Bloomberg Terminal" of meme trading.
As trading tools iterate, execution barriers dissolve, and the competitive frontier inevitably shifts to the data itself: whoever captures signals faster and more accurately can build a trading edge in a fast-moving market and help users profit.
Dimensions are Advantages: The Truth Beyond Candlestick Charts
Memecoins are, at their core, the financialization of attention. A strong narrative keeps breaking out of its niche, aggregating attention and pushing price and market cap higher. For meme traders, real-time data matters, but outsized results depend on answering three questions: What is this token's narrative? Who is paying attention? How will that attention keep amplifying? Candlestick charts show only the shadows of these forces; the real drivers lie in multi-dimensional data: off-chain sentiment, on-chain addresses and holding structures, and a precise mapping between the two.
On-Chain × Off-Chain: The Closed Loop from Attention to Transactions
Attention is captured off-chain and transactions are completed on-chain; the closed-loop data linking the two is becoming the core edge in meme trading.
# Narrative Tracking and Propagation Chain Identification
On social platforms like Twitter, tools such as XHunt help meme players analyze the KOLs following a project, to infer the people associated with it and the likely paths of attention propagation. 6551 DEX aggregates Twitter, official websites, tweet replies, listing records, KOL follows, and more to generate complete AI reports that update in real time with public sentiment, helping traders pin down narratives precisely.
# Sentiment Indicator Quantification
InfoFi tools like Kaito and Cookie.fun aggregate content from Crypto Twitter and run sentiment analysis, producing quantifiable indicators of Mindshare, Sentiment, and Influence. Cookie.fun, for example, overlays Mindshare and Sentiment directly onto price charts, turning off-chain sentiment into readable "technical indicators."
▲ Cookie.fun
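As an illustration of how such indicators can be quantified, the sketch below computes a follower-weighted mindshare (share of voice) and mean sentiment per token from a toy set of pre-scored posts. The data and the weighting scheme are assumptions for the example, not Kaito's or Cookie.fun's actual methodology:

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Hypothetical scored posts: (timestamp, token, sentiment in [-1, 1], followers)
now = datetime.now(timezone.utc)
posts = [
    (now - timedelta(minutes=5),  "WIF",  0.8, 120_000),
    (now - timedelta(minutes=9),  "WIF",  0.4, 3_000),
    (now - timedelta(minutes=12), "PEPE", -0.2, 45_000),
    (now - timedelta(minutes=20), "WIF",  0.6, 800),
]

def mindshare_and_sentiment(posts, window=timedelta(hours=1)):
    """Follower-weighted share of voice and mean sentiment per token."""
    cutoff = datetime.now(timezone.utc) - window
    weight, senti = defaultdict(float), defaultdict(list)
    for ts, token, score, followers in posts:
        if ts >= cutoff:
            weight[token] += followers
            senti[token].append(score)
    total = sum(weight.values()) or 1.0
    return {t: {"mindshare": w / total,
                "sentiment": sum(senti[t]) / len(senti[t])}
            for t, w in weight.items()}

print(mindshare_and_sentiment(posts))
# WIF dominates share of voice (~73%) with positive mean sentiment
```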
# On-Chain and Off-Chain are Equally Important
OKX DEX displays Vibes analysis alongside market data, aggregating KOL shout-out timestamps, the leading KOLs associated with a token, Narrative Summaries, and a composite score, cutting the time spent retrieving off-chain information. Narrative Summary has become its best-received AI feature among users.
Underwater Data: Turning "Visible Ledgers" into "Usable Alpha"
In traditional finance, order-flow data is controlled by large brokers, and quantitative firms pay hundreds of millions of dollars a year to access it for strategy optimization. Crypto's trading ledger, by contrast, is fully open and transparent, effectively "open-sourcing" that high-priced intelligence: an open-pit gold mine waiting to be worked.
The value of underwater data lies in extracting invisible intent from visible transactions. This covers fund flows and role characterization: clues to whether a market maker is accumulating or distributing, KOL sub-account addresses, concentrated versus dispersed holdings, bundled trades, and abnormal fund movements. It also covers address-profile linkage: tagging each address with labels such as smart money, KOL/VC, developer, phishing, or wash trading, and binding them to off-chain identities to connect on-chain and off-chain data.
These signals are hard for ordinary users to perceive, yet they can move short-term markets significantly. By parsing address tags, holding patterns, and bundled trades in real time, trading-assistance tools are surfacing the games "beneath the surface," helping traders dodge risk and find alpha in second-by-second markets.
GMGN, for example, layers tag analysis for smart money, KOL/VC addresses, developer wallets, wash trading, phishing addresses, and bundled trades on top of real-time on-chain trading and token-contract data, maps on-chain addresses to social accounts, and aligns fund flows, risk signals, and price action at second-level granularity, helping users make faster entry and risk-avoidance calls.
▲ GMGN
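A stripped-down illustration of label-driven analysis: given an address-to-label map and a stream of decoded trades, net flow per labeled cohort falls out of a simple join. The labels and trades below are fabricated for the example; production systems maintain millions of continuously updated labels:

```python
from collections import Counter

# Hypothetical label map; real products maintain millions of address labels
# (smart money, KOL/VC, developer, phishing, wash trading, ...).
LABELS = {
    "7xKX...a1": "smart_money",
    "9fQm...b2": "kol",
    "3dEv...c3": "developer",
}

trades = [  # (address, side, token, usd_value) decoded from on-chain swaps
    ("7xKX...a1", "buy",  "WIF", 52_000),
    ("retail1",   "buy",  "WIF", 800),
    ("3dEv...c3", "sell", "WIF", 30_000),
]

def labeled_net_flow(trades):
    """Net USD flow per (token, label): positive = labeled cohort accumulating."""
    flows = Counter()
    for addr, side, token, usd in trades:
        if label := LABELS.get(addr):
            flows[(token, label)] += usd if side == "buy" else -usd
    return flows

print(labeled_net_flow(trades))
# {('WIF', 'smart_money'): 52000, ('WIF', 'developer'): -30000}
# i.e., smart money accumulating while the developer wallet distributes
```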
AI-Driven Executable Signals: From Information to Profit
"The next round of AI is not about selling tools but about selling profits." ------ Sequoia Capital
This judgment holds in crypto trading as well. Once data speed and dimensionality are up to standard, the next battleground is the decision layer: can multi-dimensional, messy data be converted directly into executable trading signals? The yardstick has three parts: speed, automation, and excess return.
Speed: As AI capability advances, natural-language and multi-modal LLMs earn their keep here. They can ingest and understand vast amounts of data, build semantic links across it, and automatically extract actionable conclusions. In on-chain trading's high-intensity, shallow-liquidity environment, every signal has a short shelf life and limited capital capacity, so speed directly determines what a signal can return.
Automation: Humans cannot watch markets and trade 24 hours a day; AI can. On the Senpi platform, for instance, users can place Copy Trading buy orders with take-profit and stop-loss conditions attached. This requires the AI to poll or monitor data in the background and place orders automatically the moment it detects a qualifying signal.
Return: Ultimately, any trading signal is judged by whether it keeps generating excess returns. The AI must not only understand on-chain signals but also embed risk control to maximize risk-adjusted returns in a highly volatile environment, accounting for chain-specific drags on returns such as slippage and execution delay.
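Putting the three criteria together, here is a hedged sketch of an automated loop that consumes signals and enforces take-profit/stop-loss. `feed` and `dex` are hypothetical stand-ins for a signal API and an execution engine, not any real platform's SDK:

```python
import time
from dataclasses import dataclass

@dataclass
class Position:
    token: str
    entry_price: float
    size_usd: float

def run_loop(feed, dex, take_profit=0.5, stop_loss=0.2, poll_s=1.0):
    """Poll a signal feed, mirror its buys, and exit on TP/SL thresholds.
    `feed.poll()` and the `dex` methods are hypothetical interfaces."""
    positions: list[Position] = []
    while True:
        for sig in feed.poll():                       # fresh buy signals
            fill = dex.buy(sig.token, sig.size_usd)   # market buy -> fill price
            positions.append(Position(sig.token, fill, sig.size_usd))
        for pos in positions[:]:
            ret = dex.price(pos.token) / pos.entry_price - 1
            if ret >= take_profit or ret <= -stop_loss:
                dex.sell(pos.token, pos.size_usd)     # close the position
                positions.remove(pos)
        time.sleep(poll_s)  # production systems are event-driven, not polled
```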
This capability is reshaping the business logic of data platforms: from selling "data access rights" to selling "profit-driven signals." The competitive focus of the next generation of tools is no longer on data coverage but on the executability of signals—whether it can truly complete the last mile from "insight" to "execution."
Some emerging projects are already exploring this direction. Truenorth, an AI-driven discovery engine, folds "decision execution rate" into its evaluation of information effectiveness and continuously optimizes its output with reinforcement learning to cut ineffective noise, helping users build directly executable information flows.
▲ Truenorth
Although AI has immense potential in generating executable signals, it also faces multiple challenges.
Hallucinations: On-chain data is highly heterogeneous and noisy, and LLMs are prone to hallucination or overfitting when parsing natural-language queries or multi-modal signals, hurting signal accuracy and returns. Faced with multiple tokens sharing a name, AI often fails to resolve a CT ticker to the right contract address; likewise, many AI signal products misread generic "AI" chatter on CT as referring to the token Sleepless AI.
Signal Lifetime: The trading environment changes fast, and any delay erodes returns; AI must complete data extraction, reasoning, and execution within a very short window. Even the simplest Copy Trading strategy sees returns turn negative once its fills lag too far behind the smart money it follows.
Risk Control: In high-volatility scenarios, an AI that keeps failing on-chain or eats excessive slippage will not just miss excess returns; it can burn through the entire principal within minutes.
Striking a balance between speed and accuracy, and using mechanisms such as reinforcement learning, transfer learning, and simulated backtesting to drive down error rates, is where AI competes in this field.
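A minimal backtest sketch shows how execution delay and slippage erode a signal's edge; the price path and parameters are synthetic, purely for illustration:

```python
# How execution delay and slippage erode a signal's edge (synthetic data).
def backtest(price_path, delay_s, slippage=0.03, horizon_s=60):
    entry = price_path[delay_s] * (1 + slippage)   # late fill, paid up
    exit_ = price_path[min(horizon_s, len(price_path) - 1)] * (1 - slippage)
    return exit_ / entry - 1

pump = [1.0 + 0.002 * t for t in range(120)]       # token grinding up 0.2%/s
for delay in (0, 5, 30, 60):
    print(f"delay {delay:>2}s -> return {backtest(pump, delay):+.1%}")
# 0s is solidly positive; by 60s the same signal is underwater
```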
Upward or Downward? The Survival Choices of Data Dashboards
As AI becomes able to generate executable signals directly and even assist with order placement, "lightweight middle-layer applications" that merely aggregate data face a survival crisis. Whether they stitch on-chain data into dashboards or bolt execution logic onto aggregation as trading bots, they fundamentally lack a durable moat. In the past, such tools could ride on convenience or user habit (users reflexively checking a token's CTO status on Dexscreener, say); but now that the same data is available in many places, execution engines are commoditizing, and AI can generate decision signals and trigger execution on that same data, their competitiveness is eroding fast.
Efficient on-chain execution engines will keep maturing and keep lowering trading barriers. Data providers must therefore choose: go downward, deepening faster data acquisition and processing infrastructure; or go upward, extending into the application layer to own user scenarios and consumption traffic directly. The stuck-in-the-middle model of pure aggregation and light packaging will keep losing ground.
Going downward means building a moat in infrastructure. While building trading products, Hubble AI realized that TG bots alone could not sustain an advantage, so it moved upstream into data processing, aiming to become a "Crypto Databricks." After optimizing Solana data-processing speed, Hubble AI is evolving from data processing into an integrated data-research platform, positioning itself upstream in the value chain to underpin the U.S. "finance on-chain" narrative and the data needs of on-chain AI Agent applications.
Going upward means extending into application scenarios and locking in end users. Space and Time initially focused on sub-second SQL indexing and oracle pushes, but has recently explored consumer scenarios, launching Dream.Space on Ethereum, a "vibe coding" product where users write smart contracts or generate data-analysis dashboards in natural language. The move not only raises call volume on its own data services but also builds direct stickiness through the end-user experience.
It is clear, then, that middlemen who only sell data interfaces are losing room to survive. The future B2B2C data track will be dominated by two kinds of players: infrastructure companies that control the underlying pipelines and become on-chain "utilities," and platforms close to user decision-making that turn data into application experiences.
Conclusion
Under the triple resonance of the meme frenzy, the rise of high-performance public chains, and the commercialization of AI, the on-chain data track is undergoing a structural shift. As trading speed, data dimensionality, and executable signals iterate, "visible charts" are no longer the core advantage; the real moat is shifting toward "executable signals that help users profit" and the underlying data capabilities that make them possible.
In the next 2-3 years, the most attractive entrepreneurial opportunities in crypto data will emerge at the intersection of Web2-grade infrastructure maturity and Web3-native on-chain execution. Data on major assets like BTC/ETH, being highly standardized, closely resembles traditional financial futures products and is gradually being folded into the coverage of traditional financial institutions and some Web2 fintech platforms.
By contrast, data on memecoins and long-tail on-chain assets is extremely non-standardized and fragmented: from community narratives and on-chain sentiment to cross-chain liquidity, this information must be interpreted alongside on-chain address profiling, off-chain social signals, and even second-level trade execution. It is precisely in this gap that the processing-and-trading closed loop for long-tail and meme data constitutes a unique window of opportunity for crypto-native entrepreneurs.
We are optimistic about projects that deeply cultivate in the following two directions:
Upstream Infrastructure: on-chain data companies with streaming data pipelines, ultra-low-latency indexing, and unified cross-chain parsing frameworks that match the processing capability of Web2 giants. Such projects could become the Web3 Databricks/AWS; as users migrate on-chain and transaction volumes compound, their B2B2C models carry long-term compounding value.
Downstream Execution Platforms: applications that fuse multi-dimensional data, AI Agents, and seamless trade execution. By turning fragmented on-chain/off-chain signals into directly executable trades, these products can become the crypto-native Bloomberg Terminal, monetizing not through data-access fees but through excess returns and signal delivery.
We believe that these two types of players will dominate the next generation of the crypto data track and build sustainable competitive advantages.