Chutes: Reconstructing the decentralized serverless infrastructure of Web3 and AI reasoning

2026-05-18 15:21:51

Collection

1. Core Summary

Chutes (SN64) is a decentralized serverless AI computing platform built on the Bittensor network. In the AI computing track of Web3, its core positioning is similar to a "ride-hailing platform" and model PaaS (Platform as a Service). The platform integrates globally distributed idle GPU computing power and combines advanced container orchestration technology to provide developers with ready-to-use, pay-as-you-go AI inference APIs.

At the underlying architecture level, Chutes adopts a classic dual-role game mechanism: miners provide the underlying hardware to respond to external requests at any time, while validators assess quality in real-time and allocate weights, thereby forming an industrial-grade inference network with both low cost and high concurrency capabilities. Currently, Chutes has pioneered a real commercial closed loop in the decentralized computing field, processing over 91 trillion tokens, with more than 400,000 active users, and has become the first self-reported subnet in the Bittensor ecosystem to break the $100 million valuation mark. By feeding real business revenue back into token value, Chutes has the potential to develop into a unicorn-level infrastructure in the decentralized AI track in the long term.

2. Industry Background: The Rise of AI Inference and the Dilemma of the Web2 Model

2.1 What is Model Inference? The Essential Difference from Pre-training

Before delving into the computing power platform, we need to clarify two core stages in the AI model lifecycle: pre-training (Training) and inference (Inference).

Model Pre-training: This is the "learning phase" of the AI model. Researchers need to input massive amounts of data (such as text corpora from the entire internet) into neural networks, continuously adjusting billions or even trillions of parameters within the model through large-scale matrix multiplications. This process is extremely time-consuming and requires high interconnect bandwidth for cluster computing power (such as NVLink), belonging to the category of "concentrating resources to accomplish major tasks" with heavy asset investment.

Model Inference: This is the "application phase" of the AI model. Once the model training is complete, the parameters are fixed. At this point, users input a prompt, and the model generates the next word with the highest probability through forward propagation calculation. Compared to training, the computing power required for inference is smaller, but it demands extremely high concurrency capability, very low latency response, and 24/7 system stability.

2.2 The Development Logic of the Computing Power Track and the Shift of Industry Focus

Looking back at the development of the computing power track, we can clearly see an evolutionary main line: from the early CPU general computing to the rise of GPU parallel computing (the establishment of the CUDA ecosystem), and now to the flourishing of TPU and ASIC chips specifically tailored for AI. In recent years, capital and technology have almost entirely focused on "how to train smarter models." However, with the leaps in capabilities of open-source large models like the Llama series and DeepSeek, the intelligence gap between open-source models and closed-source giants (such as GPT-4) has been rapidly narrowed. The value capture focus of the AI industry is irreversibly shifting from "model pre-training" to "model inference." The reason is that for large models to achieve true large-scale commercialization and empower various industries, they must have 24/7 high availability and low latency response capabilities. At this point, "how to run models cheaply, stably, and quickly" has become the industry's biggest pain point.

2.3 AI Inference Participants in the Web2 Era and Core Limitations

The current Web2 inference track is mainly dominated by the following types of participants:

Closed-source model API providers: Such as OpenAI (ChatGPT), Anthropic (Claude), Google (Gemini). They provide extremely user-friendly APIs but operate as black boxes, are expensive, and have strong ecosystem lock-in.

Traditional cloud service giants: Such as AWS (Amazon Web Services), Microsoft Azure, Google Cloud. They offer underlying virtual machine or GPU bare metal leasing, which is highly flexible but has extremely high operational costs.

Vertical inference-as-a-service (MaaS) platforms: Such as Together AI, Anyscale, HuggingFace Inference Endpoints. They specialize in providing inference hosting services for open-source models.

However, when developers use the services of these Web2 giants (such as OpenAI API, AWS, or Together AI), they still face three insurmountable challenges: high "computing power tax" and coarse settlement granularity: The depreciation of hardware and software in centralized data centers (site rent, cooling systems, high server procurement costs) and maintenance costs are extremely high, leading to API call fees that developers ultimately bear being quite high. Additionally, traditional cloud computing often charges by the "hour" or "entire machine," which is not friendly to large-scale applications with massive instantaneous concurrency demands, often resulting in serious resource idle waste during non-peak periods.

Complex "infrastructure pitfalls": For startup teams attempting to bypass cloud vendor APIs and self-lease machines to deploy open-source large models, they must face an extremely steep learning curve. They need to solve complex GPU selection, underlying driver configuration, inference acceleration framework (such as vLLM, TensorRT) tuning, node maintenance, and containerized cluster orchestration issues, making the engineering threshold very high.

Vendor lock-in and data privacy risks: Once enterprises deeply bind to specific cloud vendor API services, their future technology roadmap expansion and cost structure will be completely constrained. More critically, for highly sensitive industries such as healthcare, finance, and law, transmitting core business's private data to centralized API servers for processing poses a high risk of data leakage and compliance issues.

3. Breaking the Deadlock: Chutes Reconstructs AI Inference with "Network"

3.1 Core Positioning: The "Ride-Hailing Platform" and Decentralized PaaS in the AI Ecosystem

In the vast and clearly divided Bittensor ecosystem, each subnet has its own role. For example, Templar (SN3) plays the role of a "car factory," with its core task being to aggregate computing power to train top-tier open-source models from scratch; while Chutes (SN64) has a completely different positioning, focusing on "operational services," acting as a "ride-hailing platform" in the Web3 era.

Chutes itself does not produce models but efficiently integrates globally distributed "vehicles" (i.e., idle GPU computing power scattered around the world) through its network protocol, allowing ready-made, top-tier open-source models to operate efficiently on these nodes, thereby providing seamless inference services to external developers. Essentially, Chutes has built a decentralized, open-source-friendly, and highly standardized underlying PaaS (Platform as a Service) infrastructure on top of the blockchain.

3.2 True Serverless Experience and Extreme Cost Advantages

The core transformation brought to developers by Chutes is the realization of a true serverless experience. When using Chutes, developers do not need to worry about underlying hardware selection, environment configuration, and cluster maintenance; they only need to modify a few lines of code to smoothly access the network through an API fully compatible with OpenAI format. In terms of cost control, relying on the blockchain's native encrypted micropayments mechanism, Chutes has achieved a rare "per token billing" super fine-grained settlement in the industry. This disruptive billing method completely eliminates the resource idle waste caused by traditional cloud hosts charging by the hour. In practical applications, this advantage makes its price about 85% cheaper than traditional cloud services (like AWS) and saves at least 40% compared to most centralized API platforms on the market.

3.3 Privacy Upgrade: TEE Architecture Builds a Trust Flywheel

In decentralized networks, how to protect user inputs to anonymous nodes' prompts and business data has always been the biggest challenge. In response to enterprise users' deep concerns about privacy, Chutes is currently fully promoting the deployment of TEE (Trusted Execution Environments) in its network. TEE technology utilizes hardware-level encryption methods to isolate a strictly protected memory area within the CPU/GPU. This means that decentralized nodes can process inference requests within an encrypted "black box," and throughout the entire computing process, even the miners providing the computing power cannot peek into the user's sensitive input data. The introduction of this underlying technology fundamentally addresses the compliance and privacy pain points for enterprise-level commercial deployment in decentralized networks, clearing obstacles for large-scale adoption by Web2 enterprises.

4. Core Architecture: How AI Inference is Completed in the Network

In the underlying distributed architecture of Chutes, the system distributes massive inference tasks across the global network through complex routing and load balancing mechanisms. Its core participants are clearly divided into two categories, ensuring the final service quality through sophisticated cryptography and economic market games:

Miners (Service Providers): Computing power nodes from around the world must stake to access the system and load the system-specified "permanently hot models." "Hot models" mean that the massive parameters of the model have been pre-loaded into the GPU's VRAM. Based on advanced container orchestration technology, these computing power nodes must always maintain high system availability to handle sudden surges of high-concurrency API requests with extremely low cold start latency.

Validators (Quality Inspectors): In a decentralized network, there is no central authority for supervision, so it must rely on validator nodes. Validators are responsible for continuously sending randomly generated test requests to miners and routing real business requests, rigorously scoring miners' services across multiple core dimensions such as response latency (time to first token), throughput (number of tokens generated per second), and output accuracy. Well-performing miners will receive generous network token rewards, while poorly performing or malicious miners will be ruthlessly eliminated by the system, even losing their staked funds.

This decentralized game architecture based on Bittensor's underlying consensus cleverly transforms profit-driven incentives into service quality guarantees, ensuring that even a loosely distributed network can continuously deliver industrial-grade system stability comparable to centralized top-tier data centers.

5. Economic Engine: Transitioning from "Inflation-Driven" to "Real Blood Generation"

In past cycles of the crypto world, many early Web3 computing power projects fell into a death spiral: they overly relied on the malicious inflation release of tokens to subsidize and attract computing power (the so-called "mining"), and once the secondary market performed poorly, computing power would quickly dissipate. In contrast, Chutes' core competitive advantage lies in its successful operation of a healthy decentralized commercial closed loop.

Currently, the Chutes network can stably process a massive number of real B-end (enterprise-level) and C-end (end-user) API requests daily. Through the token system, the network charges these users real service fees. More critically, relying on the system's built-in automatic staking and settlement mechanism, these business revenues from the external real world (which may start from fiat payments) will ultimately be directly transformed into strong buying pressure for the network's ecological assets (tokens). This mechanism continuously feeds back to token holders and all parties involved in maintaining the network, truly achieving a transition from a "burning money to buy computing power" Ponzi model to a sustainable economic model of "real business blood generation."

6. Ecological Status and Impressive Data Performance

As of recent on-chain and business data tracking, the Chutes network has demonstrated extremely strong throughput limits and deep market penetration in actual high-concurrency business scenarios.

Core business volume breaks records: The Chutes network has cumulatively processed over 91 trillion tokens, a significant number in Web3 and many mid-sized Web2 platforms. Its daily peak processing volume can reach up to 50 billion times, serving over 400,000 end-users and developers.

Absolutely leading market position: With solid business data, Chutes has become the first self-reported subnet in the entire Bittensor ecosystem to cross the $100 million valuation mark.

Deep ecological integration and "water, electricity, and coal" attributes: Externally, Chutes has successfully served numerous breakout applications. Internally, Chutes has gradually become the core computing power provider for other subnets within the Bittensor ecosystem (focusing on various vertical applications and data processing), playing a key role as the underlying "water, electricity, and coal" of the entire decentralized AI ecosystem.

Robust token economic indicators: As of May 12, 2026, the subnet token Alpha (alpha token) price is approximately 0.0877 TAO. The network has attracted about 13,666 token holding addresses, with 244 active miner nodes and 12 validator nodes. Its network emission ratio is 8.77%. Meanwhile, in its DEX liquidity pool, the underlying TAO accounts for 7.88%, and Alpha accounts for 92.12%. In terms of both computing power scale and capital volume, Chutes is an absolute leading project in the TAO ecosystem. These data clearly reflect its actual market popularity:

(Data source: https://bittensormarketcap.com/subnets/64)

7. Competitive Landscape, Potential Challenges, and Future Outlook

7.1 Core Advantages and Moat Barriers in the Track

The current decentralized computing (DePIN + AI) track has completely bid farewell to the "conceptual talk and white paper writing" wilderness era and has entered the deep water zone of "delivery, cost, and stability competition." Compared to platforms that only provide bare metal leasing, Chutes' strongest moat lies in its commercially validated inference delivery capability and its absolute cost advantage over traditional Web2 giants. Combined with the future comprehensive rollout of the TEE privacy encryption architecture, Chutes successfully provides developers who fear the ecological monopoly and data hegemony of Silicon Valley giants with an ideal infrastructure that is completely permissionless and highly cost-effective.

7.2 Potential Challenges and Breaking Through Difficulties

Although the current business data and model circulation are impressive, for Chutes to transition from Web3 to a broader mainstream world, it still needs to overcome some hardcore challenges. Extreme concurrency redundancy and elasticity tests: When a truly "killer" AI application with tens of millions of daily active users suddenly connects to the network in a very short time, whether the decentralized network can maintain millisecond-level low latency responses without downtime under the surge in computing power demand will be the ultimate test of the scheduling algorithm. Breaking the traditional Web2 company's stereotype in the enterprise market: Despite having TEE technology support, how to break the stereotype of traditional Web2 companies and gain the trust of more compliant enterprises to adopt decentralized API protocols on a large scale still requires long-term and continuous market education and cultivation.

7.3 Future Projections

In summary, as the era of high-frequency, autonomous interactions between multimodal large models and AI agents fully arrives, the dialogue between machines will generate exponentially increasing inference demands. At this point, a low-cost, unrestricted, on-demand micropayment-supported decentralized inference layer will undoubtedly become an indispensable infrastructure for the next generation of the internet. What Chutes represents is not only the decentralization of the underlying computing resource allocation method but also a universal distribution of open intellectual resources for human society. If Chutes can successfully cross the high wall of traffic acceptance and the trust gap in traditional enterprise adoption, it is highly likely to grow into a super base and unicorn platform with long-term value capture capability in the decentralized AI track in the coming years.

Join ChainCatcher Official

Telegram Feed: @chaincatcher

X (Twitter): @ChainCatcher_