When American giants collectively "defect" from Chinese AI models
Original Title: The Largest Cryptocurrency Exchange in the U.S. Quietly Switched to Chinese AI Models, Saving Half the Costs
Original Author: AI Hands-On Notes
A Data That Makes Silicon Valley Uneasy
Recently, Brian Armstrong, the CEO of the largest cryptocurrency exchange in the U.S., Coinbase, made a statement that caused a stir in the tech community:
"We switched our AI models to China's GLM 5.2 and Kimi 2.7, cutting our AI spending by half."
Cut by half? Does that mean usage has also decreased?
On the contrary. The token usage at Coinbase has been increasing.
Using more while saving money is what truly makes OpenAI and Anthropic uneasy.
How Did They Do It? Three Cost-Saving Strategies
Coinbase didn't just switch to a cheaper model. They built a complete "cost-saving system":
First Strategy: Don't Bind to One Model, Let the System Choose
Coinbase set up an automatic routing system. Each time a request comes in, the system automatically selects the most suitable model based on task type, price, and cache status.
Not all tasks require the most expensive model. Use the cheaper one for simple translations and the better one for complex reasoning—just like you wouldn't drive a sports car to the grocery store.
Second Strategy: Increase Cache Hit Rate from 5% to 60%
This is the most aggressive move. By optimizing their caching strategy, Coinbase increased the cache hit rate from 5% to 60%.
In simple terms, 60% of requests can reuse previous computation results, significantly reducing the actual cost of each call. This optimization alone saved a substantial amount of money.
Third Strategy: Context Engineering
Coinbase requires developers to streamline context, starting new sessions for new tasks, and not cramming too much into one conversation.
This isn't laziness; it's a new discipline—known in the industry as Context Engineering. Anthropic explicitly pointed out in a technical blog that context engineering is more effective than prompt engineering when managing AI agents.
In simple terms: it's not about making AI smarter, but providing AI with more precise information.

▲ More and more companies are starting to be frugal with AI models
Not Just Coinbase, This is a Trend
Coinbase is not the first to take this leap.
Lindy, a 25-person AI startup, has completely switched from Claude to Deepseek. CEO Flo Crivello told CNBC: "AI costs have surpassed labor costs, which is unsustainable." After switching models, costs "plummeted," saving millions of dollars.
Snowflake CEO Sridhar Ramaswamy conducted a practical comparison: on 103 coding tasks, GLM-5.2 solved 66%, while Claude Opus 4.7 solved 67%. The difference? Almost negligible.
But the price difference is substantial:
Price Comparison (per million tokens)
- GLM-5.2: Input $1.40 / Output $4.40
- Claude Opus 4.7: Input $5 / Output $25
- GPT-5.5: Input $5 / Output $30
The output price difference is 5-7 times.
Cheap Goods Are Not Good? Don't Jump to Conclusions
At this point, you might ask: with such a low price, can the quality be the same?
To be honest, it's not exactly the same, but the gap is smaller than you think.
Snowflake's tests show that GLM-5.2 is indeed less stable on certain tasks—success rate on the first attempt is 47.6%, lower than Opus's 53.7%. Moreover, GLM sometimes "stubbornly" pursues the wrong direction: on one task, it took 24 minutes and made 411 calls, yet still failed. Opus completed it in 9 minutes with 49 calls.
However, on most tasks, the final success rates of both are nearly equal. The key is: are you willing to pay 5 times the price for a few percentage points of stability?
For many companies, the answer is becoming increasingly clear: no.

▲ The price gap between Eastern and Western AI models is reshaping the industry landscape
What Does This Mean for Ordinary People?
You might say: I'm not Coinbase, what does this have to do with me?
In fact, this trend has three direct implications for how you use AI:
1. Don't Stick to One Model
Many people only use one AI—either ChatGPT or Claude. But professional users don't do that. Using different models for different tasks is the most cost-effective approach.
Use the cheaper one for everyday Q&A, and the better one for coding and analysis. Just like you wouldn't go to a Michelin restaurant for every meal.
2. Caching and Reuse Are Key to Saving Money
If you often use AI for similar tasks (like writing weekly reports or organizing notes daily), learning to utilize caching and templates can significantly reduce consumption.
3. Streamlining Context = Better Results
Many people try to cram all background information into their conversations with AI. But it turns out that providing AI with less but more precise information yields better results. For new tasks, start new conversations. Don't make AI search for answers in a pile of historical records.
Deeper Changes: The AI Pricing Model is Being Reshaped
This wave of "model migration" is shaking the pricing logic of the entire AI industry.
The high valuations of OpenAI and Anthropic are based on the assumption of "continuously high revenue growth." But if more and more companies like Coinbase and Lindy turn to cheaper alternatives, this assumption will no longer hold.
Reports indicate that a price war has already begun between OpenAI and Anthropic. In the newly released GPT-5.6 series, the Terra model is half the price of GPT-5.5, while Luna focuses on the lowest price.
This is good news for users. The more intense the competition, the lower the prices and the more choices available.
When American giants start saving money with Chinese models, it indicates that AI competition is no longer just a benchmark race in laboratories, but a real cost battle. Being able to do the same thing for less money is the real skill.













