In-depth analysis: How did Solana get clogged?
Author: Nishil Jain (Head of Ecosystem Partnerships at Biconomy)
Compiled by: Odaily Planet Daily Azuma
Why do transactions on Solana keep failing at the moment?
Let's break it down step by step, starting from the most basic concepts.
From a user's perspective, there are essentially three potential outcomes when we make a transaction on Solana:
Transaction executed successfully, everything is normal;
Transaction execution failed: the user has paid the gas fee, but execution returns an error. This happens when the transaction's conditions are not met, for example when the token the user is trying to buy has sold out, or when the price moves too quickly (beyond the preset slippage);
Transaction dropped: the transaction is nowhere to be found, meaning it never reached the "block leader node" (Note: the node currently responsible for producing blocks; the role rotates to a new node every 4 blocks). This is what most users are encountering right now, and it is essentially a network layer issue rather than a consensus layer or execution layer issue.
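The three outcomes above can be written down as a small decision rule. This is only an illustrative sketch: the `Outcome` enum and the `classify` function are hypothetical names made up for this example and are not part of any Solana SDK.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    SUCCESS = "executed successfully"
    FAILED = "executed with an error (fee still paid)"
    DROPPED = "never reached the block leader"


def classify(reached_leader: bool, execution_error: Optional[str]) -> Outcome:
    """Map what happened to a transaction onto the three outcomes above."""
    if not reached_leader:
        # Network layer problem: the transaction never arrived at the leader.
        return Outcome.DROPPED
    if execution_error is not None:
        # Execution problem, e.g. slippage exceeded or the token sold out.
        return Outcome.FAILED
    return Outcome.SUCCESS


print(classify(True, None).value)
print(classify(True, "slippage exceeded").value)
print(classify(False, None).value)
```

The point of the ordering is that a dropped transaction never gets far enough to produce an execution error at all, which is why it looks to the user as if it simply vanished.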
Execution Issues Are Not the Main Cause of Congestion
Now you might ask, what is the network layer? Why do transactions get dropped? Why are they said to be the main cause of the current congestion on Solana?
Let's set these key questions aside for a moment and first look at the transactions that failed during execution (the second outcome), and explain why they are not the main cause of congestion.
On-chain data shows that only about 8% of all transactions that failed execution were submitted by real users; the rest were initiated by on-chain bots for arbitrage trading.
Arbitrageurs continuously initiate "garbage" transactions because the cost of frequently initiating transactions is negligible compared to the potential profits from successful arbitrage.
Specifically, arbitrageurs can keep sending transactions all day at a cost of only a few hundred dollars per day (thanks to Solana's low network fees), yet a single successful transaction could earn them hundreds of thousands of dollars.
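The economics can be made concrete with a quick back-of-the-envelope calculation. The article only gives orders of magnitude ("a few hundred dollars per day", "hundreds of thousands of dollars"), so the figures of $300 and $200,000 below are assumed placeholders, as is the 1% daily success rate.

```python
def break_even_probability(daily_cost_usd: float, payoff_usd: float) -> float:
    """Daily success probability at which spamming exactly pays for itself."""
    return daily_cost_usd / payoff_usd


def expected_daily_profit(
    daily_cost_usd: float, payoff_usd: float, daily_success_probability: float
) -> float:
    """Expected profit per day: chance of a win times the payoff, minus fees."""
    return daily_success_probability * payoff_usd - daily_cost_usd


# Assumed illustrative figures, not data from the article.
daily_cost = 300.0
payoff = 200_000.0

p_break_even = break_even_probability(daily_cost, payoff)
print(f"break-even daily success probability: {p_break_even:.4%}")
print(f"days of spam paid for by one success: {payoff / daily_cost:.0f}")
print(f"expected daily profit at 1% success: "
      f"${expected_daily_profit(daily_cost, payoff, 0.01):,.0f}")
```

Under these assumptions a bot only needs to win far less than once in a hundred days to break even, which is why failed arbitrage attempts are cheap enough to send continuously.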
It is important to note that these failed transactions do not mean that the Solana network is malfunctioning; the blockchain is still operating normally. These are merely bot transactions that failed due to unmet conditions. This is not the main reason for the poor experience on Solana at present.
In fact, the transaction failure rate on Solana has hovered around 50% since November of last year, so a high failure rate is nothing new and does not by itself explain the recent congestion.
The Real Main Cause: Transactions Dropped at the Network Layer
Now, let's discuss the real main cause of congestion on Solana over the past few days: "transaction drops."
As mentioned earlier, these are transactions that failed to reach the "block leader node," and the reason they failed to arrive is that they were dropped at the network layer.
The network layer is the communication layer of the internet, used to send data packets from one endpoint to another. Common protocols at this level include TCP, UDP, and QUIC (originally developed by Google); strictly speaking these are transport protocols, and "network layer" is used loosely in this article. Solana previously upgraded its networking protocol to QUIC, which helps establish connections between users and the "block leader node."
Because Solana produces blocks continuously and has no mempool to temporarily hold unconfirmed transactions, a transaction whose connection is lost is simply gone: it will not be included in a block unless it is sent again.
The advantage of the QUIC protocol is that it gives the "block leader node" a new capability: it can selectively drop certain users' connections based on specific criteria, or limit their data transmission rates.
This matters because, during a demand peak, the "block leader node" can proactively disconnect some connections, which prevents Solana from crashing outright under the increased network activity.
You might then wonder: if the QUIC design is so good, why is Solana still so congested?
The real problem is that, although the "block leader node" can now choose to drop or throttle certain connections, the logic for deciding which connections to act on is flawed.
To see the issue, imagine that each "block leader node" has X connections available for communication. During a demand peak, the number of connection requests it receives is 10 to 100 times that capacity, so the node has to disconnect some of them. However, there is currently no established criterion for choosing which connections to disconnect (for example, disconnecting all connections with fees below xxx), so which connections get disconnected is effectively random.
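The difference between random dropping and a criterion-based rule can be illustrated with a toy simulation. Everything here is an assumption made for illustration: the `ConnectionRequest` type, the capacity of 100, the 50x oversubscription, and the idea that each request carries a comparable fee. The fee rule is only the article's own example of a possible criterion, not a description of how validators behave or of the planned fix.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class ConnectionRequest:
    sender: str  # "user" or "bot" (hypothetical labels)
    fee: int     # hypothetical fee attached to the request


def admit_randomly(
    requests: List[ConnectionRequest], capacity: int, rng: random.Random
) -> List[ConnectionRequest]:
    """No criterion: an arbitrary subset of requests keeps its connection."""
    return rng.sample(requests, min(capacity, len(requests)))


def admit_by_fee(
    requests: List[ConnectionRequest], capacity: int
) -> List[ConnectionRequest]:
    """Example criterion: keep the highest-fee requests, drop the rest."""
    return sorted(requests, key=lambda r: r.fee, reverse=True)[:capacity]


rng = random.Random(0)  # fixed seed so the run is repeatable
capacity = 100          # stands in for the "X connections" in the text

# 50x oversubscription: 100 real users, 4,900 low-fee bot requests.
requests = [ConnectionRequest("user", fee=10) for _ in range(100)]
requests += [ConnectionRequest("bot", fee=1) for _ in range(4900)]

random_users = sum(r.sender == "user" for r in admit_randomly(requests, capacity, rng))
fee_users = sum(r.sender == "user" for r in admit_by_fee(requests, capacity))

print(f"users admitted with random dropping: {random_users} of 100")
print(f"users admitted with fee-based rule:  {fee_users} of 100")
```

With random dropping, users are admitted roughly in proportion to their share of all requests, so whoever sends the most requests wins. With any explicit criterion, the outcome depends on the criterion rather than on request volume.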
Ultimately, in the current situation, the only thing you can do to get your transaction confirmed is to send more transaction requests. But because many bots are also continuously flooding the network with connection requests, it becomes increasingly difficult for ordinary users to establish a connection and complete a transaction.
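A rough calculation shows why resending helps and why it gets harder as bots add load. The sketch assumes a deliberately simple model that is not from the article: each attempt is admitted independently with probability capacity / demand. The function names and the capacity of 100 are hypothetical.

```python
import math


def success_probability(attempts: int, capacity: int, demand: int) -> float:
    """Chance that at least one of `attempts` independent tries is admitted."""
    p = min(1.0, capacity / demand)
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(target: float, capacity: int, demand: int) -> int:
    """Smallest number of attempts whose success probability reaches `target`."""
    p = min(1.0, capacity / demand)
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


capacity = 100  # assumed leader capacity ("X" in the text)
for multiple in (10, 100):  # the 10x to 100x overload mentioned above
    demand = capacity * multiple
    needed = attempts_needed(0.9, capacity, demand)
    print(f"{multiple}x overload: one attempt succeeds with "
          f"{success_probability(1, capacity, demand):.1%}, "
          f"{needed} attempts needed for a 90% chance")
```

In this model the number of attempts needed grows roughly in proportion to the overload, and every extra attempt adds to the overload that everyone else faces, which is the feedback loop described above.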
How to Fix It? How Long Will It Take?
This is the problem that Solana is currently facing.
Currently, teams such as Jump (Firedancer client development team), Anza (Agave client development team), and Solana Labs are working on fixing the network layer. Patches will be rolled out gradually this week, and more significant updates are reportedly due in the coming weeks.
Will this effectively solve the problem? Will Solana go "to da moon" again? There is no definite answer.
The remaining uncertainty comes down to three things:
First, no one can guarantee that the upcoming patches will be effective. We can only judge once they are actually running.
Second, the Firedancer client developed by Jump seems to be able to solve the problem, but it will not be officially released until the end of this year.
Third, regarding the issue of "garbage" transactions, Solana's economic mechanism makes it difficult for the network to prevent malicious actors from continuously launching "garbage" transaction attacks on the chain.
Finally, I want to urge everyone to recognize one thing: I believe Solana is fighting to make the right trade-offs (Note: referring to setting reasonable criteria for which connections to drop). Just as Ethereum once overcame many problems, Solana will eventually overcome these issues as well.