Geth Source Code Series: Storage Design and Implementation

2025-08-30 22:06:52

This is the second article in a six-part series. It systematically explains the storage structure design of Geth and the related source code, introducing its database hierarchy and analyzing in detail the core functions of the modules at each level.

Author: po, LXDAO

Ethereum is the largest blockchain platform in the world, and its mainstream client Geth (Go-Ethereum) is responsible for the vast majority of node operation and state management. Understanding Geth's state storage system is fundamental to understanding Ethereum's operational mechanisms, optimizing node performance, and driving future client innovations.

1. Overview of Geth's Underlying Database

Starting from Geth v1.9.0, Geth divides its database into two parts: fast access storage (KV database for recent block and state data) and a storage called freezer (for older blocks and receipt data, referred to as "ancients").

The purpose of this division is to reduce reliance on expensive and fragile SSDs, migrating less frequently accessed data to cheaper and more durable disks. At the same time, this split can also alleviate the pressure on LevelDB/PebbleDB, improving its organization and read performance, allowing more state tree nodes to reside in memory given a certain cache size, thus enhancing overall system efficiency.

Fast Access Storage: Geth users may already know the underlying database options, which are selected with the --db.engine flag. The current default is pebbledb; leveldb can also be chosen. Both are third-party key-value databases that Geth depends on, and they hold the files under datadir/geth/chaindata (all block and state data) and datadir/geth/nodes (database metadata files, which are very small). The number of recent blocks whose historical state is kept in fast access storage is set with --history.state, which defaults to 90,000 blocks.
Freezer or Ancients Storage (Historical Data): its directory is typically datadir/geth/chaindata/ancients. Because historical data is essentially static and does not need high-performance I/O, moving it here frees valuable SSD space for more active data.

The focus of this article is on state data, which is stored in the KV database. Therefore, the underlying database referred to in the text defaults to this KV storage, rather than the freezer.

Geth Storage Structure: Five Logical Databases

Geth's underlying storage uses LevelDB/PebbleDB to store all data encoded with RLP, but logically divides it into five databases for different purposes:

| Name | Description |
|----------------|------------------------|
| State Trie | World state, including accounts and contract storage |
| Contract Codes | Contract code |
| State snapshot | World state snapshot |
| Receipts | Transaction receipts |
| Headers/Blocks | Block data |

Each type of data is distinguished by a key prefix (core/rawdb/schema.go), achieving logical separation of responsibilities. By using geth db inspect, one can view all Ethereum data stored by Geth (block height 22,347,000), where it can be seen that the largest disk space usage is from blocks, receipts, and state data.

+-----------------------+-----------------------------+------------+------------+
| DATABASE | CATEGORY | SIZE | ITEMS |
+-----------------------+-----------------------------+------------+------------+
| Key-Value store | Headers | 576.00 B | 1 |
| Key-Value store | Bodies | 44.00 B | 1 |
| Key-Value store | Receipt lists | 42.00 B | 1 |
| Key-Value store | Difficulties (deprecated) | 0.00 B | 0 |
| Key-Value store | Block number->hash | 42.00 B | 1 |
| Key-Value store | Block hash->number | 873.78 MiB | 22347001 |
| Key-Value store | Transaction index | 13.48 GiB | 391277094 |
| Key-Value store | Log index filter-map rows | 12.98 GiB | 132798523 |
| Key-Value store | Log index last-block-of-map | 2.73 MiB | 59529 |
| Key-Value store | Log index block-lv | 45.05 MiB | 2362175 |
| Key-Value store | Log bloombits (deprecated) | 0.00 B | 0 |
| Key-Value store | Contract codes | 9.81 GiB | 1587159 |
| Key-Value store | Hash trie nodes | 0.00 B | 0 |
| Key-Value store | Path trie state lookups | 19.62 KiB | 490 |
| Key-Value store | Path trie account nodes | 45.88 GiB | 397626541 |
| Key-Value store | Path trie storage nodes | 176.23 GiB | 1753966511 |
| Key-Value store | Verkle trie nodes | 0.00 B | 0 |
| Key-Value store | Verkle trie state lookups | 0.00 B | 0 |
| Key-Value store | Trie preimages | 0.00 B | 0 |
| Key-Value store | Account snapshot | 13.34 GiB | 290797237 |
| Key-Value store | Storage snapshot | 93.42 GiB | 1295163402 |
| Key-Value store | Beacon sync headers | 622.00 B | 1 |
| Key-Value store | Clique snapshots | 0.00 B | 0 |
| Key-Value store | Singleton metadata | 1.36 MiB | 20 |
| Ancient store (Chain) | Hashes | 809.85 MiB | 22347001 |
| Ancient store (Chain) | Bodies | 639.98 GiB | 22347001 |
| Ancient store (Chain) | Receipts | 244.19 GiB | 22347001 |
| Ancient store (Chain) | Headers | 10.69 GiB | 22347001 |
| Ancient store (State) | History.Meta | 37.58 KiB | 487 |
| Ancient store (State) | Account.Index | 5.80 MiB | 487 |
| Ancient store (State) | Storage.Index | 7.47 MiB | 487 |
| Ancient store (State) | Account.Data | 6.46 MiB | 487 |
| Ancient store (State) | Storage.Data | 2.70 MiB | 487 |
+-----------------------+-----------------------------+------------+------------+
| TOTAL | 1.23 TIB | |
+-----------------------+-----------------------------+------------+------------+

2. Storage Layering from the Source Code Perspective: 6 Types of DB

Overall, Geth includes six database modules: StateDB, state.Database, trie.Trie, TrieDB, rawdb, and ethdb, which can be seen as different levels of a "state life tree." The top-level StateDB serves as the state interface during the EVM execution phase, responsible for handling read and write requests for accounts and storage, passing these requests down layer by layer, ultimately handled by the lowest level ethdb, which is responsible for physical persistence.

Next, we will introduce the responsibilities of these six database modules and their collaborative relationships.

2.1 StateDB

In Geth, StateDB is the only bridge between the EVM and the underlying state storage, responsible for abstracting and managing the read and write of contract accounts, balances, nonce, storage slots, and other information. All state-related read and write operations to other databases (TrieDB, EthDB) are triggered by the relevant interfaces in StateDB, making it the brain of all state databases. It does not directly operate on the underlying Trie or the underlying database (ethdb), but provides a simplified memory view, allowing the EVM to interact using a familiar account model. Therefore, most projects relying on Geth do not concern themselves with how the underlying EthDB or TrieDB is implemented—what matters is that they work correctly without needing modification. Most fork projects based on Geth will modify the StateDB structure to fit their business logic. For example, Arbitrum modified StateDB to manage their Stylus program; EVMOS modified StateDB to track calls to its stateful precompile contracts.

In the source code, the main definition of StateDB is located in core/state/statedb.go. Its core structure maintains a series of memory state objects (stateObject), each corresponding to an account (including contract storage). It also includes a journal (transaction log) to support rollbacks and a caching mechanism to track state changes. During transaction processing and block packaging, StateDB provides records of temporary state changes, which are only written to the underlying database after final confirmation.

The core read and write interfaces of StateDB are as follows, primarily related to the account model APIs:

// Read-related
func (s *StateDB) GetBalance(addr common.Address) *uint256.Int
func (s *StateDB) GetStorageRoot(addr common.Address) common.Hash
// Write dirty state data
func (s *StateDB) SetStorage(addr common.Address, storage map[common.Hash]common.Hash)
// Commit state changes (dirty data) that occurred during EVM execution to the backend database
func (s *StateDB) commitAndFlush(block uint64, deleteEmptyObjects bool, noStorageWiping bool) (*stateUpdate, error)

Lifecycle

The lifecycle of StateDB lasts only for one block. Once a block is processed and committed, this StateDB will be discarded and will no longer be effective.

When the EVM first reads a certain address, StateDB will load its value from the Trie→TrieDB→EthDB database and cache it in a new state object (stateObject.originalStorage). This stage is considered a "clean object."
When a transaction interacts with this account and changes its state, the object becomes "dirty." The stateObject will track both the original state of the account and all modified data, including its storage slots and its clean/dirty state.
If the entire transaction is ultimately successfully packaged into a block, StateDB.Finalise() will be called. This function is responsible for cleaning up contracts that have been selfdestructed, resetting the journal (transaction log), and the gas refund counter.
After all transactions have been executed, StateDB.Commit() is called. Before this, the state tree Trie has not actually been changed. It is only at this step that StateDB writes the in-memory state changes to the storage Trie, calculates the final storage root for each account, and generates the final state of the account. Subsequently, all "dirty" state objects will be written into the Trie, updating its structure and calculating the new stateRoot.
Finally, these updated nodes will be passed to TrieDB, which will cache these nodes based on different backends (PathDB/HashDB) and ultimately persist them to disk (LevelDB/PebbleDB)—provided that this data has not been discarded due to chain reorganization.
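
The lifecycle above can be modeled in a few lines of Go. The sketch below is not Geth's StateDB: miniState, journalEntry, and the method bodies are hypothetical, balances stand in for full accounts, and the backend map stands in for the whole Trie→TrieDB→EthDB chain. It only illustrates the idea of a value being loaded once (clean), modified in memory (dirty), rolled back through a journal, and written to the backend only on commit.

```go
package main

import "fmt"

// journalEntry remembers what an address looked like before a write,
// so the write can be undone. Far simpler than Geth's real journal.
type journalEntry struct {
	addr     string
	prev     uint64
	wasDirty bool
}

// miniState is a hypothetical stand-in for StateDB: a per-block, in-memory view.
type miniState struct {
	backend map[string]uint64 // stands in for Trie -> TrieDB -> EthDB
	origin  map[string]uint64 // clean values, cached on first read
	dirty   map[string]uint64 // values modified during execution
	journal []journalEntry
}

func newMiniState(backend map[string]uint64) *miniState {
	return &miniState{
		backend: backend,
		origin:  map[string]uint64{},
		dirty:   map[string]uint64{},
	}
}

// GetBalance prefers dirty data, then the clean cache, then the backend.
func (s *miniState) GetBalance(addr string) uint64 {
	if v, ok := s.dirty[addr]; ok {
		return v
	}
	if v, ok := s.origin[addr]; ok {
		return v
	}
	v := s.backend[addr]
	s.origin[addr] = v
	return v
}

// SetBalance journals the previous value and marks the address dirty.
func (s *miniState) SetBalance(addr string, v uint64) {
	_, wasDirty := s.dirty[addr]
	s.journal = append(s.journal, journalEntry{addr: addr, prev: s.GetBalance(addr), wasDirty: wasDirty})
	s.dirty[addr] = v
}

// Snapshot returns a revision id, here simply the journal length.
func (s *miniState) Snapshot() int { return len(s.journal) }

// RevertToSnapshot undoes journal entries in reverse order.
func (s *miniState) RevertToSnapshot(id int) {
	for i := len(s.journal) - 1; i >= id; i-- {
		e := s.journal[i]
		if e.wasDirty {
			s.dirty[e.addr] = e.prev
		} else {
			delete(s.dirty, e.addr)
		}
	}
	s.journal = s.journal[:id]
}

// Commit writes dirty values to the backend and reports how many were written.
func (s *miniState) Commit() int {
	n := len(s.dirty)
	for addr, v := range s.dirty {
		s.backend[addr] = v
		s.origin[addr] = v
	}
	s.dirty, s.journal = map[string]uint64{}, nil
	return n
}

func main() {
	backend := map[string]uint64{"alice": 100}
	st := newMiniState(backend)
	st.SetBalance("alice", 90)
	snap := st.Snapshot()
	st.SetBalance("alice", 10) // e.g. an inner call that later fails
	st.RevertToSnapshot(snap)
	fmt.Println(st.GetBalance("alice"), backend["alice"]) // backend is untouched until Commit
	st.Commit()
	fmt.Println(backend["alice"])
}
```

Using the journal length as the snapshot id keeps a revert cheap: only the entries recorded after the snapshot are replayed backwards.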

2.2 state.Database

state.Database is an important intermediate layer in Geth that connects StateDB with the underlying databases (EthDB and TrieDB). It provides a set of concise interfaces and utility methods for state access. Although its interface is relatively thin, it plays multiple key roles in the source code, especially in state tree access and optimization.

In the Geth source code (core/state/database.go), the state.Database interface is implemented by the specific data structure state.cachingDB. Its main functions include:

Providing a unified state access interface

state.Database is a necessary dependency for building StateDB, encapsulating the logic for opening account Tries and storage Tries, such as:

func (db *cachingDB) OpenTrie(root common.Hash) (Trie, error)
func (db *cachingDB) OpenStorageTrie(stateRoot common.Hash, address common.Address, root common.Hash, trie Trie) (Trie, error)

These methods hide the complexity of the underlying TrieDB, allowing developers to simply call these methods to obtain the correct Trie instance when constructing the state of a block, without needing to directly manipulate hash paths, trie encoding, or the underlying database.

Caching and Reusing Contract Code (code cache)

Accessing contract code is costly and often reused across multiple blocks. Therefore, state.Database implements code caching logic to avoid repeatedly loading contract bytecode from disk. This optimization is crucial for improving block execution efficiency:

func (db *CachingDB) ContractCodeWithPrefix(address common.Address, codeHash common.Hash) []byte

This interface allows for quick cache hits based on address and code hash, falling back to the underlying database only if there is a cache miss.
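
The cache-then-fallback behavior can be sketched as follows. This is an illustrative model, not Geth's implementation: codeStore, codeCache, and ContractCode are hypothetical names, the cache is an unbounded map rather than a size-limited one, and SHA-256 from the standard library stands in for Keccak-256.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

type codeHash [32]byte

// codeStore stands in for the on-disk code table; reads counts disk lookups.
type codeStore struct {
	data  map[codeHash][]byte
	reads int
}

func (s *codeStore) get(h codeHash) []byte {
	s.reads++
	return s.data[h]
}

// codeCache is a hypothetical, unbounded cache placed in front of codeStore.
type codeCache struct {
	disk  *codeStore
	cache map[codeHash][]byte
}

func newCodeCache(disk *codeStore) *codeCache {
	return &codeCache{disk: disk, cache: map[codeHash][]byte{}}
}

// ContractCode serves from memory when possible and falls back to disk on a miss.
func (c *codeCache) ContractCode(h codeHash) []byte {
	if code, ok := c.cache[h]; ok {
		return code
	}
	code := c.disk.get(h)
	if len(code) > 0 {
		c.cache[h] = code // only cache code that actually exists
	}
	return code
}

func main() {
	code := []byte{0x60, 0x00, 0x60, 0x00, 0xf3}
	h := codeHash(sha256.Sum256(code)) // stand-in for Keccak-256
	disk := &codeStore{data: map[codeHash][]byte{h: code}}
	cache := newCodeCache(disk)
	for block := 1; block <= 3; block++ {
		cache.ContractCode(h) // the same contract is called in three blocks
	}
	fmt.Println("disk reads:", disk.reads)
}
```

Because the cache outlives any single block, the second and third lookups never reach the disk layer.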

Long Lifecycle, Reused Across Multiple Blocks

Unlike StateDB, which has a lifecycle limited to a single block, state.Database has a lifecycle that aligns with the entire chain (core.Blockchain). It is constructed when the node starts and persists throughout its operation, serving as a "faithful partner" to StateDB, providing support during the processing of each block.

Preparing for Future Verkle Tree Migration

Although state.Database currently appears to be just "code caching + trie access encapsulation," its positioning in the Geth architecture is very forward-looking. Once the future state structure switches to Verkle Trie, it will become a core component of the migration process: handling the bridging state between the new and old structures.

2.3 Trie

In Geth, the state tree Trie (Merkle Patricia Trie) itself does not store data, but it undertakes the core responsibilities of calculating the state root hash and collecting modified nodes, serving as a bridge between StateDB and the underlying storage, making it the central structure of Ethereum's state system.

When the EVM executes transactions or calls contracts, it does not directly operate on the underlying database but interacts with Trie indirectly through StateDB. Trie receives queries and update requests for account addresses and storage slots, constructing the state change paths in memory. These paths ultimately generate a new root hash (state root) through recursive hashing, which uniquely identifies the current world state and is written into the block header, ensuring the integrity and verifiability of the state.

Once a block execution is completed and enters the commit phase (StateDB.Commit), Trie will "collapse" all modified nodes into a necessary subset and pass them to TrieDB, which will further persist them to the backend node database (such as HashDB or PathDB). Since Trie nodes are encoded in a structured manner, they support efficient reading and allow the state to be safely synchronized and verified across different nodes. Thus, Trie is not just a state container but also a link connecting the upper EVM with the lower storage engine, ensuring that Ethereum's state possesses consistency, security, and modular scalability.

In the source code, Trie is primarily located in trie/trie.go, providing the following core interfaces:

type Trie interface {
	GetKey([]byte) []byte
	GetAccount(address common.Address) (*types.StateAccount, error)
	GetStorage(addr common.Address, key []byte) ([]byte, error)
	UpdateAccount(address common.Address, account *types.StateAccount, codeLen int) error
	UpdateStorage(addr common.Address, key, value []byte) error
	DeleteAccount(address common.Address) error
	DeleteStorage(addr common.Address, key []byte) error
	UpdateContractCode(address common.Address, codeHash common.Hash, code []byte) error
	Hash() common.Hash
	Commit(collectLeaf bool) (common.Hash, *trienode.NodeSet)
	Witness() map[string]struct{}
	NodeIterator(startKey []byte) (trie.NodeIterator, error)
	Prove(key []byte, proofDb ethdb.KeyValueWriter) error
	IsVerkle() bool
}

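Taking the node lookup trie.get as an example, it recursively descends through the trie according to the node type until it reaches the node for an account or contract storage slot. The cost of a lookup is proportional to the depth of the path, which is bounded by the key length (64 nibbles for a 32-byte hashed key) and grows roughly as log(n) in the number of stored keys n.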

func (t *Trie) get(origNode node, key []byte, pos int) (value []byte, newnode node, didResolve bool, err error) {
	switch n := (origNode).(type) {
	case nil:
		return nil, nil, false, nil
	case valueNode:
		return n, n, false, nil
	case *shortNode:
		if !bytes.HasPrefix(key[pos:], n.Key) {
			// key not found in trie
			return nil, n, false, nil
		}
		value, newnode, didResolve, err = t.get(n.Val, key, pos+len(n.Key))
		if err == nil && didResolve {
			n.Val = newnode
		}
		return value, n, didResolve, err
	case *fullNode:
		value, newnode, didResolve, err = t.get(n.Children[key[pos]], key, pos+1)
		if err == nil && didResolve {
			n.Children[key[pos]] = newnode
		}
		return value, n, didResolve, err
	case hashNode:
		child, err := t.resolveAndTrack(n, key[:pos])
		if err != nil {
			return nil, n, true, err
		}
		value, newnode, _, err := t.get(child, key, pos)
		return value, newnode, true, err
	default:
		panic(fmt.Sprintf("%T: invalid node: %v", origNode, origNode))
	}
}
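
To see the recursion in isolation, here is a simplified, self-contained model of the same lookup. It is a sketch under several assumptions: there is no hashNode (everything is already in memory, so nothing is resolved from the database), keys have a fixed length and no terminator nibble, and keyToNibbles and demoTrie are hypothetical helpers written for this example.

```go
package main

import (
	"bytes"
	"fmt"
)

type node interface{}

type (
	valueNode []byte
	shortNode struct {
		Key []byte // nibbles shared by everything below this node
		Val node
	}
	fullNode struct {
		Children [16]node // one slot per possible next nibble
	}
)

// keyToNibbles splits every byte into two 4-bit nibbles.
func keyToNibbles(key []byte) []byte {
	out := make([]byte, 0, len(key)*2)
	for _, b := range key {
		out = append(out, b>>4, b&0x0f)
	}
	return out
}

// get descends one node per call; pos is the number of nibbles consumed so far.
func get(cur node, key []byte, pos int) []byte {
	switch n := cur.(type) {
	case nil:
		return nil // empty slot: key is absent
	case valueNode:
		if pos != len(key) {
			return nil // key is longer than the stored path
		}
		return n
	case *shortNode:
		if !bytes.HasPrefix(key[pos:], n.Key) {
			return nil // path diverges from this node's key segment
		}
		return get(n.Val, key, pos+len(n.Key))
	case *fullNode:
		if pos >= len(key) {
			return nil
		}
		return get(n.Children[key[pos]], key, pos+1)
	default:
		panic(fmt.Sprintf("%T: invalid node", cur))
	}
}

// demoTrie stores 0x1234 -> "a" and 0x1256 -> "b".
func demoTrie() node {
	branch := &fullNode{}
	branch.Children[3] = &shortNode{Key: []byte{4}, Val: valueNode("a")}
	branch.Children[5] = &shortNode{Key: []byte{6}, Val: valueNode("b")}
	return &shortNode{Key: []byte{1, 2}, Val: branch}
}

func main() {
	root := demoTrie()
	fmt.Printf("%s\n", get(root, keyToNibbles([]byte{0x12, 0x34}), 0))
	fmt.Printf("%s\n", get(root, keyToNibbles([]byte{0x12, 0x56}), 0))
	fmt.Println(get(root, keyToNibbles([]byte{0x12, 0x99}), 0) == nil)
}
```

Each recursive call consumes at least one nibble of the key, which is why the work done is bounded by the path depth.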

2.4 TrieDB

TrieDB is the intermediate layer between Trie and disk storage, focusing on the access and persistence of Trie nodes. Every Trie node (whether account information or contract storage slots) will ultimately be read and written through TrieDB.

A TrieDB instance is created when the program starts and is destroyed when the node shuts down. It requires an EthDB instance to be passed in during initialization, which is responsible for the actual data persistence operations.

Currently, Geth supports two implementations of TrieDB backends:

HashDB: The traditional method, using hashes as keys.
PathDB: The newly introduced path-based model (default configuration after Geth 1.14.0), using path information as keys, optimizing update and pruning performance.

In the source code, TrieDB is primarily located in triedb/database.go.

Reading Logic of Trie Nodes

Let's first look at the reading process of nodes, as it is relatively simple.

All TrieDB backends must implement a database.Reader interface, defined as follows:

type Reader interface {
Node(owner common.Hash, path []byte, hash common.Hash) ([]byte, error)
}

This interface provides basic node lookup: it locates a node in the trie by its path (path) and node hash (hash) and returns it. Note that the returned value is a raw byte array: TrieDB does not care about the content of the node and does not know whether it is an account node, leaf node, or branch node (that is parsed by the upper Trie layer).

The owner parameter in the interface is used to distinguish different tries:

If it is an account trie, owner is left empty (the zero hash).
If it is a contract's storage trie, owner is the hash of that contract's address, as each contract has its own independent storage trie.

In other words, TrieDB serves as the read/write bus for the underlying nodes, providing a unified interface for the upper Trie, without involving semantics, only caring about paths and hashes. It decouples Trie from the physical storage system, allowing different storage models to be flexibly replaced without affecting upper-level logic.

TrieDB of HashDB

The historical method of node persistence used by TrieDB is:

Using the hash (Keccak256) of each Trie node as the key and the RLP encoding of that node as the value, which is then written into the underlying key-value store. This method is now referred to as HashDB.

This design is very straightforward but has several significant advantages:

Supports coexistence of multiple Tries: Just knowing the root hash allows for traversing and recovering the entire Trie. The storage of each account, account Trie, and the root hashes of different historical states can be managed separately.
Subtree Deduplication: Since identical subtrees have the same structure and node hashes, they will naturally share in HashDB, eliminating the need for duplicate storage. This is particularly important for Ethereum's large state tree, as most states remain unchanged between blocks.

It should be noted that ordinary Geth nodes do not write the entire Trie to disk after each block; this complete persistence only occurs in "archive mode" (--gcmode archive), while most mainnet nodes do not use archive mode.

So how is the state written to disk in normal mode? In fact, state updates are first cached in memory and written to disk with a delay. This mechanism is called "delayed flush," with triggering conditions including:

⏱️ Timed flush: By default, it automatically writes once every 5 minutes (equivalent to processing about 5 minutes' worth of blocks).
💾 Cache capacity reaching its limit: When the state cache is full, it must flush to free up memory.
⛔ Node shutdown: For data integrity, all caches will be flushed.

Although the structure of HashDB is simple, its memory management is quite complex, especially regarding the garbage collection mechanism for invalid nodes: Suppose a contract is created in one block and destroyed in the next—at this point, all state nodes related to that contract (including the contract account and its independent storage Trie) are no longer useful. If not cleaned up, they will unnecessarily occupy memory. Therefore, HashDB has designed a reference counting and node usage tracking mechanism to determine which nodes are no longer in use and remove them from the cache.
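
The reference-counting idea can be illustrated with a toy cache. The sketch below is an assumption-laden model rather than Geth's hashdb code: refCache, cachedNode, insert, reference, and dereference are hypothetical, node hashes are plain strings, and only the count of live parents is tracked.

```go
package main

import "fmt"

// cachedNode is a toy cache entry: how many parents (or pinned roots)
// point at it, and which children it points at.
type cachedNode struct {
	refs     int
	children []string
}

type refCache struct {
	dirties map[string]*cachedNode
}

func newRefCache() *refCache { return &refCache{dirties: map[string]*cachedNode{}} }

// insert adds a node once; a node that already exists is shared, not duplicated.
// Children must be inserted before their parent in this sketch.
func (c *refCache) insert(hash string, children ...string) {
	if _, ok := c.dirties[hash]; ok {
		return
	}
	c.dirties[hash] = &cachedNode{children: children}
	for _, child := range children {
		if n, ok := c.dirties[child]; ok {
			n.refs++
		}
	}
}

// reference pins a state root so that its trie stays alive.
func (c *refCache) reference(root string) {
	if n, ok := c.dirties[root]; ok {
		n.refs++
	}
}

// dereference drops one reference; a node that reaches zero is removed
// and its children are dereferenced in turn.
func (c *refCache) dereference(hash string) {
	n, ok := c.dirties[hash]
	if !ok {
		return
	}
	if n.refs > 0 {
		n.refs--
	}
	if n.refs == 0 {
		delete(c.dirties, hash)
		for _, child := range n.children {
			c.dereference(child)
		}
	}
}

func main() {
	c := newRefCache()
	// Block 1: a contract with one storage slot is created.
	c.insert("slot1")
	c.insert("contractX", "slot1")
	c.insert("acctA")
	c.insert("root1", "acctA", "contractX")
	c.reference("root1")
	// Block 2: the contract is destroyed; acctA is shared with the new root.
	c.insert("root2", "acctA")
	c.reference("root2")
	// The old root is no longer needed.
	c.dereference("root1")
	fmt.Println("nodes left in cache:", len(c.dirties))
}
```

After the old root is released, the contract account and its storage node disappear from the cache, while the shared account node survives because the new root still points at it.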

TrieDB of PathDB

PathDB is a new backend implementation of TrieDB. It changes how Trie nodes are persisted on disk and maintained in memory. As mentioned earlier, HashDB indexes storage based on the node's hash. This method makes it very difficult to prune parts of the state that are no longer in use. To solve this long-standing issue, Geth introduced PathDB.

PathDB has several notable differences from HashDB:

Trie nodes in the database are stored using their paths as keys. A node's path is the sequence of nibbles from the trie root down to that node, that is, a prefix of the hashed account address (in the account trie) or of the hashed storage key (in a storage trie); for nodes in a contract's storage Trie, the key additionally includes the hash of that account's address.

account trie node key = Prefix(1 byte) || COMPACTED(nodepath)
storage trie node key = Prefix(1 byte) || account hash(32 bytes) || COMPACTED(nodepath)

HashDB periodically flushes the complete state of each block to disk. This means that even for old blocks that you do not care about, the complete state will remain. In contrast, PathDB always maintains a single Trie on disk. Each block only updates the same Trie. Because it uses paths as keys, modifying nodes only requires overwriting the old nodes; deleted nodes can also be safely removed since no other Trie will reference them.
The persisted Trie is not the latest head of the chain but is at least 128 blocks behind the head. The Trie changes for the most recent 128 blocks are kept in memory to handle short chain reorganizations (reorgs).
If a larger reorg occurs, PathDB will use the state diffs (state differences) of each block stored in the freezer to perform a rollback, reverting the disk state to the fork point.
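
The two key layouts given earlier for PathDB can be sketched as plain byte concatenation. Several things here are assumptions: the prefix bytes "A" and "O" are what I believe core/rawdb/schema.go uses but should be checked against the source, the path is used as raw nibbles with the compaction step left out, and SHA-256 stands in for the Keccak-256 account hash.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Prefix bytes are assumptions for this sketch; verify them in core/rawdb/schema.go.
var (
	accountNodePrefix = []byte("A")
	storageNodePrefix = []byte("O")
)

// accountTrieNodeKey = prefix || node path.
func accountTrieNodeKey(path []byte) []byte {
	key := append([]byte{}, accountNodePrefix...)
	return append(key, path...)
}

// storageTrieNodeKey = prefix || account hash (32 bytes) || node path.
func storageTrieNodeKey(accountHash [32]byte, path []byte) []byte {
	key := append([]byte{}, storageNodePrefix...)
	key = append(key, accountHash[:]...)
	return append(key, path...)
}

func main() {
	path := []byte{0x0a, 0x0e, 0x0a}              // nibble path from the root, one nibble per byte
	ownerA := sha256.Sum256([]byte("contract A")) // stand-in for Keccak-256(address)
	ownerB := sha256.Sum256([]byte("contract B"))

	fmt.Printf("account node key:     %x\n", accountTrieNodeKey(path))
	fmt.Printf("storage node key (A): %x\n", storageTrieNodeKey(ownerA, path))
	fmt.Printf("storage node key (B): %x\n", storageTrieNodeKey(ownerB, path))
}
```

Because the account hash is part of the storage key, the same path inside two different contracts never collides, and rewriting a node at a given path always targets the same database key.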

2.5 RawDB

In Geth, rawdb is a low-level database read/write module that directly encapsulates the logic for accessing core data such as state, blockchain data, and Trie nodes. It serves as the foundational interface layer of the entire storage system. It is not directly exposed to the EVM or business logic layers but serves as an internal tool for the persistence operations of modules like TrieDB, StateDB, and BlockChain. Like trie, rawdb does not directly store data itself; both are abstraction layers over the underlying database, responsible for defining access rules rather than executing the final data writes or reads. You can think of rawdb as Geth's "hard drive," defining the key-value format and access interfaces for all core on-chain data, ensuring that different modules can read and write data uniformly and reliably. Although it is rarely used directly in development, it is the most fundamental and critical part of the entire Geth storage layer.

Core Functions

In the source code, rawdb lives in the core/rawdb package, with accessors grouped by data type in files such as core/rawdb/accessors_chain.go and core/rawdb/accessors_trie.go. rawdb provides a large number of ReadXxx and WriteXxx methods for standardized access to different types of data. For example:

Block data (core/rawdb/accessors_chain.go): ReadBlock, WriteBlock, ReadHeader, etc.
State data (core/rawdb/accessors_trie.go): WriteLegacyTrieNode, ReadTrieNode, etc.
Overall metadata: such as total difficulty, latest head block hash, genesis information, etc.

These methods typically organize data in the underlying database (LevelDB or PebbleDB) using agreed-upon key prefixes (e.g., h for headers, b for block bodies, A for account trie nodes).
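
A minimal sketch of this accessor pattern is shown below. It is not the rawdb API: kv, writeHeader, readHeader, writeCode, and readCode are hypothetical helpers with simplified signatures, and the prefixes and the header key layout (prefix, then the block number as 8 big-endian bytes, then the block hash) reflect my understanding of core/rawdb/schema.go and should be treated as assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// kv stands in for the underlying key-value database.
type kv map[string][]byte

// Prefixes believed to match core/rawdb/schema.go; treat them as assumptions.
var (
	headerPrefix = []byte("h")
	codePrefix   = []byte("c")
)

// headerKey = prefix || block number (8 bytes, big endian) || block hash.
func headerKey(number uint64, hash [32]byte) []byte {
	key := make([]byte, 0, 1+8+32)
	key = append(key, headerPrefix...)
	key = binary.BigEndian.AppendUint64(key, number)
	return append(key, hash[:]...)
}

// codeKey = prefix || code hash.
func codeKey(hash [32]byte) []byte {
	key := append([]byte{}, codePrefix...)
	return append(key, hash[:]...)
}

// Typed accessors hide the key layout from callers.
func writeHeader(db kv, number uint64, hash [32]byte, blob []byte) {
	db[string(headerKey(number, hash))] = blob
}

func readHeader(db kv, number uint64, hash [32]byte) []byte {
	return db[string(headerKey(number, hash))]
}

func writeCode(db kv, hash [32]byte, code []byte) { db[string(codeKey(hash))] = code }

func readCode(db kv, hash [32]byte) []byte { return db[string(codeKey(hash))] }

func main() {
	db := kv{}
	hash := [32]byte{0xaa}
	// The same 32-byte hash is used for two kinds of data without collision.
	writeHeader(db, 7, hash, []byte("encoded header"))
	writeCode(db, hash, []byte("contract bytecode"))
	fmt.Printf("%s / %s / entries=%d\n", readHeader(db, 7, hash), readCode(db, hash), len(db))
}
```

Callers never build keys by hand, so the key schema can be changed in one place without touching the modules that read and write the data.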

Relationship with TrieDB

TrieDB itself does not directly operate on the hard disk; it delegates specific read and write operations to rawdb. rawdb then calls the lower-level ethdb.KeyValueStore interface, which could be LevelDB, PebbleDB, or an in-memory database. For example, when writing data related to Trie (accounts, storage slots, etc.):

Trie nodes based on HashDB use methods like rawdb.WriteLegacyTrieNode to write in the form of (hash, rlp-encoded node) to the database.
Trie nodes based on PathDB use methods like WriteAccountTrieNode, WriteStorageTrieNode to write in the form of (path, rlp-encoded node) to the database.

2.6 EthDB

In Geth, ethdb is the core abstraction of the entire storage system, playing the role of the "tree of life"—deeply rooted in the disk, providing support to various components of the EVM and execution layer. Its main purpose is to shield the differences in underlying database implementations, providing a unified key-value read/write interface for the entire Geth. For this reason, Geth does not directly use specific databases (such as LevelDB, PebbleDB, MemoryDB, etc.) anywhere but accesses data through the interfaces provided by ethdb.

Interface Abstraction and Responsibility Division

In the source code, ethdb is primarily located in ethdb/database.go. The core interface in ethdb is KeyValueStore, which defines common key-value operation methods:

type KeyValueStore interface {
	Has(key []byte) (bool, error)
	Get(key []byte) ([]byte, error)
	Put(key []byte, value []byte) error
	Delete(key []byte) error
}

This set of interfaces is very concise, covering basic read and write operations. The extended interface ethdb.Database adds support for reading and writing to the freezer cold storage (AncientStore), mainly for managing chain data (such as historical blocks and transaction receipts): recent blocks are stored in KV storage, while older ones are migrated to the freezer.

Additionally, ethdb offers various specific implementation versions:

LevelDB: The earliest default implementation, stable and mature.
PebbleDB: The currently recommended default implementation, faster and more resource-efficient.
RemoteDB: Used for remote state access scenarios, particularly important in light nodes, validators, or modular execution environments.
MemoryDB: A fully in-memory implementation, commonly used in dev mode and unit testing.

This allows Geth to flexibly switch storage backends between different scenarios, such as using MemoryDB for development and debugging, and PebbleDB for mainnet deployment.
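
The value of this abstraction is easiest to see with a toy backend. The sketch below re-declares the four-method interface shown earlier and implements it with a map; memStore, newMemStore, errNotFound, and roundTrip are hypothetical names for this example, and the store is not safe for concurrent use.

```go
package main

import (
	"errors"
	"fmt"
)

// KeyValueStore mirrors the four methods listed in the article.
type KeyValueStore interface {
	Has(key []byte) (bool, error)
	Get(key []byte) ([]byte, error)
	Put(key []byte, value []byte) error
	Delete(key []byte) error
}

var errNotFound = errors.New("not found")

// memStore is a hypothetical in-memory backend.
type memStore struct {
	data map[string][]byte
}

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) Has(key []byte) (bool, error) {
	_, ok := m.data[string(key)]
	return ok, nil
}

func (m *memStore) Get(key []byte) ([]byte, error) {
	v, ok := m.data[string(key)]
	if !ok {
		return nil, errNotFound
	}
	return append([]byte(nil), v...), nil // copy so callers cannot mutate stored data
}

func (m *memStore) Put(key []byte, value []byte) error {
	m.data[string(key)] = append([]byte(nil), value...)
	return nil
}

func (m *memStore) Delete(key []byte) error {
	delete(m.data, string(key))
	return nil
}

// roundTrip depends only on the interface, so any backend can be passed in.
func roundTrip(db KeyValueStore, key, value []byte) ([]byte, error) {
	if err := db.Put(key, value); err != nil {
		return nil, err
	}
	return db.Get(key)
}

func main() {
	var db KeyValueStore = newMemStore()
	got, err := roundTrip(db, []byte("head"), []byte("0xabc"))
	fmt.Println(string(got), err)
}
```

Code written against the interface, like roundTrip here, stays unchanged when the in-memory store is replaced by a disk-backed one.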

Lifecycle and Module Interconnection

Each Geth node creates a unique ethdb instance at startup, which persists throughout the program until the node shuts down. In terms of structural design, it is injected into core.Blockchain, which is then passed to modules like StateDB, TrieDB, etc., becoming a globally shared data access entry point.

Because ethdb abstracts the details of the underlying database, other components of Geth can focus on their respective business logic, such as:

StateDB only cares about accounts and storage slots;
TrieDB only cares about how to store and retrieve Trie nodes;
rawdb only cares about how to organize the key-value layout of chain data;

These upper-level components do not need to be aware of which specific database engine the data resides in.

3. Creation Order and Call Chain of the Six DBs

This section outlines the startup process and call relationships of these six DBs, starting from the initialization of the Geth node.

3.1 Creation Order

The overall creation order is ethdb → rawdb/TrieDB → state.Database → StateDB → trie, with the specific call chain in the source code as follows:

【Node Initialization Phase】
MakeChain
└── MakeChainDatabase
    └── node.OpenDatabaseWithFreezer
        └── node.openDatabase
            └── node.openKeyValueDatabase
                └── newPebbleDBDatabase / remotedb
    ↓
ethdb.Database
    ↓
rawdb.Database (encapsulating ethdb)
└── rawdb.NewDatabaseWithFreezer(ethdb)
    ↓
trie.Database (TrieDB)
└── trie.NewDatabase(ethdb)
    └── backend: pathdb.New(ethdb) / hashdb.New(ethdb)
    ↓
state.Database (cachingDB)
└── state.NewDatabase(trieDB)
    ↓
【Block Processing Phase】
chain.InsertChain
└── bc.insertChain
    └── state.New(root, state.Database)
    ↓
state.StateDB
└── stateDB.OpenTrie()
    └── stateDB.OpenStorageTrie()
    ↓
trie.Trie / SecureTrie

3.2 Lifecycle Overview

| DB Module | Creation Timing | Lifecycle | Main Responsibilities |
|------------------------|----------------------------------|----------------|----------------------------------------------------------|
| ethdb.Database | Node initialization | Throughout program | Abstract underlying storage, unified interface (LevelDB / PebbleDB / Memory) |
| rawdb | Wrap ethdb calls | Does not store data itself | Provides read/write interfaces for chain data such as blocks/receipts/total difficulty |
| TrieDB | core.NewBlockChain() | Throughout program | Caching + persisting PathDB/HashDB nodes |
| state.Database | core.NewBlockChain() | Throughout program | Encapsulates TrieDB, contract code caching, future support for Verkle migration |
| state.StateDB | Created once before each block execution | During block execution | Manages state read/write, calculates state root, records state changes |
| trie.Trie | Created each time an account or slot is accessed | Temporary, does not store data itself | Responsible for Trie structure modification and root hash calculation |

4. Detailed Comparison of State Submission and Reading Mechanisms between HashDB and PathDB

After block execution, StateDB will call func (s *StateDB) Commit(block uint64, deleteEmptyObjects bool, noStorageWiping bool), triggering the following storage state updates:

Collect all updates related to the Trie state tree through ret, err := s.commit(deleteEmptyObjects, noStorageWiping)

func (s *StateDB) commit(deleteEmptyObjects bool, noStorageWiping bool) (*stateUpdate, error) {
	…
	newroot, set := s.trie.Commit(true)
	root = newroot
	…
}

The trie.Commit method called here collapses all nodes (whether short nodes or full nodes) into hash nodes via t.root = newCommitter(nodes, t.tracer, collectLeaf).Commit(t.root, t.uncommitted > 100), collecting all dirty nodes and returning them to StateDB.
StateDB uses all collected dirty nodes to update the TrieDB cache layer:
HashDB maintains an in-memory dirties map[common.Hash]*cachedNode object to cache these updates and update the corresponding trie node references, with a size limit.
PathDB maintains a tree *layerTree object in memory and adds a layer of diffs to cache these updates, allowing for a maximum of 128 layers of diffs.

func (s *StateDB) commitAndFlush(block uint64, deleteEmptyObjects bool, noStorageWiping bool) (*stateUpdate, error) {
	…
	// If trie database is enabled, commit the state update as a new layer
	if db := s.db.TrieDB(); db != nil {
		start := time.Now()
		if err := db.Update(ret.root, ret.originRoot, block, ret.nodes, ret.stateSet()); err != nil {
			return nil, err
		}
		s.TrieDBCommits += time.Since(start)
	}
	…
}

When the cache of HashDB or PathDB exceeds its limit, it triggers a flush, using the relevant interfaces provided by rawdb to write the cache to the actual persistent layer of ethdb:
In the full node HashDB mode, since the key is a hash, if the same account is modified, the underlying database cannot easily delete that key and its corresponding value, as it cannot perceive whether it is the same account, which may affect the state of other accounts. Therefore, it will only write the newly modified KV into the DB and cannot delete the old state, making it difficult to prune the full node state. For example, two different contract addresses A and B actually store the same contract code, and they share the same storage in HashDB (key is hash, value is contract code). If contract A is destroyed after EVM execution, the key for contract B's code in the database is the same, so the value cannot be deleted arbitrarily, or else contract B will not be able to read its contract code later.
In the full node PathDB mode, since the key is a path, the same account corresponds to the same key in the underlying DB, allowing the state corresponding to the same account to be overwritten, making it easier to prune the full node state. Therefore, Geth's full nodes now default to using PathDB mode.
Since archive nodes need to store the state corresponding to every block, HashDB is more advantageous in this case: the data of many accounts does not actually change between blocks, and using hashes as keys deduplicates the unchanged nodes automatically. In contrast, PathDB would need to save the state of all accounts under each block, leading to an extremely large state, so Geth's archive nodes only support HashDB mode.
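
The diff-layer caching that PathDB performs in the steps above can be sketched as a bounded stack of per-block changes on top of one persisted trie. This is a linear simplification with hypothetical names (layerTree, diffLayer, Update, Node, DropNewest); Geth's real layer tree also handles forks and rollbacks from the freezer, none of which is modeled here. The limit is 128 in the text; the demo uses 2 so the effect is visible.

```go
package main

import "fmt"

// diffLayer holds the trie nodes changed by one block, keyed by node path.
// A nil blob marks a deleted node.
type diffLayer struct {
	block uint64
	nodes map[string][]byte
}

// layerTree is a hypothetical, linear simplification of PathDB's layer tree:
// one persisted trie on "disk" plus a bounded stack of in-memory diffs.
type layerTree struct {
	disk  map[string][]byte
	diffs []diffLayer // oldest first
	limit int
}

func newLayerTree(limit int) *layerTree {
	return &layerTree{disk: map[string][]byte{}, limit: limit}
}

// Update stacks a new diff layer and flattens the oldest ones past the limit.
func (t *layerTree) Update(block uint64, nodes map[string][]byte) {
	t.diffs = append(t.diffs, diffLayer{block: block, nodes: nodes})
	for len(t.diffs) > t.limit {
		t.flattenOldest()
	}
}

// flattenOldest merges the oldest diff into the disk trie by overwriting paths.
func (t *layerTree) flattenOldest() {
	oldest := t.diffs[0]
	for path, blob := range oldest.nodes {
		if blob == nil {
			delete(t.disk, path)
		} else {
			t.disk[path] = blob
		}
	}
	t.diffs = t.diffs[1:]
}

// Node searches the newest diff first and falls back to disk.
func (t *layerTree) Node(path string) []byte {
	for i := len(t.diffs) - 1; i >= 0; i-- {
		if blob, ok := t.diffs[i].nodes[path]; ok {
			return blob
		}
	}
	return t.disk[path]
}

// DropNewest discards the n most recent diffs, as a short reorg would.
func (t *layerTree) DropNewest(n int) {
	if n > len(t.diffs) {
		n = len(t.diffs)
	}
	t.diffs = t.diffs[:len(t.diffs)-n]
}

func main() {
	tree := newLayerTree(2)
	for block := uint64(1); block <= 3; block++ {
		tree.Update(block, map[string][]byte{"0a": []byte(fmt.Sprintf("v%d", block))})
	}
	fmt.Printf("disk=%s head=%s diffs=%d\n", tree.disk["0a"], tree.Node("0a"), len(tree.diffs))
	tree.DropNewest(1)
	fmt.Printf("head after short reorg=%s\n", tree.Node("0a"))
}
```

A short reorg only discards in-memory layers, which is why the persisted trie deliberately lags behind the chain head.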

Example: Actual Disk Comparison between HashDB and PathDB in Full Node Mode

Assume the left side of the figure shows the initial state of the MPT, with the nodes to be modified marked in red; the right side shows the new state of the MPT, with green marking the four previously red nodes after modification.

[Figure: the MPT before (left) and after (right) the modification]

In HashDB mode, since the C/D/E nodes will change after the modification, their hashes will inevitably change. Therefore, even though the C/D/E nodes corresponding to the three accounts have already been persisted, the new nodes C'/D'/E' still need to be persisted, and once persisted, it becomes very difficult to delete these old nodes. The states before (left image) and after (right image) the disk update are as follows, where the collapsed Node can be simply understood as the value stored by the node.

[Figure: disk contents before (left) and after (right) the update in HashDB mode]

In PathDB mode, although the values corresponding to the C/D/E nodes have changed, since the underlying storage key (path) remains unchanged, the persistence can directly replace the values corresponding to these three nodes with C'/D'/E', and the disk data will not have excessive redundancy (although some identical contracts may be saved under different paths, the impact is minimal). The states before (left image) and after (right image) the disk update are as follows.
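
The difference between the two layouts can be sketched with two in-memory maps. This is a simplified illustration, not Geth's implementation: hashStore, pathStore, and persist are hypothetical names, the node contents are placeholder strings, and SHA-256 stands in for Keccak-256.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashStore keys each node by the hash of its content (HashDB-like).
type hashStore map[[32]byte][]byte

// pathStore keys each node by its position in the trie (PathDB-like).
type pathStore map[string][]byte

// put writes a node under the hash of its content; older versions stay behind.
func (s hashStore) put(node []byte) { s[sha256.Sum256(node)] = node }

// put writes a node under its path, replacing any earlier version in place.
func (s pathStore) put(path string, node []byte) { s[path] = node }

// persist writes the same sequence of node versions into both stores and
// returns how many entries each store holds afterwards.
func persist(versions []map[string]string) (hashEntries, pathEntries int) {
	h, p := hashStore{}, pathStore{}
	for _, version := range versions {
		for path, node := range version {
			h.put([]byte(node))
			p.put(path, []byte(node))
		}
	}
	return len(h), len(p)
}

func main() {
	before := map[string]string{"C": "node C", "D": "node D", "E": "node E"}
	after := map[string]string{"C": "node C'", "D": "node D'", "E": "node E'"}

	h, p := persist([]map[string]string{before, after})
	fmt.Println("hash-keyed entries:", h) // old and new versions coexist
	fmt.Println("path-keyed entries:", p) // new versions replaced the old ones
}
```

With three nodes modified once, the hash-keyed store ends up holding six entries while the path-keyed store still holds three.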

[Figure: disk contents before (left) and after (right) the update in PathDB mode]

Example: Comparison of Reading Accounts between HashDB and PathDB

To observe the Trie node database reads that occur when StateDB reads account 0xB3329fcd12C175A236a02eC352044CE44d (account hash: 0xaea7c67da6a9bdb230dd07d0e96626e5e57c9cba04dc8039c923baefe55eacd1), add the following debug code to core/rawdb/accessors_trie.go:

func ReadAccountTrieNode(db ethdb.KeyValueReader, path []byte) []byte {
	// Debug output: print the full database key (prefix + path)
	fmt.Println("PathDB read:", hexutil.Encode(accountTrieNodeKey(path)))
	data, _ := db.Get(accountTrieNodeKey(path))
	return data
}

func ReadLegacyTrieNode(db ethdb.KeyValueReader, hash common.Hash) []byte {
	// Debug output: print the node hash used as the database key
	fmt.Println("HashDB read:", hash)
	data, err := db.Get(hash.Bytes())
	if err != nil {
		return nil
	}
	return data
}

The Trie nodes read by PathDB are as follows. The keys follow the leading nibbles of the account hash (a, e, a, 7, c, 6, 7), going one nibble deeper with each read:

0x41 (ASCII "A") is the key prefix; each nibble (half byte) of the path occupies a full byte, which is why every path byte starts with an extra 0.

PathDB read: 0x410a
PathDB read: 0x410a0e
PathDB read: 0x410a0e0a
PathDB read: 0x410a0e0a07
PathDB read: 0x410a0e0a070c
PathDB read: 0x410a0e0a070c06
PathDB read: 0x410a0e0a070c0607
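
The keys in this log can be reconstructed from the account hash alone. The sketch below assumes only what the output shows (a 0x41 prefix followed by one byte per nibble); toNibbles, nodeKey, and pathKeys are hypothetical helpers written for illustration, not functions from Geth.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// accountPrefix mirrors the 0x41 ("A") prefix visible in the log output.
const accountPrefix = 'A'

// toNibbles splits each byte into its high and low half,
// storing one nibble per output byte.
func toNibbles(b []byte) []byte {
	out := make([]byte, 0, len(b)*2)
	for _, c := range b {
		out = append(out, c>>4, c&0x0f)
	}
	return out
}

// nodeKey builds a database key as prefix || nibble path.
func nodeKey(path []byte) []byte {
	return append([]byte{accountPrefix}, path...)
}

// pathKeys returns the hex-encoded keys for paths of length 1..depth
// along the given hash prefix (hex string without "0x").
func pathKeys(hashHex string, depth int) ([]string, error) {
	raw, err := hex.DecodeString(hashHex)
	if err != nil {
		return nil, err
	}
	nibbles := toNibbles(raw)
	if depth > len(nibbles) {
		depth = len(nibbles)
	}
	keys := make([]string, 0, depth)
	for i := 1; i <= depth; i++ {
		keys = append(keys, "0x"+hex.EncodeToString(nodeKey(nibbles[:i])))
	}
	return keys, nil
}

func main() {
	// First four bytes of the account hash from the example above.
	keys, err := pathKeys("aea7c67d", 7)
	if err != nil {
		panic(err)
	}
	for _, k := range keys {
		fmt.Println("PathDB read:", k)
	}
}
```

Running it prints the same seven keys as the log, from 0x410a down to 0x410a0e0a070c0607.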

The Trie nodes read by HashDB are as follows. Each node is looked up by its own hash, so the keys say nothing about the node's position in the Trie:

HashDB read: 0xb01e32b0c38555bb27f1a924b8408824f97dd8d70f096b218d397906a9095385
HashDB read: 0x99d38ce254e6c35a49504345a30e94b4ea08338279385bae33feaaa11c3a0a00
HashDB read: 0xfcc42d902aa9107b83ee7839a8bc61b370cc5eac9ee60db1af7165daf6c3f76b
HashDB read: 0x3232bc99a88337d2aea2e8c237eb5b4ebb9366ff5bdd94b965ac6f918bd6303f
HashDB read: 0x04ae6f0462f6c0c7e5827dc46fcd69329483d829c39f624744f7b55c09c2cc96
HashDB read: 0x22a16c466cc420e8ed97fd484cecc8f73160ee74a56cfc87ff941d1b56ff46f8
HashDB read: 0xae26238e219065458f314e456265cd9c935e829ba82aebe6d38bacdbb14582f3
HashDB read: 0xe9ce7770c224e563b0c407618b7b7d8614da3d5da89f3960a3bec97e78fc0ae0
HashDB read: 0x2c7d134997a5c3e0bf47ff347479ee9318826f1c58689b3d9caeac77287c3af8

Overall, both PathDB and HashDB maintain the Trie data structure to store state data, with PathDB using the path of the Trie node as the key, while HashDB uses the hash corresponding to the Trie node value as the key. Both store the same value, which is the value of the Trie node.

5. Tracking the Read/Write Operation Process Related to DB

1. Transaction Execution Phase

All account and storage values are read into StateDB memory through methods like StateDB.GetState, going through Trie→TrieDB(pathdb/hashdb)→RawDB→Level/PebbleDB.
Subsequently, the EVM applies state changes (for example by calling StateDB.SetBalance()), including balance changes, nonce updates, and storage modifications; these changes are also held in StateDB memory.
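
The read-through and in-memory write behavior of this phase can be sketched as follows. The types are hypothetical and heavily simplified: backend collapses the whole Trie→TrieDB→RawDB→DB chain into one map, and memState plays the role of StateDB for a single balance field.

```go
package main

import "fmt"

// backend stands in for everything below StateDB (Trie, TrieDB, RawDB, DB).
type backend struct {
	data  map[string]uint64
	reads int // number of reads that reached the backend
}

func (b *backend) get(addr string) uint64 {
	b.reads++
	return b.data[addr]
}

// memState is a hypothetical StateDB-like layer holding values in memory.
type memState struct {
	db     *backend
	cached map[string]uint64 // values loaded or modified during execution
	dirty  map[string]bool   // addresses changed since the last commit
}

func newMemState(db *backend) *memState {
	return &memState{db: db, cached: map[string]uint64{}, dirty: map[string]bool{}}
}

// GetBalance serves from memory when possible and falls through otherwise.
func (s *memState) GetBalance(addr string) uint64 {
	if v, ok := s.cached[addr]; ok {
		return v
	}
	v := s.db.get(addr)
	s.cached[addr] = v
	return v
}

// SetBalance only touches memory; nothing is written to the backend here.
func (s *memState) SetBalance(addr string, v uint64) {
	s.cached[addr] = v
	s.dirty[addr] = true
}

func main() {
	db := &backend{data: map[string]uint64{"alice": 100}}
	s := newMemState(db)

	s.GetBalance("alice") // reaches the backend
	s.GetBalance("alice") // served from memory
	s.SetBalance("alice", 90)

	fmt.Println(s.GetBalance("alice"), db.data["alice"], db.reads) // 90 100 1
}
```

The backend still holds the old value after SetBalance, which matches the description: changes stay in memory until the block is committed.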

2. Updating Cache After Executing a Single Block

Call StateDB.Commit() → Collect dirty nodes and convert them into modified Trie node groups, calculating the new StateRoot.
Internally call Trie.Commit() → Call TrieDB.Update() to store changes in the TrieDB cache layer.
PathDB's cache holds at most 128 diff layers, one per block.
HashDB's cache layer has a size limit.
Exceeding the above limits triggers TrieDB.Commit to persist to the underlying database.

3. Submitting Header / Receipts After Executing a Single Block

Besides state, block headers, bodies, transaction receipts, and other data are written into the ethdb layer through interfaces like RawDB.Write*(ethdb).

4. Triggering Actual Persistence of TrieDB.Commit → batch → DB After Executing Multiple Blocks When Cache Exceeds Limit

A commit is triggered, and the data is ultimately persisted, when the node is an archive node, when the flush interval has elapsed, when the TrieDB cache limit is exceeded, or before the node shuts down. Below is the core persistence code; note that this excerpt comes from the HashDB backend, as the call to rawdb.WriteLegacyTrieNode shows:

func (db *Database) commit(hash common.Hash, batch ethdb.Batch, uncacher *cleaner) error {
	…
	rawdb.WriteLegacyTrieNode(batch, hash, node.node) // Multiple modified trie nodes are added to the batch (not yet persisted)
	if batch.ValueSize() >= ethdb.IdealBatchSize {    // Trigger write when reaching IdealBatchSize
		batch.Write()          // Persist
		batch.Replay(uncacher) // Notify uncacher to clean memory
		batch.Reset()          // Reset batch
	}
	…
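
The batching pattern in this excerpt (accumulate, write once a size threshold is reached, reset) can be tried out in isolation. The following is a toy model, not the ethdb API: batch, node, and commitAll are hypothetical, Write takes the target map as an argument for simplicity, and idealBatchSize is set to a deliberately tiny value so the example flushes after a few writes.

```go
package main

import "fmt"

// idealBatchSize plays the role of ethdb.IdealBatchSize; the value is
// deliberately tiny here so that a handful of writes triggers a flush.
const idealBatchSize = 64

// node is one key-value pair waiting to be persisted.
type node struct {
	key   string
	value []byte
}

// batch is a hypothetical write buffer that tracks its pending size.
type batch struct {
	pending []node
	size    int
}

// Put queues a write without touching the database.
func (b *batch) Put(key string, value []byte) {
	b.pending = append(b.pending, node{key, value})
	b.size += len(key) + len(value)
}

// ValueSize reports the number of bytes queued so far.
func (b *batch) ValueSize() int { return b.size }

// Write applies all queued writes to db.
func (b *batch) Write(db map[string][]byte) {
	for _, n := range b.pending {
		db[n.key] = n.value
	}
}

// Reset clears the queue so the batch can be reused.
func (b *batch) Reset() {
	b.pending = b.pending[:0]
	b.size = 0
}

// commitAll queues every node and flushes whenever the threshold is reached,
// plus once at the end for any remainder. It returns the number of flushes.
func commitAll(db map[string][]byte, nodes []node) (flushes int) {
	b := &batch{}
	for _, n := range nodes {
		b.Put(n.key, n.value)
		if b.ValueSize() >= idealBatchSize {
			b.Write(db)
			b.Reset()
			flushes++
		}
	}
	if b.ValueSize() > 0 {
		b.Write(db)
		flushes++
	}
	return flushes
}

func main() {
	var nodes []node
	for i := 0; i < 10; i++ {
		// 8-byte key + 24-byte value = 32 bytes per node
		nodes = append(nodes, node{fmt.Sprintf("node-%03d", i), make([]byte, 24)})
	}
	db := map[string][]byte{}
	fmt.Println(commitAll(db, nodes), len(db)) // 5 10
}
```

Grouping writes this way trades a little memory for far fewer round trips to the storage engine, which is the point of checking the batch size before each write.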

6. Conclusion

The six database modules in Geth each take on responsibilities at a different level, forming a bottom-up data access chain. Through multiple layers of abstraction and caching, upper-level modules do not need to concern themselves with the specific implementations of the lower layers, which makes the underlying storage engine pluggable while keeping I/O performance high.

The lowest layer, ethdb, abstracts physical storage, shielding specific database types, and supports various backends such as LevelDB, Pebble, RemoteDB, etc.; the next layer is rawdb, responsible for encoding, decoding, and encapsulating core on-chain data structures such as blocks, block headers, and transactions, simplifying the read and write operations of chain data. TrieDB manages the caching and persistence of state tree nodes, supporting both hashdb and pathdb backends to implement different state pruning strategies and storage methods.

Above that, trie.Trie serves as the execution container for state changes and the core of root hash calculation, undertaking the actual state construction and traversal operations; state.Database encapsulates unified access to account and contract storage Tries and provides contract code caching; while the top-level state.StateDB serves as the interface connecting with the EVM during block execution, providing read caching and write support for accounts and storage, allowing the EVM to operate without needing to perceive the complex structure of the underlying Trie.

These modules collaboratively build a flexible and efficient state management system through responsibility separation and interface isolation, enabling Geth to maintain good performance and maintainability amidst complex chain states and transaction executions.