Detailed Explanation of Kakarot zkEVM: The EVM Compatibility Path of Starknet
Written by: Cynic
Source: Ethereum Enthusiasts
TL;DR
- A virtual machine is a software-emulated computer system that provides an execution environment for programs. It can simulate various hardware devices, allowing programs to run in a controlled and compatible environment. The Ethereum Virtual Machine (EVM) is a stack-based virtual machine used to execute Ethereum smart contracts.
- zkEVM is an EVM integrated with zero-knowledge proof/validity proof technology. It allows the execution process of the EVM to be verified using zero-knowledge proofs without requiring all validators to re-execute the EVM. There are various zkEVM products on the market, each with its own approach and design.
- The need for zkEVM arises from the demand for a virtual machine that supports smart contract execution on Layer 2. Additionally, some projects choose to use zkEVM to leverage the extensive user ecosystem of the EVM and design an instruction set that is more friendly to zero-knowledge proofs.
- Kakarot is a zkEVM implemented on Starknet using the Cairo language. It simulates the stack, memory, execution, and other aspects of the EVM in the form of Cairo smart contracts. Kakarot faces challenges such as compatibility with the Starknet account system, cost optimization, and stability, as the Cairo language is still in an experimental stage.
- Warp is a converter that transforms Solidity code into Cairo code, providing compatibility at a high-level language level. On the other hand, Kakarot offers compatibility at the EVM level by implementing the opcodes and precompiles of the EVM.
What is a Virtual Machine?
To clarify what a virtual machine is, we must first discuss the execution process of computers under the mainstream von Neumann architecture. Programs running on computers are typically written in high-level languages, which undergo multiple transformations to ultimately generate machine code that the machine can understand. Depending on how the transformation to machine code is done, high-level languages can be roughly divided into compiled languages and interpreted languages.
Compiled languages are those whose source code must be processed by a compiler, which converts the high-level code into machine code and produces an executable file. Once compiled, the program can be run many times with high efficiency. Because the translation to machine code has already happened, execution is fast and the program can run in environments without a compiler, so users do not need to install additional software. Common compiled languages include C, C++, and Go.
In contrast, interpreted languages are those whose code is executed line by line by an interpreter, with the translation repeated every time the program runs. The advantages of interpreted languages are high development efficiency and ease of debugging, but execution is relatively slow. Common interpreted languages include Python, JavaScript, and Ruby.
It is important to emphasize that a language is not inherently compiled or interpreted, although its original design may lean one way or the other. C/C++ is mostly compiled, but it can also be interpreted (Cint, Cling). Many traditionally interpreted languages are now compiled into intermediate code that runs on a virtual machine (Python, Lua).
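As a small illustration of the last point, the sketch below uses CPython's built-in compile() and the standard dis module to show that Python source is first turned into bytecode, which the interpreter's virtual machine then executes. The variable names are chosen for this example only, and the exact instruction listing differs between Python versions.

```python
import dis

source = "result = (2 + 3) * 4"

# compile() turns source text into a code object holding CPython bytecode.
code_obj = compile(source, "<example>", "exec")

# The raw bytecode is a plain bytes string interpreted by the CPython VM.
print(type(code_obj.co_code), len(code_obj.co_code))

# dis prints a human-readable listing of the bytecode instructions.
dis.dis(code_obj)

# Executing the code object runs that bytecode on the virtual machine.
namespace = {}
exec(code_obj, namespace)
print(namespace["result"])  # 20
```

The same pattern (source code, then intermediate bytecode, then a virtual machine that runs it) is what Solidity and the EVM follow, except that the compilation step happens before deployment.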
Now that we understand the execution process of physical machines, let's discuss virtual machines.
Virtual machines typically provide a virtual computing environment by simulating different hardware devices. Different virtual machines can simulate different hardware devices, but they usually include CPU, memory, hard disk, network interfaces, etc.
Taking the Ethereum Virtual Machine (EVM) as an example, the EVM is a stack-based virtual machine used to execute Ethereum smart contracts. The EVM provides a virtual computing environment by simulating hardware devices such as CPU, memory, storage, and stack.
Specifically, the EVM uses a stack to hold data while it executes instructions. Its instruction set consists of opcodes for arithmetic operations, logical operations, storage operations, jump operations, and so on. These instructions operate on the EVM's stack, and executing them in sequence carries out the logic of a smart contract.
The memory and storage simulated by the EVM hold the state and data of smart contracts. The EVM treats them as two distinct areas: memory is temporary working space that exists only while a call is executing, whereas storage holds the contract's persistent state. A contract accesses its state and data by reading from and writing to these two areas.
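The difference between the two areas can be sketched in a few lines. The class below is a hypothetical toy model, not how any real EVM client is structured: storage is a dictionary that survives between calls, while memory is a fresh byte buffer created for each call and thrown away afterwards.

```python
class ToyContractState:
    """Hypothetical model: persistent storage plus per-call memory."""

    def __init__(self):
        # Storage persists across calls, like a contract's on-chain state.
        self.storage = {}

    def call(self, slot: int, value: int) -> int:
        # Memory is fresh scratch space, discarded when the call returns.
        memory = bytearray(32)
        memory[0:32] = value.to_bytes(32, "big")  # stage the 256-bit word
        previous = self.storage.get(slot, 0)  # unset slots read as zero
        self.storage[slot] = int.from_bytes(memory, "big")
        return previous


state = ToyContractState()
first = state.call(0, 42)   # slot 0 was empty, so the previous value is 0
second = state.call(0, 7)   # the earlier write survived the first call
print(first, second, state.storage[0])  # 0 42 7
```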
The stack simulated by the EVM is used to store the operands and results of instructions. Most instructions in the EVM's instruction set are stack-based; they read operands from the stack and push results back onto the stack.
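To make the stack-based model concrete, here is a minimal sketch of an interpreter for a handful of opcodes. The opcode byte values (STOP, ADD, MUL, SUB, POP, PUSH1) are the real EVM ones, but everything else is a simplification for illustration: there is no gas accounting, no memory or storage, and no stack depth limit.

```python
# Real EVM opcode values; the interpreter itself is a simplified sketch.
STOP, ADD, MUL, SUB, POP, PUSH1 = 0x00, 0x01, 0x02, 0x03, 0x50, 0x60
WORD = 2**256  # EVM words are 256 bits; arithmetic wraps modulo 2**256


def run(bytecode: bytes) -> list:
    """Execute a tiny subset of EVM bytecode and return the final stack."""
    stack = []
    pc = 0  # program counter
    while pc < len(bytecode):
        op = bytecode[pc]
        pc += 1
        if op == STOP:
            break
        elif op == PUSH1:
            stack.append(bytecode[pc])  # next byte is the immediate operand
            pc += 1
        elif op == POP:
            stack.pop()
        elif op in (ADD, MUL, SUB):
            a = stack.pop()  # top of the stack
            b = stack.pop()  # second item
            if op == ADD:
                stack.append((a + b) % WORD)
            elif op == MUL:
                stack.append((a * b) % WORD)
            else:
                stack.append((a - b) % WORD)  # SUB is top minus second
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack


# PUSH1 2, PUSH1 3, ADD, PUSH1 4, MUL, STOP computes (2 + 3) * 4
program = bytes([PUSH1, 2, PUSH1, 3, ADD, PUSH1, 4, MUL, STOP])
final_stack = run(program)
print(final_stack)  # [20]
```

Each arithmetic opcode pops its operands from the stack and pushes the result back, which is exactly the read-operands, push-result pattern described above.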
In summary, the EVM provides a virtual computing environment by simulating hardware devices such as CPU, memory, storage, and stack. It can execute the instructions of smart contracts and store the state and data of smart contracts. In actual operation, the EVM loads the bytecode of the smart contract into memory and executes the logic of the smart contract by executing the instruction set. What the EVM effectively replaces is the operating system and hardware layers of a conventional computer.
The EVM was clearly designed bottom-up: the simulated hardware environment (stack, memory) was determined first, and a custom assembly instruction set (Opcode) and bytecode (Bytecode) were then designed around that environment. Although an assembly instruction set is meant to be human-readable, it involves a lot of low-level knowledge, which places high demands on developers and makes development cumbersome. High-level languages are therefore needed to abstract away the obscure and complex low-level calls and give developers a better experience. Because its instruction set is custom, the EVM is difficult to target directly from traditional high-level languages, which led to new high-level languages built for this virtual machine. The Ethereum community designed two compiled high-level languages with EVM execution efficiency in mind: Solidity and Vyper. Solidity is well-known, while Vyper, originally created by Vitalik to address certain flaws in Solidity, has seen far less adoption in the community.
What is zkEVM?
In simple terms, zkEVM is an EVM that utilizes zero-knowledge proof/validity proof technology, allowing the execution process of the EVM to be verified more efficiently and at a lower cost through zero-knowledge proofs/validity proofs, without requiring all validators to re-execute the EVM.
There are many zkEVM products on the market, and the competition is fierce. Major players include Starknet, zkSync, Scroll, Taiko, Linea, and Polygon zkEVM (formerly Polygon Hermez), which Vitalik categorized into five types (1, 2, 2.5, 3, 4). For specific content, you can refer to Vitalik's blog.
Why is zkEVM Needed?
This question needs to be viewed from two aspects.
The earliest zk Rollups, such as zkSync Lite and Loopring, could only support relatively simple transfers and trades. Developers accustomed to the Turing-complete EVM on Ethereum could not build diverse applications on them through programming, and so began to call for a virtual machine on L2. The demand for writing smart contracts is the first reason.
Due to some designs in the EVM being unfriendly to generating zero-knowledge proofs/validity proofs, some players chose to use instruction sets that are friendly to zero-knowledge proofs/validity proofs at the lower level, such as Starknet's Cairo Assembly and zkSync's Zinc Instruction. However, everyone is also reluctant to give up the vast user ecosystem of the EVM, so they choose to maintain compatibility with the EVM at the upper level, which corresponds to Type 3 and 4 zkEVMs. Some players still insist on the traditional EVM instruction set Opcode, focusing their efforts on generating more efficient proofs for the Opcode, corresponding to Type 1 and 2 zkEVMs. The vast ecosystem of the EVM is the second reason.
Kakarot: A Virtual Machine on a Virtual Machine?
Why can we create another virtual machine on top of a virtual machine? This concept is commonplace for computer professionals, but it may not be so obvious to users unfamiliar with computers. It is actually quite understandable. It is like stacking blocks: as long as the lower layer is solid enough (it provides a Turing-complete execution environment), you can keep stacking blocks on top without limit. However, no matter how many layers are stacked, the final execution still has to be handled by the underlying physical hardware, so every added layer reduces efficiency. Additionally, because the blocks differ in design (different virtual machine designs), the higher they are stacked, the more likely they are to collapse (runtime errors), which calls for a higher level of technical support.
Kakarot is an EVM implemented on Starknet using the Cairo language, simulating the stack, memory, execution, and other aspects of the EVM in the form of Cairo smart contracts. Relatively speaking, implementing the EVM is not particularly difficult; aside from the most widely used Go-Ethereum, which is written in Golang, there are existing EVMs written in Python, Java, JavaScript, and Rust.
The technical challenges of Kakarot zkEVM stem from the fact that the protocol exists as a contract on the Starknet chain, which brings three key issues.
- Compatibility: Starknet uses a completely different account system from Ethereum. In Ethereum, accounts are divided into EOA (Externally Owned Accounts) and CA (Contract Accounts), while Starknet supports native account abstraction, meaning all accounts are contract accounts. Additionally, due to the different cryptographic algorithms used, users cannot generate the same address in Starknet using the same entropy as in Ethereum.
- Cost: Since Kakarot zkEVM exists as a contract on the chain, its implementation must be carefully optimized for Gas to keep interaction costs low.
- Stability: Unlike traditional high-level languages like Golang, Rust, and Python, the Cairo language is still in an experimental stage. The official team is continuously modifying language features from Cairo 0 to Cairo 1 and then to Cairo 2 (or Cairo 1 version 2, if you prefer). Meanwhile, the Cairo VM has not undergone sufficient testing, and there is a possibility of large-scale rewrites in the future.
The Kakarot protocol consists of five main components (the GitHub documentation mentions four, excluding EOA; this article has adjusted for reader understanding):
- Kakarot (Core): Responsible for executing transactions in Ethereum format while providing corresponding Starknet accounts for Ethereum users.
- Contract Accounts: Corresponding to CA in Ethereum, responsible for storing contract bytecode and variable states within the contract.
- Externally Owned Accounts: Corresponding to EOA in Ethereum, responsible for forwarding Ethereum transactions to Kakarot Core.
- Account Registry: Stores the correspondence between Ethereum accounts and Starknet accounts.
- Blockhash Registry: Blockhash, as a special Opcode, requires past block data, which Kakarot cannot directly access on-chain. This component stores the mapping of block_number -> block_hash, written by administrators, and provided to Kakarot Core.
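The Blockhash Registry idea can be sketched as an access-controlled lookup table. The class and method names below are hypothetical and chosen only for illustration; the actual component is a Cairo contract whose interface is not described in this article. Returning 32 zero bytes for an unknown block mirrors how the EVM's BLOCKHASH opcode behaves for unavailable blocks, but whether Kakarot does the same is an assumption of this sketch.

```python
class BlockhashRegistry:
    """Hypothetical model of an admin-written block_number -> block_hash map."""

    def __init__(self, admin: str):
        self.admin = admin
        self._hashes = {}

    def set_blockhash(self, caller: str, block_number: int, block_hash: bytes) -> None:
        # Only the administrator may write entries (assumed access rule).
        if caller != self.admin:
            raise PermissionError("only the admin may write block hashes")
        if len(block_hash) != 32:
            raise ValueError("block hash must be 32 bytes")
        self._hashes[block_number] = block_hash

    def get_blockhash(self, block_number: int) -> bytes:
        # Unknown blocks yield zero bytes (an assumption of this sketch).
        return self._hashes.get(block_number, bytes(32))


registry = BlockhashRegistry(admin="admin")
registry.set_blockhash("admin", 100, b"\x11" * 32)
print(registry.get_blockhash(100).hex())
print(registry.get_blockhash(101) == bytes(32))  # True
```

In this model, Kakarot Core would call get_blockhash when it executes the Blockhash Opcode, instead of reading historical block data directly.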
According to Kakarot CEO Elias Tazartes, in the team's latest version, they abandoned the design of Account Registry in favor of directly using a mapping from a 31-byte Starknet address to a 20-byte EVM address to maintain the correspondence. In the future, to improve interoperability and allow Starknet contracts to register their own EVM addresses, the design of Account Registry may be reused.
Compatibility with EVM on Starknet: What are the Differences Between Warp and Kakarot
According to Vitalik's classification of zkEVM types, Warp belongs to Type-4, while Kakarot currently belongs to Type-2.5.
Warp is a transpiler that converts Solidity code into Cairo code. It is not called a compiler, perhaps because the output Cairo is still a high-level language. With Warp, Solidity developers can keep their existing development workflow without needing to learn the new Cairo language. For many projects, Warp lowers the barrier to entry into the Starknet ecosystem, eliminating the need to rewrite large amounts of engineering code in Cairo.
While the idea of transpilation is simple, it also offers the weakest compatibility. Some Solidity code cannot be translated well into Cairo, and code logic involving account systems, cryptographic algorithms, etc., needs to be modified in the source code to complete the migration. Specific unsupported features can be found in the Warp documentation. For example, many projects differentiate between the execution logic of EOA accounts and contract accounts, but in Starknet, all accounts are contract accounts, so this part of the code needs to be modified before it can be transpiled.
Warp provides compatibility at the high-level language level, while Kakarot provides compatibility at the EVM level.
The complete rewriting of the EVM, along with the implementation of each Opcode and Pre-compile, gives Kakarot higher native compatibility. After all, executing within the same virtual machine (EVM) is always more compatible than executing in different virtual machines (Cairo VM). The Account Registry and Blockhash Registry cleverly shield the differences between systems, minimizing migration friction for users.
Kakarot Team
Thanks to the Kakarot team for their valuable feedback on this article, especially to Elias Tazartes. Thank you, sir!