Ethereum Virtual Machine
What is EVM?
You might have heard of the Java Virtual Machine (JVM): developers write code in Java (or another JVM language), compile it down to Java bytecode, and the JVM is the runtime engine that executes that bytecode. The Ethereum Virtual Machine (EVM) is similar at a high level: developers write code in Solidity (or another language such as Vyper), compile it down to EVM bytecode, and the EVM is the runtime engine that executes it.
How does it work?
The EVM is a transaction-based state machine: if a transaction fails, the state is not updated, except for the sender's nonce and the ETH balance deducted to pay for gas.
- EVM code: a developer writes the smart contract, compiles it into bytecode, then signs and sends a transaction on-chain to create the smart contract in the EVM.
- When a user or another smart contract interacts with a smart contract, the EVM loads the contract bytecode (a sequence of opcodes) for execution.
- The EVM then executes the sequence of opcodes. As execution proceeds, the program counter (the position of the next instruction) advances, and the available gas decreases by an amount that depends on each opcode. Execution terminates if the remaining gas is insufficient. Any operation that writes to storage persists in the account's storage (only smart contract accounts have storage).
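The execution loop described above can be sketched as a toy stack machine. This is only an illustration: the opcode names mirror real EVM opcodes, but the `run` helper, the `OutOfGas` exception, and the gas numbers are simplified assumptions, not the real EVM rules.

```python
# Toy stack machine illustrating the program counter, gas metering, and
# "no state change on failure". Gas costs here are illustrative assumptions.
GAS_COST = {"PUSH": 3, "ADD": 3, "SSTORE": 20000, "STOP": 0}


class OutOfGas(Exception):
    """Raised when the remaining gas cannot pay for the next opcode."""


def run(program, gas, storage):
    """Execute (opcode, operand) pairs; return (new_storage, gas_left)."""
    stack = []
    pending = dict(storage)  # work on a copy so a failure leaves state untouched
    pc = 0                   # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        cost = GAS_COST[op]
        if gas < cost:
            raise OutOfGas(f"need {cost} gas for {op}, only {gas} left")
        gas -= cost
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            stack.append(stack.pop() + stack.pop())
        elif op == "SSTORE":
            key, value = stack.pop(), stack.pop()  # key is on top of the stack
            pending[key] = value
        elif op == "STOP":
            break
        pc += 1
    return pending, gas


# Compute 2 + 3 and store the result in storage slot 0.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 0), ("SSTORE", None), ("STOP", None)]
state = {}
new_state, gas_left = run(program, gas=30000, storage=state)
print(new_state, gas_left)
```

If the same program is run with too little gas, `run` raises `OutOfGas` before the storage write happens and the original `state` dictionary is left unchanged, which mirrors the "state is not updated on failure" rule.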
What is an opcode?
There are roughly 140 EVM opcodes (141 at the time of writing; the exact count changes with network upgrades). They fall into a few categories of tasks, such as the ones below (not exhaustive). The full list and the associated gas costs can be found at: https://www.evm.codes/
- Load and save to memory/storage (mload | mstore | sload | sstore)
- Process flow (jump)
- Arithmetic (add | mul | sub | div, etc.)
- Comparison (eq | iszero)
- EVM environment (gas | caller)
Example: Solidity -> ByteCode -> OpCode
In this section, we'll use a simple Solidity example and walk through:
- What a contract looks like in bytecode and opcodes
- How EVM knows which opcode to execute for a function call
What a contract looks like in bytecode and opcodes
We will use a simple contract, Demo.sol, that exposes a setVal(uint256) function which writes its argument to storage.
After compiling it into bytecode via solcjs:
> solcjs -o BytecodeDir --bin src/Demo.sol
608060405234801561001057600080fd5b50610133806100206000396000f3fe6080604052348015600f57600080fd5b506004361060325760003560e01c80633c6bb4361460375780633d4197f0146051575b600080fd5b603d6069565b604051604891906090565b60405180910390f35b606760048036038101906063919060d5565b606f565b005b60005481565b8060008190555050565b6000819050919050565b608a816079565b82525050565b600060208201905060a360008301846083565b92915050565b600080fd5b60b5816079565b811460bf57600080fd5b50565b60008135905060cf8160ae565b92915050565b60006020828403121560e85760e760a9565b5b600060f48482850160c2565b9150509291505056fea26469706673582212202008ce9f466108e302b16a0a7d864882e3925c470c9a7371c7c03dfa5083b56f64736f6c63430008110033
You can then use Etherscan's opcode tool to convert the bytecode to opcodes (the solc command can do this as well):
[0] PUSH1 0x80
[2] PUSH1 0x40
[4] MSTORE
[5] CALLVALUE
...
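To make the bytecode-to-opcode step concrete, here is a minimal disassembler sketch. It is not how Etherscan or solc implement it: the `OPCODES` table is a tiny hand-picked subset of the real one (see evm.codes), and `disassemble` is a hypothetical helper. The one real rule it shows is that PUSH1 to PUSH32 (bytes 0x60 to 0x7f) are followed by 1 to 32 bytes of immediate data, which must not be decoded as opcodes.

```python
# Minimal disassembler sketch. OPCODES is a deliberately small subset of the
# real opcode table; any other byte is printed as UNKNOWN.
OPCODES = {
    0x14: "EQ",
    0x15: "ISZERO",
    0x34: "CALLVALUE",
    0x52: "MSTORE",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
    0x80: "DUP1",
    0xFD: "REVERT",
}


def disassemble(hex_code):
    """Return a list of (byte_offset, text) pairs for a hex bytecode string."""
    code = bytes.fromhex(hex_code)
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:  # PUSH1..PUSH32 carry 1..32 bytes of data
            size = op - 0x5F
            data = code[pc + 1 : pc + 1 + size]
            out.append((pc, f"PUSH{size} 0x{data.hex()}"))
            pc += 1 + size
        else:
            out.append((pc, OPCODES.get(op, f"UNKNOWN(0x{op:02x})")))
            pc += 1
    return out


# First 8 bytes of the compiled bytecode shown above.
listing = disassemble("6080604052348015")
for offset, text in listing:
    print(f"[{offset}] {text}")
```

Note that the byte 0x80 appears twice in this prefix: once as the data of the first PUSH1, and once as a real DUP1 opcode at offset 6. Skipping over push data is what keeps the two apart.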
When you call a function, e.g. setVal(uint256), your input data contains both the function selector (the first 4 bytes of the Keccak-256 hash of the method signature) and the input value. For example, when you call setVal(10), the first 4 bytes are 3d4197f0 and the next 32 bytes contain your input value of 10:
> abi.encodeWithSignature("setVal(uint256)", 10)
0x3d4197f0000000000000000000000000000000000000000000000000000000000000000a
If you search for 3d4197f0, you will find it in the bytecode! This is the first hint on how EVM knows which opcode to execute for a method call 😀
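The layout of the input data can be reproduced with a few lines of Python. This is a sketch under one assumption: the selector 3d4197f0 is copied from the article rather than computed, because computing it requires Keccak-256, which is not in Python's standard library (hashlib.sha3_256 is the NIST SHA-3 variant and produces a different hash). `encode_call` is a hypothetical helper name.

```python
# Selector for setVal(uint256), taken from the article (not computed here).
SELECTOR = bytes.fromhex("3d4197f0")


def encode_call(selector, value):
    """Build calldata: a 4-byte selector followed by one uint256 argument."""
    # uint256 arguments are encoded as 32 bytes, big-endian, left-padded with zeros.
    return selector + value.to_bytes(32, "big")


calldata = encode_call(SELECTOR, 10)
print("0x" + calldata.hex())

# Splitting the calldata again: first 4 bytes select the function,
# the remaining 32 bytes hold the argument.
selector_part = calldata[:4]
argument = int.from_bytes(calldata[4:], "big")
print(selector_part.hex(), argument)
```

The printed hex string should match the abi.encodeWithSignature output above: 4 bytes of selector plus 32 bytes of argument, 36 bytes in total.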
How EVM knows which opcode to execute for a function call
Step 1:
Copy the above Demo.sol into Remix, then compile and deploy it. Call setVal(10) and press the “Debug” icon, which should bring you to the debugger.
The debugger takes you directly to the opcodes below, which are in the function body. What we are interested in, however, is how the EVM got to this point, not the opcodes that execute the function itself.
112 DUP1
113 PUSH1 00
115 DUP2
Step 2:
Press step backward until you see the EQ instruction highlighted.
At this point, you can see the method signature on the stack. EQ (which takes 2 items off the stack and pushes 1 if they are equal, otherwise 0) is the next opcode to be executed. Since both values are equal, EQ pushes 1. Stack now:
0: 0x..01
1: 0x..3d4197f0
This is followed by PUSH1 51, which pushes 0x51 onto the stack. Stack now:
0: 0x..51
1: 0x..01
2: 0x..3d4197f0
Next comes the key part, JUMPI (which pops 2 values off the stack and jumps to the program counter given by index 0 if index 1 is non-zero; otherwise execution continues with the next instruction).
In our example, since the stack’s index 1 is non-zero, the program counter moves to 0x51, which is where our method's code begins. And this is how the EVM looks at the function selector and proceeds to the correct opcodes.
What if the values are not equal at JUMPI? The program will not jump, and execution falls through until it reaches the REVERT opcode at 054.
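The dispatch steps from this walkthrough (DUP1, PUSH4, EQ, PUSH1, JUMPI) can be replayed on a plain Python list. This is a sketch, not a real EVM: `dispatch` is a hypothetical helper, the top of the stack is the end of the list (the debugger shows it as index 0), and a return value of None is just this sketch's way of saying "no jump, continue with the next instruction".

```python
# Replays the selector check for setVal(uint256) on a Python list used as a stack.
SET_VAL = 0x3D4197F0  # selector the bytecode compares against
BODY = 0x51           # jump destination pushed before JUMPI


def dispatch(selector):
    """Return the jump destination, or None if JUMPI does not jump."""
    stack = [selector]                     # selector extracted from calldata
    stack.append(stack[-1])                # DUP1: duplicate the top item
    stack.append(SET_VAL)                  # PUSH4 3d4197f0
    a, b = stack.pop(), stack.pop()        # EQ pops two items...
    stack.append(1 if a == b else 0)       # ...and pushes 1 if equal, else 0
    stack.append(BODY)                     # PUSH1 51
    dest, cond = stack.pop(), stack.pop()  # JUMPI pops destination, then condition
    return dest if cond != 0 else None


matched = dispatch(0x3D4197F0)
unmatched = dispatch(0xDEADBEEF)
print(hex(matched), unmatched)
```

With the matching selector the sketch returns 0x51, the jump target from the walkthrough. With any other selector it returns None, which corresponds to falling through towards the REVERT.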
Ending note
Understanding how the EVM works helps you become a better Solidity dev, just as Java devs benefit from understanding the JVM. For example, compare the gas costs of the opcodes mstore and sstore, and you will see the difference between storing data in memory and in storage.
I do hope this post helps! In the next post, we will look at gas optimization for smart contracts, where this opcode knowledge should come in handy!
Reference
These articles have helped me tremendously to understand EVM, credit to them!
- https://www.evm.codes/
- https://leftasexercise.com/2021/08/08/q-smart-contracts-and-the-ethereum-virtual-machine/
- https://noxx.substack.com/p/evm-deep-dives-the-path-to-shadowy