EVM‑IR Legalizer Specification (v0.1)
The Legalizer transforms arbitrary well‑typed EVM‑IR into canonical EVM‑IR, a restricted form required before stackification and code generation.
The legalizer runs after the frontend emits EVM‑IR but before optimization and stackification.
Purpose of the Legalizer
The legalizer ensures:
- All IR obeys canonical control flow rules
- No PHI nodes or block arguments remain
- All control‑flow merges occur through explicit memory operations
- All operations are structurally valid
- No illegal or undefined patterns reach the backend
- Memory and pointer semantics are correct
- Address space usage is valid
- Switch statements are normalized
- Unreachable blocks are removed
- Frame layout constraints are prepared for the stackifier
Canonical Form Requirements
Canonical EVM‑IR must satisfy all of the following rules:
- No PHIs (not allowed in EVM‑IR v0.1)
- No block arguments
- No critical edges
- All branches must target explicit basic blocks
- Each block must end with exactly one terminator
- Switch operations must be normalized
- All pointer arithmetic must be explicit
- Composite types must already be lowered
- All unreachable code must be eliminated
If any rule is violated, the legalizer rewrites the IR or, where no rewrite applies, rejects it with a diagnostic.
PHI Elimination Strategy (Memory‑Based)
Because EVM is a stack machine, PHI nodes cannot survive past this phase.
Given:
%x = phi [ %v1, ^bb1 ], [ %v2, ^bb2 ]
The legalizer rewrites as:
%slot = evm.alloca : ptr<0>
^bb1:
evm.mstore %slot, %v1
evm.br ^merge
^bb2:
evm.mstore %slot, %v2
evm.br ^merge
^merge:
%x = evm.mload %slot : u256
All PHIs are eliminated using temporary memory slots.
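The rewrite above can be sketched in Python on a toy IR. This is a minimal illustration, not the legalizer's actual implementation: blocks are modeled as a dict from label to a list of op strings with the terminator last, and the function name `eliminate_phi` and its parameters are hypothetical.

```python
def eliminate_phi(blocks, merge, result, incoming, slot="%slot"):
    """Replace `result = phi ...` in block `merge` with a memory slot.

    blocks:   dict mapping block label -> list of op strings (terminator last).
    incoming: dict mapping predecessor label -> incoming value name.
    Returns the prologue ops (the slot allocation) and mutates `blocks`.
    All names here are illustrative assumptions, not part of the spec.
    """
    prologue = [f"{slot} = evm.alloca : ptr<0>"]
    for pred, value in incoming.items():
        ops = blocks[pred]
        # The store goes immediately before the predecessor's terminator.
        ops.insert(len(ops) - 1, f"evm.mstore {slot}, {value}")
    # Drop the PHI and load from the slot at the top of the merge block.
    kept = [op for op in blocks[merge] if not op.startswith(f"{result} = phi")]
    blocks[merge] = [f"{result} = evm.mload {slot} : u256"] + kept
    return prologue
```

A real implementation would need a fresh slot per PHI and would operate on structured ops rather than strings.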
Switch Normalization
A switch must be lowered to a cascade of compare-and-branch operations. Given:
evm.switch %v, default ^d
case 0 → ^b0
case 1 → ^b1
Legalizer rewrites to:
%c0 = evm.constant 0 : u256
%is0 = evm.eq %v, %c0
evm.condbr %is0, ^b0, ^test1
^test1:
%c1 = evm.constant 1 : u256
%is1 = evm.eq %v, %c1
evm.condbr %is1, ^b1, ^d
Canonical form has no multi-case switch ops.
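As a rough sketch, the cascade can be generated mechanically from the case list. The Python below is illustrative only: `lower_switch` is a hypothetical helper, ops are plain strings, and materializing each case constant with `evm.constant` is an assumption about how `%c0` and `%c1` are defined.

```python
def lower_switch(value, cases, default):
    """Lower a switch to cascaded compares.

    cases: list of (constant, target_label) pairs in source order.
    Returns a list of (label, ops) pairs. A label of None means the ops
    are appended to the block that originally held the switch.
    """
    if not cases:
        # A switch with no cases always goes to the default target.
        return [(None, [f"evm.br {default}"])]
    out = []
    label = None
    for i, (const, target) in enumerate(cases):
        is_last = i == len(cases) - 1
        # The final compare falls through to the default block.
        next_label = default if is_last else f"^test{i + 1}"
        ops = [
            f"%c{i} = evm.constant {const} : u256",
            f"%is{i} = evm.eq {value}, %c{i}",
            f"evm.condbr %is{i}, {target}, {next_label}",
        ]
        out.append((label, ops))
        label = next_label
    return out
```

The cascade is linear in the number of cases; the spec does not say whether denser lowerings such as binary search are permitted, so none is attempted here.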
Control‑Flow Normalization
Rules enforced:
- No critical edges
- All branches target explicit basic blocks
- Empty blocks are collapsed unless they serve as jump targets
- Every block has exactly one terminator
Example fix:
^bb:
%c = evm.eq %x, %y
evm.condbr %c, ^bb_true, ^bb_false
evm.mstore %p, %x // illegal after terminator
Legalizer moves trailing ops into a new block:
^bb:
%c = evm.eq %x, %y
evm.condbr %c, ^bb_true, ^bb_fix
^bb_fix:
evm.mstore %p, %x
evm.br ^bb_false
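The "no critical edges" rule is not illustrated by the example above. A critical edge is conventionally one that leaves a block with several successors and enters a block with several predecessors. The following Python sketch splits such edges on a bare successor map; the function name and the `^split` label scheme are assumptions, and fresh labels are assumed not to collide with existing ones.

```python
def split_critical_edges(succs):
    """Insert an empty block on every critical edge.

    succs: dict mapping block label -> list of successor labels.
    Returns a new successor map; the input is left unmodified.
    """
    # Count predecessors for each block.
    preds = {}
    for src, targets in succs.items():
        for dst in targets:
            preds.setdefault(dst, []).append(src)

    result = {label: list(targets) for label, targets in succs.items()}
    counter = 0
    for src, targets in succs.items():
        if len(targets) < 2:
            continue  # a single-successor source cannot start a critical edge
        for i, dst in enumerate(targets):
            if len(preds.get(dst, [])) >= 2:
                new_label = f"^split{counter}"  # hypothetical naming scheme
                counter += 1
                result[src][i] = new_label      # retarget the branch
                result[new_label] = [dst]       # new block just jumps on
    return result
```

Each inserted block would contain only an `evm.br` to the original target, which gives later phases a place to put edge-specific stores.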
Unreachable Block Removal
Blocks reached only through evm.unreachable or not reached at all are removed.
Algorithm:
- Build reverse postorder
- Mark reachable blocks
- Delete all unmarked blocks
- Clean up dangling branches
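The algorithm above can be sketched in Python over a successor map. This is an illustrative model, not the legalizer's code: `reverse_postorder` and `remove_unreachable` are hypothetical names, and the recursive traversal assumes the CFG is shallow enough for Python's recursion limit.

```python
def reverse_postorder(succs, entry):
    """Return the blocks reachable from `entry` in reverse postorder."""
    visited, post = set(), []

    def visit(label):
        visited.add(label)
        for nxt in succs.get(label, []):
            if nxt not in visited:
                visit(nxt)
        post.append(label)  # emitted after all successors: postorder

    visit(entry)
    return post[::-1]


def remove_unreachable(succs, entry):
    """Keep only blocks reachable from `entry`.

    Every block in the traversal order is reachable by construction, so
    the order doubles as the mark set. Branches out of deleted blocks
    disappear with them, and a reachable block never branches to an
    unreachable one, so no dangling targets remain in this model.
    """
    order = reverse_postorder(succs, entry)
    return {label: list(succs[label]) for label in order}
```

For example, a `^dead` block that branches to `^exit` but has no predecessors is dropped while `^exit` is kept.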
Memory and Pointer Verification
The legalizer enforces:
- No pointer arithmetic except through evm.ptr_add
- Memory pointers must be ptr<0>
- Transient storage keys must be u256 (not pointers)
- Storage keys must be u256 (not pointers)
- Code pointers must use address space 4
- Calldata pointers must not be mutated
Violations are rewritten or rejected.
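A type check for these rules might look like the Python sketch below. It is a hedged illustration: the role names and the `check_operand` helper are hypothetical, and spelling the code pointer type as `ptr<4>` is an assumption inferred from the `ptr<0>` notation for memory.

```python
# Expected type per operand role. The role names are illustrative;
# "ptr<4>" is an assumed spelling for "address space 4".
EXPECTED_TYPE = {
    "memory_pointer": "ptr<0>",
    "code_pointer": "ptr<4>",
    "storage_key": "u256",
    "transient_key": "u256",
}


def check_operand(role, ty):
    """Return None if `ty` is legal for `role`, else a diagnostic string."""
    expected = EXPECTED_TYPE[role]
    if ty == expected:
        return None
    return f"{role} must have type {expected}, got {ty}"
```

Whether a given violation is rewritten or rejected is a policy decision the sketch leaves to the caller.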
Composite Type Elimination
The frontend must eliminate structs and arrays before emitting EVM‑IR.
If composites still appear:
❌ Reject with diagnostic
or
✔ Lower via auto‑generated memory layout (frontend recovery mode)
Terminator Repair
Rules:
- Blocks must end in one of: evm.br, evm.condbr, evm.return, evm.revert, evm.unreachable
- If a block ends in a non‑terminator op, the legalizer inserts evm.unreachable
Example repair:
^bb:
evm.mstore %p, %v
becomes:
^bb:
evm.mstore %p, %v
evm.unreachable
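The repair rule is small enough to sketch directly. In the Python below, ops are plain strings and `repair_terminator` is a hypothetical helper; treating an empty block as needing `evm.unreachable` is an assumption based on the rule that no block may have zero terminators.

```python
TERMINATORS = ("evm.br", "evm.condbr", "evm.return", "evm.revert", "evm.unreachable")


def is_terminator(op):
    """True if the op string starts with a terminator mnemonic."""
    parts = op.split()
    return bool(parts) and parts[0] in TERMINATORS


def repair_terminator(ops):
    """Return a copy of `ops` that ends in a terminator.

    If the last op is already a terminator the block is returned as is;
    otherwise (including for an empty block) evm.unreachable is appended.
    """
    if ops and is_terminator(ops[-1]):
        return list(ops)
    return list(ops) + ["evm.unreachable"]
```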
Legalizer Processing Order
- Validate IR invariants
- Eliminate PHI nodes
- Normalize switch ops
- Normalize control flow
- Canonicalize terminators
- Remove unreachable blocks
- Enforce type and pointer invariants
- Re-run canonicalization if necessary
The legalizer may run multiple rounds until a fixpoint is reached.
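One way to drive the passes to a fixpoint is sketched below in Python. The driver is a generic illustration: passes are modeled as pure functions from IR to IR, IR values must support equality comparison, and the `max_rounds` bound is an assumption added to guard against non-terminating rewrites.

```python
def run_to_fixpoint(ir, passes, max_rounds=16):
    """Apply `passes` in order, repeating until a full round changes nothing.

    ir:     any value supporting == (for example a tuple of op strings).
    passes: list of functions taking and returning an IR value.
    Raises RuntimeError if no fixpoint is reached within max_rounds.
    """
    for _ in range(max_rounds):
        changed = False
        for legalize_pass in passes:
            new_ir = legalize_pass(ir)
            if new_ir != ir:
                changed = True
                ir = new_ir
        if not changed:
            return ir
    raise RuntimeError("legalizer did not converge")
```

Comparing whole IR values is simple but costly; a production pass manager would more likely have each pass report whether it changed anything.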
Examples
Example 1 — PHI removal
Input (pre-legalizer IR with block arguments):
^entry:
%v1 = evm.constant 10 : u256
%v2 = evm.constant 20 : u256
evm.condbr %c, ^a, ^b
^a:
evm.br ^merge(%v1)
^b:
evm.br ^merge(%v2)
^merge(%x : u256):
evm.return %x : u256
Output (canonical form after legalizer):
%slot = evm.alloca : ptr<0>
^entry:
%v1 = evm.constant 10 : u256
%v2 = evm.constant 20 : u256
evm.condbr %c, ^a, ^b
^a:
evm.mstore %slot, %v1 : void
evm.br ^merge
^b:
evm.mstore %slot, %v2 : void
evm.br ^merge
^merge:
%x = evm.mload %slot : u256
evm.return %x : u256
The legalizer eliminates the block argument %x by:
- Allocating a temporary memory slot
- Storing each predecessor's value to the slot
- Loading from the slot in the merge block
Summary
The legalizer enforces:
- No PHIs
- Fully explicit control-flow
- Canonical memory and pointer operations
- Normalized switch structures
- Safe and deterministic IR for the stackifier
Canonical EVM‑IR is the only IR form accepted by the EVM‑IR optimization and stackification phases.