Introduction

EVM-IR Specification v0.1
Status: Draft
Audience: Compiler engineers, VM architects, language designers

Purpose of This Specification

This document defines EVM-IR, a low-level, SSA-based intermediate representation for compiling high-level smart contract languages to Ethereum Virtual Machine (EVM) bytecode.

EVM-IR is designed to:

Serve as a common backend target for multiple languages (Ora and others)
Provide a precise, analyzable model of EVM-style execution
Enable optimizations and formal reasoning at a level higher than bytecode
Remain EVM-aware without mirroring the EVM’s stack machine structure
Be simple to implement in existing compiler toolchains (e.g., MLIR dialects)

EVM-IR is not a source language, runtime, or bytecode format. It is a compiler IR that sits between high-level language frontends and final EVM code generation.

Design Goals

EVM-IR is guided by the following design goals:

Language-Agnostic

EVM-IR must not embed assumptions from a single language (e.g., Ora, Solidity, Vyper). Any frontend should be able to target EVM-IR if it can lower its semantics to a typed, SSA-based control-flow graph that contains no composite types and no high-level exceptions.

EVM-Aware but Not Stack-Shaped

The EVM is a stack machine. EVM-IR is not.

Instead, EVM-IR:

Models EVM concepts explicitly (storage, memory, calldata, transient storage)
Uses SSA values and typed operations instead of implicit stacks
Defers stack concerns to a later stackification phase

This separation makes the IR easier to reason about, optimize, and verify.

Canonical SSA Form

Canonical EVM-IR is always in a structured SSA form with:

Typed SSA values
Basic blocks with explicit terminators
No critical edges
No unreachable blocks after legalization
No PHI operations in the final canonical form (state merges use memory)

A separate legalizer pass normalizes arbitrary EVM-IR into this canonical form.
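
Two of the structural invariants above (no unreachable blocks, no critical edges) can be checked mechanically. The following is a minimal sketch, not part of the specification: it assumes a hypothetical CFG encoding as a dictionary from block name to successor names, and the function names are illustrative only. A critical edge is taken here in the usual sense: an edge from a block with several successors to a block with several predecessors.

```python
from collections import deque

def unreachable_blocks(cfg, entry):
    """Return the set of block names not reachable from the entry block."""
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg[block]:
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return set(cfg) - seen

def critical_edges(cfg):
    """Return sorted (src, dst) edges where src has several successors
    and dst has several predecessors."""
    preds = {name: set() for name in cfg}
    for src, succs in cfg.items():
        for dst in succs:
            preds[dst].add(src)
    return sorted(
        (src, dst)
        for src, succs in cfg.items()
        for dst in set(succs)
        if len(set(succs)) > 1 and len(preds[dst]) > 1
    )

# Hypothetical CFG: block name -> list of successor block names.
cfg = {
    "entry": ["then", "merge"],  # conditional branch
    "then": ["merge"],
    "merge": [],
    "dead": ["merge"],           # no block branches here
}

print(unreachable_blocks(cfg, "entry"))  # {'dead'}
print(critical_edges(cfg))               # [('entry', 'merge')]
```

A legalizer built along these lines would delete `dead` and split the `entry` to `merge` edge by inserting an empty block on it. How the actual legalizer does this is defined in the Legalizer section.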

Deterministic Lowering to Bytecode

Given a canonical EVM-IR module, lowering to EVM bytecode should be:

Deterministic: no hidden semantics or non-local choices
Predictable: small IR changes produce small codegen changes
Reasonable: preserves obvious performance expectations

To achieve this, the specification defines:

Allowed operations and their semantics
Control-flow and structural invariants
Stackification rules (SSA → stack machine)

Explicit, Minimal Semantics

EVM-IR avoids embedding high-level language features such as:

exceptions
generics
inheritance
traits / interfaces
language-specific error models

Instead, it exposes a minimal set of operations that can encode these features using control flow, memory, storage, and calls.

Non-Goals

EVM-IR intentionally does not attempt to:

Define a new source language or syntax
Replace the Ethereum ABI specification
Define gas pricing or performance guarantees
Encode every EVM quirk as a first-class concept
Serve as a human-friendly assembly language
Model non-EVM backends (e.g., WASM) directly

Language frontends are responsible for:

Type checking and high-level type systems
Ownership / borrowing / capability models
Semantic checks (e.g. overflow policies)
Composite type lowering (structs, arrays, enums)
High-level optimizations

EVM-IR assumes that all high-level decisions have been made before lowering.

Position in the Compilation Pipeline

EVM-IR sits in the middle of the compilation pipeline:

 High-Level Language (Ora, etc.)
        │
        ▼
  Frontend IR / AST / MLIR dialect
        │
 (language-specific lowering)
        ▼
        EVM-IR
        │
   [Legalizer Pass]
        ▼
  Canonical EVM-IR (SSA)
        │
   [Stackifier Pass]
        ▼
   Stack-Oriented IR
        │
   [Backend / Codegen]
        ▼
     EVM Bytecode

Key properties:

Frontends lower their own constructs to EVM-IR, not directly to bytecode.
The legalizer ensures that EVM-IR satisfies the canonical constraints (no PHI, no composites, normalized CFG).
The stackifier converts SSA into linear, stack-based code suitable for bytecode emission.
The bytecode generator encodes the stack-level program into raw EVM opcodes (out of scope for v0.1).

Module and Function Model

An EVM-IR compilation unit is a module.

A module contains:

Functions: code bodies with typed parameters and return types
Global metadata: debug info, ABI information, target properties
Optionally: declarations of external functions or runtime hooks

Each function:

Has a name (e.g., @transfer or @__entry)
Has a signature: list of parameter types and result types
Contains a non-empty set of basic blocks
Has exactly one entry block
Uses SSA values produced by operations within the function

Basic blocks:

Contain a sequence of operations
End with a terminator (branch, conditional branch, return, revert, etc.)
May have zero or more predecessors
Have no implicit control-flow edges
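
As an illustration of the function and block rules above, the sketch below models them with plain data classes and checks two of the rules (a non-empty set of blocks, and every block ending in exactly one terminator). All class names, the operation names in `TERMINATORS`, and the `u256` type name are assumptions made for this example; the normative definitions live in the Types and Operations sections.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical terminator names; the real set is defined in the Operations section.
TERMINATORS = {"br", "cond_br", "return", "revert"}

@dataclass
class Op:
    name: str
    operands: List[str] = field(default_factory=list)
    result: Optional[str] = None  # SSA value defined by this op, if any

@dataclass
class Block:
    label: str
    ops: List[Op]

@dataclass
class Function:
    name: str
    param_types: List[str]
    result_types: List[str]
    blocks: List[Block]  # blocks[0] is treated as the single entry block

def check_function(fn):
    """Return a list of human-readable violations of the block rules."""
    errors = []
    if not fn.blocks:
        errors.append(f"{fn.name}: function has no blocks")
    for block in fn.blocks:
        # The last operation must be a terminator.
        if not block.ops or block.ops[-1].name not in TERMINATORS:
            errors.append(f"{fn.name}/{block.label}: missing terminator")
        # No terminator may appear before the end of the block.
        for op in block.ops[:-1]:
            if op.name in TERMINATORS:
                errors.append(f"{fn.name}/{block.label}: terminator before end of block")
    return errors

good = Function("@transfer", ["u256"], [], [Block("entry", [Op("return")])])
bad = Function("@broken", [], [], [Block("entry", [Op("add", ["%a", "%b"], "%c")])])

print(check_function(good))  # []
print(check_function(bad))   # ['@broken/entry: missing terminator']
```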

Canonical IR Requirements (High-Level)

While the detailed canonicalization rules are specified in the Legalizer section, the following high-level requirements hold for canonical EVM-IR:

SSA Form
Each value is assigned exactly once and used any number of times. No mutable variables; state is represented via memory, storage, or transient storage.
No PHI Nodes in Final Form
IR may temporarily use PHI-like constructs (MLIR block arguments, etc.), but the legalizer must ultimately lower all control-flow merges to explicit memory operations (e.g., stores in predecessors, loads in merge blocks).
Explicit Control Flow
Every block ends with a terminator. No fall-through without an explicit branch.
No Composite Types
Structs, tuples, arrays, and maps are not first-class types in EVM-IR. They must be lowered to primitive values and memory operations.
Well-Formed CFG
No unreachable blocks. No critical edges that violate canonicalization rules.
Well-Typed Operations
Every operation must satisfy type rules defined in the Operations section. Implicit conversions are not allowed.
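
The "stores in predecessors, loads in merge blocks" rule can be sketched as a small rewrite over block arguments. This is an illustration under stated assumptions, not the legalizer's actual algorithm: the block encoding, the tuple shapes for `mload` and `mstore`, and the choice of fixed 32-byte slots starting at offset 0x80 are all hypothetical, and only unconditional branches are handled.

```python
def lower_block_args(blocks, first_slot=0x80):
    """Replace block arguments with memory stores (in predecessors)
    and memory loads (at the start of the merge block)."""
    # Assign one 32-byte memory slot to every block argument.
    slots = {}
    next_slot = first_slot
    for label, block in blocks.items():
        for arg in block["args"]:
            slots[(label, arg)] = next_slot
            next_slot += 32

    lowered = {}
    for label, block in blocks.items():
        # Merge block side: load each former argument at block entry.
        ops = [("mload", arg, slots[(label, arg)]) for arg in block["args"]]
        ops += list(block["ops"])
        term = block["term"]
        if term[0] == "br":
            _, target, values = term
            # Predecessor side: store each passed value before branching.
            for arg, value in zip(blocks[target]["args"], values):
                ops.append(("mstore", slots[(target, arg)], value))
            term = ("br", target, [])
        lowered[label] = {"args": [], "ops": ops, "term": term}
    return lowered

# Hypothetical input: two predecessors pass different values to %x in "merge".
blocks = {
    "then": {"args": [], "ops": [("const", "%a", 1)], "term": ("br", "merge", ["%a"])},
    "else": {"args": [], "ops": [("const", "%b", 2)], "term": ("br", "merge", ["%b"])},
    "merge": {"args": ["%x"], "ops": [], "term": ("return", ["%x"])},
}

lowered = lower_block_args(blocks)
print(lowered["then"]["ops"])   # [('const', '%a', 1), ('mstore', 128, '%a')]
print(lowered["merge"]["ops"])  # [('mload', '%x', 128)]
```

After the rewrite no block carries arguments, and `%x` is still defined exactly once (by the load), so the SSA property is preserved.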

EVM Awareness and Address Spaces

Although EVM-IR is not a stack machine, it is explicitly aware of the distinct EVM address spaces:

Memory: per-call scratch space, byte-addressable, zero-initialized
Storage: persistent contract key-value store
Calldata: read-only call input buffer
Transient Storage: EIP-1153 temporary key-value store, cleared at end of transaction
Code: read-only contract code region (the contract's own code via CODECOPY, other accounts' code via EXTCODE* operations)

These are expressed via typed pointers with address space identifiers, rather than via raw numeric offsets alone. This allows:

static verification of address space usage
more robust lowering to EVM opcodes
analysis and optimization passes to reason about memory vs. storage vs. transient vs. calldata
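
To make the benefit of address-space-tagged pointers concrete, the sketch below shows how a store through a pointer could be checked statically and mapped to an opcode. The `AddrSpace` and `Ptr` types and the `store_opcode` function are hypothetical names for this example; the opcode mnemonics (MSTORE, SSTORE, TSTORE) are the standard EVM ones. The real pointer types and their rules are defined in the Types section.

```python
from dataclasses import dataclass
from enum import Enum

class AddrSpace(Enum):
    MEMORY = "memory"
    STORAGE = "storage"
    CALLDATA = "calldata"
    TRANSIENT = "transient"
    CODE = "code"

# Only writable address spaces have a store opcode; calldata and code are read-only.
STORE_OPCODE = {
    AddrSpace.MEMORY: "MSTORE",
    AddrSpace.STORAGE: "SSTORE",
    AddrSpace.TRANSIENT: "TSTORE",
}

@dataclass(frozen=True)
class Ptr:
    space: AddrSpace

def store_opcode(ptr):
    """Pick the EVM store opcode for a pointer, rejecting read-only spaces."""
    if ptr.space not in STORE_OPCODE:
        raise TypeError(f"cannot store through a {ptr.space.value} pointer")
    return STORE_OPCODE[ptr.space]

print(store_opcode(Ptr(AddrSpace.STORAGE)))    # SSTORE
print(store_opcode(Ptr(AddrSpace.TRANSIENT)))  # TSTORE
```

With raw numeric offsets, a store into calldata could only be detected (if at all) after opcode selection; with typed pointers it becomes a type error at the IR level.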

Relationship With Frontends (e.g., Ora)

EVM-IR is intentionally not tied to the Ora language, but Ora is a primary expected frontend.

Frontends are responsible for:

Introducing EVM-IR functions that implement their language functions
Lowering language-level constructs (errors, results, enums, structs, etc.) into memory and control flow
Providing ABI metadata used by the ABI lowering stage
Injecting verification-friendly patterns if desired

The IR itself does not know about:

Ora’s refinement types
Ora’s proof obligations
Language-level error unions

Those are erased or encoded into EVM-IR primitives before or during lowering.

ABI and Entry Semantics (High-Level View)

EVM-IR supports:

A special entry dispatcher function that:
- examines calldata
- extracts the 4-byte selector
- routes execution to the appropriate function body
- handles unknown selectors and ETH-only transfers
Per-function ABI decode logic that:
- reads argument words from calldata
- decodes static types into SSA values
- decodes dynamic ABI types into memory regions
Per-function ABI encode logic that:
- writes return values to memory in ABI layout
- returns a pointer + length pair via return / revert

Details are provided in the ABI Lowering section. For the purposes of this introduction, the key point is that EVM-IR is the level at which ABI obligations become explicit.
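
The dispatcher and static-argument decoding steps listed above can be modeled outside the IR as follows. This is a rough sketch, not the ABI lowering itself: the `dispatch` and `decode_uint256` helpers, the routing table, and the policy of sending empty calldata to a `receive` handler and everything unmatched to a `fallback` handler are assumptions for illustration. The selector `a9059cbb` is the standard one for `transfer(address,uint256)`.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)

def dispatch(calldata, table):
    """Route a call based on the 4-byte selector at the start of calldata."""
    if len(calldata) == 0:
        return "receive"    # assumed policy: ETH-only transfer
    if len(calldata) < 4:
        return "fallback"   # too short to contain a selector
    return table.get(calldata[:4], "fallback")  # unknown selectors fall back

def decode_uint256(calldata, index):
    """Read the index-th 32-byte argument word after the selector."""
    start = 4 + 32 * index
    word = calldata[start:start + 32]
    if len(word) != 32:
        raise ValueError("calldata too short")
    return int.from_bytes(word, "big")

table = {TRANSFER_SELECTOR: "@transfer"}
call = TRANSFER_SELECTOR + (0x1234).to_bytes(32, "big") + (500).to_bytes(32, "big")

print(dispatch(call, table))    # @transfer
print(decode_uint256(call, 1))  # 500
print(dispatch(b"", table))     # receive
```

In EVM-IR the same logic would appear as explicit calldata loads, comparisons, and branches in the entry function, rather than as hidden behavior of the backend.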

Debug Information and Tooling

EVM-IR defines a minimal, language-agnostic debug model:

Source locations (file, line, column)
Variable metadata (name, type, location)
Scope information (function/block scopes)

The specification also describes how to export this information into a format compatible with the ethdebug/format project, allowing integration with:

debuggers
tracers
on-chain analysis tools

Details are provided in the Debug Information section.

Stackification and Backend

EVM-IR is not directly executable. It must be lowered through the following stages, in order:

Canonical EVM-IR (SSA) — after legalization
Stack-Oriented IR — via the stackifier
EVM bytecode — via a backend encoder (out of scope for v0.1)

The stackifier is responsible for:

choosing frame layouts for stack and memory
scheduling instructions according to stack constraints
introducing DUP/SWAP operations when necessary
ensuring efficient code generation

The backend then maps Stack IR to raw opcodes.
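
The core idea of stackification, turning named SSA values into an instruction order that leaves operands on the stack, can be sketched for the simplest case. The example below is deliberately limited and is not the stackifier's algorithm: it handles only expression trees in which every value is used once, so no DUP or SWAP is needed, and it ignores frame layout and scheduling. The `defs` encoding and the `stackify` function are hypothetical.

```python
def stackify(value, defs, out=None):
    """Emit stack code that leaves `value` on top of the stack."""
    if out is None:
        out = []
    op, *operands = defs[value]
    if op == "const":
        out.append(f"PUSH {operands[0]}")
    else:
        # EVM operations take their first operand from the top of the stack,
        # so operands are emitted last-to-first.
        for operand in reversed(operands):
            stackify(operand, defs, out)
        out.append(op.upper())
    return out

# Hypothetical SSA definitions: value name -> (operation, operands...).
defs = {
    "%a": ("const", 10),
    "%b": ("const", 3),
    "%c": ("sub", "%a", "%b"),  # %c = %a - %b
    "%d": ("const", 2),
    "%e": ("mul", "%c", "%d"),  # %e = %c * %d
}

print(stackify("%e", defs))
# ['PUSH 2', 'PUSH 3', 'PUSH 10', 'SUB', 'MUL']
```

When SUB executes, 10 is on top of the stack and 3 is below it, so it computes 10 - 3, matching the SSA definition of `%c`. Values with more than one use are where DUP and SWAP insertion becomes necessary; that case is covered in the Stackifier section.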

Document Structure

This specification is split into the following standalone sections:

types.md — Type system (primitive types, pointers, address spaces)
ops.md — Operations (semantics, type rules, constraints)
legalizer.md — Canonical form, legalizer rules, CFG normalization
stackifier.md — SSA → stack machine transformation
debug.md — Debug metadata and ethdebug integration
abi-lowering.md — ABI decoding, encoding, dispatcher, fallback/receive
examples.md — Worked examples of complete lowering pipelines
appendix.md — Reference tables, glossary, and auxiliary material

Each file can be read independently, but together they describe the full EVM-IR model.
