MLIR Intermediate Representation
Overview
The Ora compiler includes a comprehensive MLIR (Multi-Level Intermediate Representation) lowering system that provides an intermediate representation for advanced analysis, optimization, and alternative code generation paths. This system lowers to sensei-ir (SIR) for EVM bytecode generation and enables sophisticated compiler transformations.
What is MLIR?
MLIR (Multi-Level Intermediate Representation) is a compiler infrastructure developed by the LLVM community (https://mlir.llvm.org) that provides:
- Multi-level representation: Support for different abstraction levels
- Extensible dialects: Custom operation and type definitions
- Optimization passes: Pluggable transformation framework
- Analysis infrastructure: Data flow and control flow analysis
The Ora compiler integrates with the existing MLIR framework to provide Ora-specific lowering and analysis capabilities.
Ora MLIR Integration
The Ora compiler integrates with the LLVM MLIR framework (https://mlir.llvm.org) to:
- Represent Ora semantics in a structured intermediate form using MLIR operations and types
- Enable advanced analysis for verification and optimization using MLIR's pass infrastructure
- Lower to sensei-ir (SIR) for EVM bytecode generation via the sensei-ir backend
- Provide debugging information with source location preservation using MLIR's location tracking
Our implementation provides Ora-specific dialects, operations, and lowering passes that work within the existing MLIR ecosystem.
Type System Mapping
Primitive Types
Ora primitive types are mapped to MLIR as follows:
| Ora Type | MLIR Type | Notes |
|---|---|---|
u8 - u256 | iN | N-bit unsigned integers |
i8 - i256 | iN | N-bit signed integers |
bool | i1 | Single bit boolean |
address | i160 | With ora.address attribute |
string | !ora.string | Ora dialect type |
bytes | !ora.bytes | Ora dialect type |
void | () | MLIR void type |
Complex Types
| Ora Type | MLIR Type | Description |
|---|---|---|
[T; N] | memref<NxT, space> | Fixed-size arrays with memory space |
slice[T] | !ora.slice<T> | Dynamic slices |
map[K, V] | !ora.map<K, V> | Key-value mappings |
doublemap[K1, K2, V] | !ora.doublemap<K1, K2, V> | Nested mappings |
struct { ... } | !llvm.struct<...> | Struct types |
enum | !ora.enum<name, repr> | Enumeration types |
!T | !ora.error<T> | Error types |
!T1 | T2 | !ora.error_union<T1, T2> | Error unions |
Memory Region Semantics
Ora's memory regions are represented in MLIR using memory spaces:
Storage (Space 1)
%storage_var = memref.alloca() {ora.region = "storage"} : memref<1xi256, 1>
- Persistent contract state
- Transactional semantics
- Gas costs for access
Memory (Space 0)
%memory_var = memref.alloca() : memref<1xi256, 0>
- Transient execution memory
- Cleared between calls
- Lower gas costs
TStore (Space 2)
%tstore_var = memref.alloca() {ora.region = "tstore"} : memref<1xi256, 2>
- Transient storage window
- Temporary persistence
- Intermediate gas costs
Expression Lowering
Arithmetic Operations
Ora arithmetic expressions are lowered to MLIR arith dialect operations:
let result = a + b * c;
%mul = arith.muli %b, %c : i256
%add = arith.addi %a, %mul : i256
Comparison Operations
let is_greater = x > y;
%cmp = arith.cmpi sgt, %x, %y : i256
Logical Operations
Logical operators use short-circuit evaluation:
let result = condition1 && condition2;
%result = scf.if %condition1 -> i1 {
scf.yield %condition2 : i1
} else {
%false = arith.constant false
scf.yield %false : i1
}
Field Access
Struct field access uses LLVM operations:
let value = my_struct.field;
%value = llvm.extractvalue %my_struct[0] : !llvm.struct<(i256, i256)>
Function Calls
let result = my_function(arg1, arg2);
%result = func.call @my_function(%arg1, %arg2) : (i256, i256) -> i256
Statement Lowering
Control Flow
If Statements
if (condition) {
// then block
} else {
// else block
}
scf.if %condition {
// then region
} else {
// else region
}
While Loops
while (condition) {
// body
}
scf.while (%arg0 = %initial) : (i256) -> () {
%cond = // evaluate condition
scf.condition(%cond)
} do {
// loop body
scf.yield %next_value : i256
}
For Loops
for (array) |item| {
// body
}
%c0 = arith.constant 0 : index
%c1 = arith.constant 1 : index
%len = // array length
scf.for %i = %c0 to %len step %c1 {
%item = memref.load %array[%i] : memref<?xi256>
// loop body
}
Switch Statements
switch (value) {
case 1 => // handle 1
case 2...5 => // handle range
else => // default case
}
cf.switch %value : i256, [
default: ^default,
1: ^case1,
2: ^case2_5,
3: ^case2_5,
4: ^case2_5,
5: ^case2_5
]
Ora-Specific Features
Move Statements
move 100 from sender to receiver;
%amount = arith.constant 100 : i256
%move_op = ora.move %amount from %sender to %receiver {ora.move = true}
Log Statements
log Transfer(from: sender, to: receiver, amount: value);
ora.log "Transfer"(%sender, %receiver, %value) {
indexed = [true, true, false]
} : (i160, i160, i256) -> ()
Try-Catch Blocks
try {
risky_operation();
} catch (error) {
handle_error(error);
}
%result = scf.execute_region -> !ora.error<()> {
%success = func.call @risky_operation() : () -> !ora.error<()>
scf.yield %success : !ora.error<()>
}
%is_error = ora.is_error %result : !ora.error<()>
scf.if %is_error {
%error = ora.unwrap_error %result : !ora.error<()>
func.call @handle_error(%error) : (()) -> ()
}
Verification Features
Old Expressions
ensures balance == old(balance) + amount;
%old_balance = // captured at function entry
%postcond = arith.cmpi eq, %balance, %expected : i256
ora.assert %postcond {ora.ensures = true, ora.old = %old_balance}
Quantified Expressions
requires forall(i in 0...array.length) array[i] > 0;
%all_positive = ora.forall %i in %range {
%elem = memref.load %array[%i] : memref<?xi256>
%zero = arith.constant 0 : i256
%positive = arith.cmpi sgt, %elem, %zero : i256
ora.yield %positive : i1
} : (index) -> i1
ora.assert %all_positive {ora.requires = true}
Function Contracts
Requires Clauses
fn transfer(amount: u256)
requires(amount > 0)
requires(balance >= amount)
{
// function body
}
func.func @transfer(%amount: i256) {
%zero = arith.constant 0 : i256
%amount_positive = arith.cmpi sgt, %amount, %zero : i256
ora.assert %amount_positive {ora.requires = true}
%balance = // load balance
%sufficient = arith.cmpi uge, %balance, %amount : i256
ora.assert %sufficient {ora.requires = true}
// function body
}
Ensures Clauses
fn transfer(amount: u256)
ensures balance == old(balance) - amount
{
// function body
}
func.func @transfer(%amount: i256) {
%old_balance = memref.load %balance_ref : memref<1xi256>
// function body
%new_balance = memref.load %balance_ref : memref<1xi256>
%expected = arith.subi %old_balance, %amount : i256
%postcond = arith.cmpi eq, %new_balance, %expected : i256
ora.assert %postcond {ora.ensures = true}
}
Compiler Integration
CLI Usage
The Ora compiler uses a modern, streamlined command interface:
Compile and emit bytecode:
ora contract.ora
ora --emit-bytecode contract.ora
View MLIR IR:
ora --emit-mlir contract.ora
View sensei-ir (SIR) intermediate code:
ora --emit-sir contract.ora
Advanced options:
# Disable automatic MLIR validation (not recommended)
ora --no-validate-mlir contract.ora
# Use custom MLIR optimization passes
ora --mlir-passes="canonicalize,cse,mem2reg" contract.ora
Automatic MLIR Validation
The compiler automatically validates MLIR correctness before sensei-ir lowering:
$ ora contract.ora
Parsing contract.ora...
Performing semantic analysis...
Lowering to MLIR...
Validating MLIR before sensei-ir lowering...
✅ MLIR validation passed
Lowering to sensei-ir (SIR)...
Compiling to EVM bytecode via sensei-ir...
Successfully compiled to EVM bytecode!
If validation fails, compilation stops immediately:
❌ MLIR validation failed with 3 error(s):
- [TypeMismatch] Expected i256, found i1
- [MalformedAst] Missing operand for arith.addi
- [InvalidRegion] Block has no terminator
Validation is automatic to catch errors early in the compilation pipeline.
Build Integration
The MLIR lowering pipeline:
- Lexing and Parsing: Source code → AST
- Semantic Analysis: Type checking, builtin validation
- MLIR Lowering: AST → MLIR module (with source locations)
- MLIR Validation: Structural and semantic checks (automatic)
- MLIR Optimization: CSE, canonicalization, mem2reg, SCCP, LICM
- sensei-ir Lowering: MLIR → sensei-ir (SIR) 🚧 In Development
- Bytecode Generation: sensei-ir → EVM bytecode via sensei-ir debug-backend 🚧 In Development
SSA Transformation
Ora to SSA
While Ora allows mutable local variables, the MLIR representation uses Static Single Assignment (SSA) form internally:
Ora Code (Mutable Variables):
pub fn calculate(x: u256) -> u256 {
let result = x; // Mutable local variable
result = result + 10;
result = result * 2;
return result;
}
MLIR Representation (SSA with Memory):
func.func @calculate(%arg0: i256) -> i256 {
%0 = memref.alloca() : memref<1xi256> // Stack allocation
memref.store %arg0, %0[] : memref<1xi256> // Store initial value
%1 = memref.load %0[] : memref<1xi256> // Load for +10
%c10 = arith.constant 10 : i256
%2 = arith.addi %1, %c10 : i256
memref.store %2, %0[] : memref<1xi256> // Store back
%3 = memref.load %0[] : memref<1xi256> // Load for *2
%c2 = arith.constant 2 : i256
%4 = arith.muli %3, %c2 : i256
memref.store %4, %0[] : memref<1xi256> // Store back
%5 = memref.load %0[] : memref<1xi256> // Load final result
return %5 : i256
}
After mem2reg Optimization Pass:
func.func @calculate(%arg0: i256) -> i256 {
%c10 = arith.constant 10 : i256
%0 = arith.addi %arg0, %c10 : i256 // Pure SSA!
%c2 = arith.constant 2 : i256
%1 = arith.muli %0, %c2 : i256 // Pure SSA!
return %1 : i256
}
SSA Benefits
- Optimization: Standard MLIR passes (CSE, SCCP, dead code elimination) work directly
- Analysis: Data flow analysis is straightforward in SSA form
- Type Safety: Each SSA value has a single, well-defined type
- Gas Efficiency:
mem2regeliminates redundant loads/stores
Memory Regions vs SSA
Local Variables (SSA via mem2reg):
- Function-scope variables
memref.alloca→ SSA values- Optimized away in final code
Storage Variables (NOT SSA):
- Contract state (persistent)
ora.sload/ora.sstoreoperations- Direct EVM
SLOAD/SSTOREopcodes - Cannot be optimized away
// Local variable (SSA):
%local = arith.constant 100 : i256
// Storage variable (NOT SSA):
%slot = arith.constant 0 : i256
%value = ora.sload %slot : i256
ora.sstore %slot, %value : i256
Standard Library Integration
Built-in Lowering
Ora's standard library built-ins are lowered to custom ora.evm.* MLIR operations:
Ora Code:
let timestamp = std.block.timestamp();
let sender = std.msg.sender();
MLIR:
%timestamp = ora.evm.timestamp() : i256 loc("contract.ora":10:20)
%sender = ora.evm.caller() : i160 loc("contract.ora":11:17)
sensei-ir (SIR):
fn main:
entry -> timestamp sender {
timestamp = timestamp
sender = caller
iret
}
Zero-Overhead Guarantee
All standard library calls are inlined at the MLIR level - no function call overhead:
| Ora Built-in | MLIR Operation | sensei-ir Operation | Overhead |
|---|---|---|---|
std.block.timestamp() | ora.evm.timestamp | timestamp | Zero |
std.msg.sender() | ora.evm.caller | caller | Zero |
std.constants.U256_MAX | arith.constant -1 | 0xfff...fff | Zero |
Semantic Validation
Built-in calls are validated during semantic analysis:
✅ Valid:
let sender = std.msg.sender(); // Correct usage
❌ Invalid (caught at compile time):
let sender = std.msg.sender; // Error: missing ()
let invalid = std.block.fake(); // Error: unknown built-in
Debugging and Analysis
Source Location Preservation
All MLIR operations preserve exact source location information (97% coverage):
%result = arith.addi %a, %b : i256 loc("contract.ora":15:8)
What's tracked:
- Filename (e.g.,
contract.ora) - Line number (e.g.,
15) - Column number (e.g.,
8)
Use cases:
- Error reporting with exact source position
- Debugging with source-level stepping
- Coverage analysis
- Profiling with source attribution
Error Reporting
MLIR validation provides detailed error messages:
❌ MLIR validation failed with 2 error(s):
- [TypeMismatch] Expected i256, found i1
at contract.ora:42:15
- [MalformedAst] Missing terminator in block
at contract.ora:50:1
Optimization Passes
Standard MLIR passes optimize the IR:
| Pass | Purpose | Benefit |
|---|---|---|
| CSE | Common Subexpression Elimination | Reduce duplicate computations |
| Canonicalization | Simplify IR patterns | Cleaner code |
| mem2reg | Convert memory to SSA values | Eliminate loads/stores |
| SCCP | Sparse Conditional Constant Propagation | Constant folding |
| LICM | Loop-Invariant Code Motion | Hoist invariants |
Result: Significant gas savings in generated EVM bytecode!
Performance Characteristics
The MLIR lowering system is designed for:
- Fast compilation: Efficient lowering algorithms
- Memory efficiency: Minimal overhead during compilation
- Scalability: Handles large smart contracts
- Deterministic output: Consistent results for testing
Implementation Features
Core Features
| Feature | Coverage |
|---|---|
| AST → MLIR Lowering | Complete |
| Source Location Tracking | 97% |
| Automatic Validation | Yes |
| Type System Mapping | Complete |
| Standard Library | 17 built-ins |
| SSA Transformation | mem2reg pass |
| Optimization Passes | 5 passes (CSE, canonicalization, mem2reg, SCCP, LICM) |
| Storage Operations | sload, sstore, tload, tstore |
| Control Flow | if, while, for, switch |
| Function Calls | Complete |
| sensei-ir Lowering | 🚧 In Development |
In Development
- Formal verification integration
- Custom Ora dialect registration
- Advanced type features (generics, constraints)
Future
- Advanced analysis (loop, alias, escape)
- Gas cost modeling
- Alternative backends (LLVM IR, WebAssembly)
- IDE integration
Examples
ERC20 Token Contract
contract SimpleToken {
storage totalSupply: u256;
storage balances: map[address, u256];
storage allowances: doublemap[address, address, u256];
pub fn initialize(initialSupply: u256) -> bool {
let deployer = std.msg.sender();
totalSupply = initialSupply;
balances[deployer] = initialSupply;
return true;
}
pub fn transfer(recipient: address, amount: u256) -> bool {
let sender = std.msg.sender();
let senderBalance = balances[sender];
if (recipient == std.constants.ZERO_ADDRESS) {
return false;
}
if (senderBalance < amount) {
return false;
}
balances[sender] = senderBalance - amount;
let recipientBalance = balances[recipient];
balances[recipient] = recipientBalance + amount;
return true;
}
}
This example demonstrates:
- Zero-overhead built-ins (
std.msg.sender()→CALLERopcode) - Storage operations (
balances[sender]→ora.sloadwith keccak256) - Type safety (all operations type-checked)
- SSA form (local variables via mem2reg)
- Source locations (every operation tagged with source position)
The MLIR representation enables analysis and optimization while maintaining semantic integrity.
MLIR Resources
Learn More About MLIR
- MLIR Official Website: https://mlir.llvm.org
- MLIR Documentation: https://mlir.llvm.org/docs/
- MLIR Tutorials: https://mlir.llvm.org/docs/Tutorials/
- LLVM Project: https://llvm.org
Ora MLIR Implementation
- Source Code: Available in the
src/mlir/directory of the Ora repository - Technical Documentation: See
docs/mlir-lowering.mdfor implementation details - API Reference: See
docs/mlir-api.mdfor developer documentation - Troubleshooting: See
docs/mlir-troubleshooting.mdfor common issues
The Ora MLIR integration leverages the powerful MLIR framework developed by the LLVM community to provide advanced compiler capabilities for smart contract development.