Register Allocator
LCCC replaces CCC’s three-phase greedy allocator with a two-pass linear scan over live intervals.
Background: The CCC Allocator
CCC’s original allocator processes live intervals in three greedy phases:
- Phase 1 — Sort all call-spanning values by priority. Assign callee-saved registers in order until they’re exhausted.
- Phase 2 — Sort non-call-spanning values. Assign caller-saved registers.
- Phase 3 — Assign any remaining callee-saved registers to overflow non-call-spanning values.
The prioritization is loop-weighted: uses inside a loop at depth d contribute 10^d to the score. This is good, but the greedy approach has a fundamental flaw: it can’t evict a lower-priority value to make room for a higher-priority one. Once a register is assigned, it stays assigned.
Additionally, Phase 1 only considers call-spanning values. Non-call-spanning values in tight loops had to compete in Phase 2 for the small caller-saved pool (typically 2–4 registers on x86-64), often missing out entirely.
Linear Scan Overview
Linear scan (Poletto & Sarkar, 1999) processes intervals in order of their start point. For each new interval:
- Expire all active intervals that ended before this one starts. Free their registers.
- Find a free register. If one exists, assign it.
- If no register is free, compare the incoming interval’s spill weight against the active interval with the lowest spill weight. Evict whichever is less important.
This makes a single linear pass with O(n log n) complexity — fast enough to run on every function.
Data Structures
LiveRange (live_range.rs)
pub struct LiveRange {
pub value_id: u32,
pub start: u32, // program point of definition
pub end: u32, // last use
pub uses: Vec<u32>, // individual use points
pub loop_depth: u32, // nesting depth (0 = no loop)
pub priority: u64, // uses.len() * 10^loop_depth
pub reg_hint: Option<PhysReg>, // preferred reg (from Copy source)
pub spill_weight: f64, // priority / range_length
}
Spill weight combines priority and interval length. A short, hot interval costs more to spill than a long, cold one.
LinearScanAllocator
pub struct LinearScanAllocator {
pub ranges: Vec<LiveRange>, // sorted by start
pub active: Vec<ActiveInterval>, // currently live
pub assignments: FxHashMap<u32, PhysReg>,
pub reg_free_until: FxHashMap<PhysReg, u32>,
pub spill_slots: FxHashMap<u32, i32>,
pub available_regs: Vec<PhysReg>,
}
The Allocation Algorithm
run():
init all registers as free (reg_free_until = 0)
for each range in ranges (sorted by start):
allocate_range(range)
allocate_range(range):
expire_old_intervals(range.start) // free registers of ended intervals
if free_register = find_free_register(range):
assign free_register to range
add range to active set
else if spill_candidate = find_spill_candidate():
// Only evict if the candidate has LOWER spill weight than the incoming range.
// This ensures we always keep the most important value in the register.
if candidate.spill_weight < range.spill_weight:
remove candidate from assignments (it goes to stack)
assign candidate's freed register to range
else:
spill incoming range (it goes to stack)
else:
spill incoming range
Eviction Correctness
A subtle but critical fix from the original implementation: the eviction comparison must consider the incoming interval’s weight, not just the active set. The old code always evicted the lowest-weight active interval regardless. This produced wrong results when the incoming interval was less important than everything already in registers — the test case that exposed this:
// Two overlapping intervals, one register
// Value 1: priority=3, spill_weight=0.03 (more important)
// Value 2: priority=2, spill_weight=0.02 (less important, arrives second)
//
// Correct: value 1 keeps the register, value 2 spills
// Old code: always evicts value 1 (the only active), gives reg to value 2 — WRONG
The fix: if active_weight >= incoming_weight, spill the incoming range instead.
Eligibility Filter
The allocator only processes values that can safely live in a general-purpose register. The filter excludes:
| Category | Why excluded |
|---|---|
float, f64, long double |
Use XMM/x87 registers, not GPRs |
i128, u128 |
Require register pairs |
i64/u64 on i686 |
32-bit target, need eax:edx pair |
| Alloca addresses | Must have stable stack addresses |
| Atomic pointers | lock prefix requires memory operand |
memcpy dst/src pointers |
Codegen uses resolve_slot_addr directly |
| VA arg pointers | Same — not register-aware |
CallIndirect func pointers |
Must dereference from memory |
Copy chains are propagated: if %a = copy %b and %b is ineligible, %a is also excluded.
Two-Pass Design
LCCC runs the linear scan twice with different register pools:
Phase 1: LinearScanAllocator(eligible_intervals, callee_saved_regs)
→ callee-saved registers are safe for ALL values, including call-spanning ones
→ the allocator never needs to know whether an interval spans a call
Phase 2: LinearScanAllocator(unallocated_non_call_spanning, caller_saved_regs)
→ caller-saved regs can ONLY hold values that don't cross function calls
→ filter: !spans_any_call(interval, call_points)
This separation preserves ABI correctness: caller-saved registers are freely clobbered by callees, so any value assigned to one must not be live across a call instruction.
Register Hints (Coalescing)
When a value is produced by a Copy instruction, the source and destination have the same value. If the source is allocated to register R, the destination is hinted to prefer R. If R is free when the destination is processed, the allocator assigns it — eliminating a register-to-register move.
The hint infrastructure exists in live_range.rs (find_register_hints, reg_hint field); full coalescing is a Phase 3 improvement.
Performance Impact
| Benchmark | Δ vs CCC | Root cause |
|---|---|---|
arith_loop (32 vars, 10M iters) |
+20% faster | More callee-saved regs allocated; fewer load/store pairs in hot loop |
sieve (primes to 10M) |
+25% faster | Inner loop variables (counters, boundary) now in registers |
fib(40) recursive |
≈ equal | Call-dominated; allocator has minimal effect |
matmul 256×256 double |
≈ equal | FP throughput and GCC’s AVX2 vectorization dominate |
qsort 1M |
≈ equal | libc qsort call dominates — no LCCC code in hot path |