Arena allocator programming in Rust for interpreters and compilers

You are building a Lox interpreter. Right now your parser builds an AST with many Box values and many owned String values. That is a good beginner design because it is clear. But when an interpreter or compiler grows, this design starts to hurt.

This page teaches arena allocator programming from first principles, with zero external crates. We will start from the reason arenas exist, then build multiple arena styles in Rust, then apply the idea to your Lox interpreter.

The word is usually spelled arena allocator. I will use arena in the article.

The short idea

An arena is a place where many objects are allocated together and freed together.

Instead of doing this:

let left = Box::new(Expr::Number(1.0));
let right = Box::new(Expr::Number(2.0));
let expr = Box::new(Expr::Binary { left, right });

You do this:

let left = arena.alloc(Expr::Number(1.0));
let right = arena.alloc(Expr::Number(2.0));
let expr = arena.alloc(Expr::Binary { left, right });

Or, even more common in Rust compilers:

let left = ast.exprs.alloc(Expr::Number(1.0));
let right = ast.exprs.alloc(Expr::Number(2.0));
let expr = ast.exprs.alloc(Expr::Binary { left, right });

Where left, right, and expr are small IDs, not Box pointers.

First principles: what problem are we solving?

When you parse source code, you create many small objects:

tokens
expressions
statements
declarations
types
scopes
symbols
errors
compiler intermediate representation nodes

For a small program, this is not a problem. For a real compiler, it can be millions of objects.

The normal heap allocator is general-purpose. General-purpose means it must support many patterns:

allocate one object
free that object later
allocate different sizes
keep track of each allocation
avoid memory corruption
work across many parts of the program

That generality has a cost.

For an AST, we often do not need that generality. Most AST nodes have one simple lifetime:

create during parsing -> use during analysis/interpreting/codegen -> free all at once

That lifetime is perfect for an arena.

The compiler memory pattern

Most compiler phases look like this:

source text
    │
    ▼
lexer creates tokens
    │
    ▼
parser creates AST nodes
    │
    ▼
resolver/type checker reads AST, creates symbols/types
    │
    ▼
interpreter or code generator walks data
    │
    ▼
drop whole phase memory together

Notice something important: we usually do not free a single AST node. We free the whole AST when the program or REPL input is done.

That is the main reason arenas are popular in compilers.

Why Box everywhere becomes expensive

Your current parser has code like this:

pub enum Expr {
    Grouping(Box<Expr>),
    Assignment {
        id: Identifier,
        assign: Box<Expr>,
    },
    UnaryOperation {
        op: UnaryOperator,
        operand: Box<Expr>,
    },
    BinaryOperation {
        left: Box<Expr>,
        op: BinaryOperator,
        right: Box<Expr>,
    },
}

This is valid Rust. It is also a common first AST design.

But every Box means one heap allocation. For this expression:

1 + 2 * 3 - 4

The parser creates multiple nodes. Many of them become separate heap allocations.

Box(Binary -)
├─ Box(Binary +)
│  ├─ Number(1)
│  └─ Box(Binary *)
│     ├─ Number(2)
│     └─ Number(3)
└─ Number(4)

The costs are:

many calls to the global allocator
many small allocations
memory spread around the heap
more pointer chasing
less CPU cache friendliness
harder ownership when later compiler phases want references into the AST

CPU cache friendliness means the CPU is faster when related data is close together in memory.

What arena allocation changes

With an arena, AST nodes are stored in a few big containers.

Arena memory
╭────────────────────────────────────────╮
│ Expr 0 │ Expr 1 │ Expr 2 │ Expr 3 │ ... │
╰────────────────────────────────────────╯

ExprId(3) points to Expr 3

Instead of each node owning children with Box, nodes can point to children with IDs or references.

pub struct ExprId(usize);

pub enum Expr {
    Number(f32),
    Binary {
        left: ExprId,
        op: BinaryOperator,
        right: ExprId,
    },
}

Now the AST is a graph of small IDs into an arena.

Why this is faster

Arena allocation is usually faster for compiler data because:

Allocation is simple.
Deallocation is almost free.
Data is more compact.
We avoid many Box allocations.
We avoid cloning big AST values.

Allocation can be as simple as pushing into a Vec:

self.items.push(value);
ExprId(self.items.len() - 1)

Deallocation happens when the whole Vec is dropped.

drop(ast);

That is the whole cleanup.

Important warning: arena is not magic

Arena allocation is powerful, but it is not always the correct tool.

Use an arena when:

many objects have the same lifetime
you rarely delete individual objects
you build graph-like data such as AST, HIR, IR, symbol tables
you care about allocation overhead

Do not use an arena when:

objects must be deleted one by one
memory must be returned immediately
objects have very different lifetimes
simple Vec or Box is already enough

The main thinking model is this:

An arena trades precise freeing for simple and fast bulk freeing.

Precise freeing means freeing each object exactly when it is no longer used. Bulk freeing means freeing everything together.

Three arena styles you should know

For Rust interpreters and compilers, three styles matter most.

╭────────────────────╮
│ 1. Index arena      │  returns ExprId, StmtId, DeclId
╰────────────────────╯
╭────────────────────╮
│ 2. Typed ref arena  │  returns &'arena Expr
╰────────────────────╯
╭────────────────────╮
│ 3. Bump arena       │  low-level bytes, fastest, unsafe core
╰────────────────────╯

For your current Lox interpreter, I recommend starting with the index arena.

Why? Because it is safe Rust, easy to debug, easy to serialize, and easy to use in later compiler passes.

Part 1: a simple index arena

An index arena stores values in a Vec and returns typed IDs.

Start simple:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Id<T> {
    index: usize,
    _marker: std::marker::PhantomData<fn() -> T>,
}

impl<T> Id<T> {
    pub fn index(self) -> usize {
        self.index
    }
}

#[derive(Debug, Default)]
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    pub fn alloc(&mut self, value: T) -> Id<T> {
        let id = Id {
            index: self.items.len(),
            _marker: std::marker::PhantomData,
        };
        self.items.push(value);
        id
    }

    pub fn get(&self, id: Id<T>) -> &T {
        &self.items[id.index]
    }

    pub fn get_mut(&mut self, id: Id<T>) -> &mut T {
        &mut self.items[id.index]
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

The PhantomData field tells Rust that Id<T> is tied to the type T. This prevents mixing Expr IDs and Stmt IDs by accident.

Without PhantomData, this could happen:

let expr_id = ExprId(0);
let stmt = stmts.get(expr_id); // wrong arena, but same usize shape

With typed IDs, Rust helps stop that mistake.

How the index arena works

Arena<Expr>
╭───────╬────────────────────────────╮
│ index │ value                      │
├───────┼────────────────────────────┤
│ 0     │ Number(1)                  │
│ 1     │ Number(2)                  │
│ 2     │ Binary { left: 0, right:1 } │
╰───────╩────────────────────────────╯

ExprId { index: 2 } means expression at row 2

The arena owns the values. The ID is just a small handle.

Handle means a small value used to find a real value somewhere else.

Why indices are idiomatic for many Rust compilers

References are nice, but Rust references have lifetimes. Lifetimes are useful, but they can make compiler data structures harder.

IDs avoid many lifetime problems.

With IDs, you can store graphs easily:

pub enum Expr {
    Number(f32),
    Binary {
        left: ExprId,
        op: BinaryOperator,
        right: ExprId,
    },
}

No Box. No recursive ownership. No lifetime parameter on Expr.

That is a big ergonomic win.

Ergonomic means comfortable to use.

Part 2: redesigning your AST with IDs

Your current AST owns child expressions with Box. We can redesign it like this.

pub type ExprId = Id<Expr>;
pub type StmtId = Id<Stmt>;
pub type DeclId = Id<Declaration>;

#[derive(Debug, Clone)]
pub enum Expr {
    Identifier(Identifier),
    String(String),
    Number(f32),
    Bool(bool),
    Nil,

    Grouping(ExprId),
    Assignment {
        id: Identifier,
        assign: ExprId,
    },
    UnaryOperation {
        op: UnaryOperator,
        operand: ExprId,
    },
    BinaryOperation {
        left: ExprId,
        op: BinaryOperator,
        right: ExprId,
    },
}

#[derive(Debug, Clone)]
pub enum Stmt {
    Block(Vec<DeclId>),
    PrintStmt(ExprId),
    ExprStmt(ExprId),
}

#[derive(Debug, Clone)]
pub struct VarDecl {
    pub identifier: Identifier,
    pub expr: Option<ExprId>,
}

#[derive(Debug, Clone)]
pub enum Declaration {
    VarDecl(VarDecl),
    Stmt(StmtId),
}

#[derive(Debug)]
pub struct Program {
    pub declarations: Vec<DeclId>,
}

#[derive(Debug, Default)]
pub struct Ast {
    pub exprs: Arena<Expr>,
    pub stmts: Arena<Stmt>,
    pub decls: Arena<Declaration>,
    pub program: Program,
}

Now Program does not own declarations directly. It owns IDs into Ast.decls.

The whole parsed result is Ast.

Ast
├─ exprs: Arena<Expr>
├─ stmts: Arena<Stmt>
├─ decls: Arena<Declaration>
└─ program: Vec<DeclId>

Program -> DeclId -> Declaration -> StmtId -> Stmt -> ExprId -> Expr

This is the core pattern.

Why this fixes cloning in your interpreter

Your current interpreter does this:

program
    .declarations
    .iter()
    .try_for_each(|decl| self.eval_decl(decl.clone()))

That clone exists because eval_decl takes ownership of Declaration.

With IDs, the interpreter can borrow from the AST:

fn eval_program(&mut self, ast: &Ast) -> Result<(), RuntimeError> {
    for decl_id in &ast.program.declarations {
        self.eval_decl(ast, *decl_id)?;
    }
    Ok(())
}

fn eval_decl(&mut self, ast: &Ast, decl_id: DeclId) -> Result<(), RuntimeError> {
    match ast.decls.get(decl_id) {
        Declaration::VarDecl(var_decl) => self.eval_var_decl(ast, var_decl),
        Declaration::Stmt(stmt_id) => self.eval_stmt(ast, *stmt_id),
    }
}

Now eval_decl receives a small ID. It does not clone the AST node.

Part 3: parser shape with an index arena

Your parser can own an Ast while parsing.

pub struct Parser<I: Iterator<Item = Result<Token, LexError>>> {
    tokens: std::iter::Peekable<I>,
    ast: Ast,
}

impl<I: Iterator<Item = Result<Token, LexError>>> Parser<I> {
    pub fn new(tokens: I) -> Self {
        Self {
            tokens: tokens.peekable(),
            ast: Ast::default(),
        }
    }

    pub fn parse(mut self) -> Result<Ast, ParseError> {
        let program = self.program()?;
        self.ast.program = program;
        Ok(self.ast)
    }
}

Notice parse now consumes self. This is useful because the parser owns the Ast and returns it when done.

Allocating expressions

You add small helper methods:

impl<I: Iterator<Item = Result<Token, LexError>>> Parser<I> {
    fn alloc_expr(&mut self, expr: Expr) -> ExprId {
        self.ast.exprs.alloc(expr)
    }

    fn alloc_stmt(&mut self, stmt: Stmt) -> StmtId {
        self.ast.stmts.alloc(stmt)
    }

    fn alloc_decl(&mut self, decl: Declaration) -> DeclId {
        self.ast.decls.alloc(decl)
    }
}

Now every parser function returns IDs.

Before:

fn expr(&mut self) -> Result<Expr, ParseError>

After:

fn expr(&mut self) -> Result<ExprId, ParseError>

Example: parsing binary expressions

Your current parser builds a nested Expr value.

Arena version:

fn term(&mut self) -> Result<ExprId, ParseError> {
    let mut node = self.factor()?;

    while let Some(tk_type) = self.peek_tktype()? {
        let op = match tk_type {
            TokenType::Plus => BinaryOperator::Add,
            TokenType::Minus => BinaryOperator::Sub,
            _ => break,
        };

        self.advance()?;
        let right = self.factor()?;

        node = self.alloc_expr(Expr::BinaryOperation {
            left: node,
            op,
            right,
        });
    }

    Ok(node)
}

The important line is:

node = self.alloc_expr(Expr::BinaryOperation { left: node, op, right });

Instead of boxing the old node, we store it in the arena and point to it by ID.

Example: parsing primary expressions

fn primary(&mut self) -> Result<ExprId, ParseError> {
    let token = self.advance()?;

    let expr = match token.tk_type {
        TokenType::Identifier(name) => Expr::Identifier(Identifier { name }),
        TokenType::String(value) => Expr::String(value),
        TokenType::Number(value) => Expr::Number(value),
        TokenType::True => Expr::Bool(true),
        TokenType::False => Expr::Bool(false),
        TokenType::Nil => Expr::Nil,
        TokenType::LeftParen => {
            let inner = self.expr()?;
            self.consume_if_matches(TokenType::RightParen)?;
            Expr::Grouping(inner)
        }
        _ => return Err(ParseError::UnexpectedToken(token)),
    };

    Ok(self.alloc_expr(expr))
}

All expression parser functions now return ExprId.

Example: parsing statements and declarations

fn declaration(&mut self) -> Result<DeclId, ParseError> {
    if let Some(TokenType::Var) = self.peek_tktype()? {
        self.consume_if_matches(TokenType::Var)?;
        let var_decl = self.var_decl()?;
        return Ok(self.alloc_decl(Declaration::VarDecl(var_decl)));
    }

    let stmt_id = self.stmt()?;
    Ok(self.alloc_decl(Declaration::Stmt(stmt_id)))
}

fn stmt(&mut self) -> Result<StmtId, ParseError> {
    if let Some(TokenType::LeftBrace) = self.peek_tktype()? {
        self.consume_if_matches(TokenType::LeftBrace)?;

        let mut declarations = Vec::new();
        while let Some(tk_type) = self.peek_tktype()? {
            if tk_type == TokenType::RightBrace {
                break;
            }
            declarations.push(self.declaration()?);
        }

        self.consume_if_matches(TokenType::RightBrace)?;
        return Ok(self.alloc_stmt(Stmt::Block(declarations)));
    }

    if let Some(TokenType::Print) = self.peek_tktype()? {
        self.consume_if_matches(TokenType::Print)?;
        let expr = self.expr()?;
        self.consume_if_matches(TokenType::Semicolon)?;
        return Ok(self.alloc_stmt(Stmt::PrintStmt(expr)));
    }

    let expr = self.expr()?;
    self.consume_if_matches(TokenType::Semicolon)?;
    Ok(self.alloc_stmt(Stmt::ExprStmt(expr)))
}

Part 4: interpreter with arena IDs

The interpreter now receives Ast instead of Program.

impl Interpreter {
    pub fn interpret(&mut self, ast: &Ast) -> Result<(), RuntimeError> {
        self.eval_program(ast)
    }

    fn eval_program(&mut self, ast: &Ast) -> Result<(), RuntimeError> {
        for decl_id in &ast.program.declarations {
            self.eval_decl(ast, *decl_id)?;
        }
        Ok(())
    }
}

The AST owns nodes. The interpreter only reads them.

eval_expr with IDs

fn eval_expr(&mut self, ast: &Ast, expr_id: ExprId) -> Result<RuntimeValue, RuntimeError> {
    match ast.exprs.get(expr_id) {
        Expr::Identifier(id) => {
            self.global_scope
                .get(&id.name)
                .cloned()
                .ok_or_else(|| RuntimeError::VariableDoesNotExist(id.name.clone()))
        }
        Expr::String(value) => Ok(RuntimeValue::String(value.clone())),
        Expr::Number(value) => Ok(RuntimeValue::Number(*value)),
        Expr::Bool(value) => Ok(RuntimeValue::Bool(*value)),
        Expr::Nil => Ok(RuntimeValue::Nil),
        Expr::Grouping(inner) => self.eval_expr(ast, *inner),
        Expr::Assignment { id, assign } => {
            if self.global_scope.contains_key(&id.name) {
                let value = self.eval_expr(ast, *assign)?;
                self.global_scope.insert(id.name.clone(), value.clone());
                Ok(value)
            } else {
                Err(RuntimeError::VariableDoesNotExist(id.name.clone()))
            }
        }
        Expr::UnaryOperation { op, operand } => {
            let operand = self.eval_expr(ast, *operand)?;
            self.eval_unary(op.clone(), operand)
        }
        Expr::BinaryOperation { left, op, right } => {
            let left = self.eval_expr(ast, *left)?;
            let right = self.eval_expr(ast, *right)?;
            self.eval_binary(left, op.clone(), right)
        }
    }
}

This still clones strings and operators in a few places. That is okay at first. Later, you can reduce those clones too.

main.rs after arena AST

Your file mode becomes:

let tokens: Vec<Result<Token, _>> = Lexer::new(&src_code).collect();

let parser = Parser::new(tokens.into_iter());
let ast = parser.parse().unwrap();

let mut interpreter = Interpreter::new();
interpreter.interpret(&ast).unwrap();

In the REPL:

let parser = Parser::new(tokens.into_iter());
let ast = parser.parse().unwrap();
interpreter.interpret(&ast).unwrap();

Each input line gets its own AST arena. After interpret returns, that arena can drop.

This is safe if runtime values do not store references into the AST.

Part 5: the lifetime rule of arenas

Arena data must live longer than anything pointing into it.

With IDs, this is simple because IDs are just indexes. But you still must pass the Ast when reading the ID.

fn eval_expr(ast: &Ast, id: ExprId) { ... }

The ID alone is not useful without the arena.

This is good. It makes ownership clear.

ExprId alone
    │
    ▼
just a number, cannot read node

ExprId + &Ast
    │
    ▼
can read ast.exprs[id]

A common mistake

Do not store ExprId from one Ast and use it with another Ast.

let ast1 = parse("1 + 2");
let id = ast1.program.declarations[0];

let ast2 = parse("3 + 4");
ast2.decls.get(id); // wrong meaning

Typed IDs stop mixing ExprId and StmtId. They do not stop mixing two different Ast values.

For most interpreters, this is fine because you pass IDs around together with the Ast they came from.

Part 6: typed reference arena

An index arena is safe and practical. But many compilers also use arenas that return references.

Example target API:

let expr: &Expr = arena.alloc(Expr::Number(1.0));

This feels nice because child pointers are normal references.

pub enum Expr<'a> {
    Number(f32),
    Binary {
        left: &'a Expr<'a>,
        op: BinaryOperator,
        right: &'a Expr<'a>,
    },
}

But in Rust this requires care.

Why a fully safe reference arena is hard

This simple version looks nice:

pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn alloc(&mut self, value: T) -> &T {
        self.items.push(value);
        self.items.last().unwrap()
    }
}

But it has a problem:

let a = arena.alloc(Expr::Number(1.0));
let b = arena.alloc(Expr::Number(2.0)); // borrow checker problem
println!("{:?}", a);

The returned reference keeps the arena borrowed. Rust will not let you mutably borrow the arena again while a is alive.

Also, Vec can move its items when it grows, which would break old references.

A typed arena using Box and small unsafe

One simple reference arena stores Box<T>. Box values have stable addresses. Moving the Box handle does not move the T inside the Box.

use std::cell::RefCell;

pub struct TypedArena<T> {
    items: RefCell<Vec<Box<T>>>,
}

impl<T> TypedArena<T> {
    pub fn new() -> Self {
        Self {
            items: RefCell::new(Vec::new()),
        }
    }

    pub fn alloc(&self, value: T) -> &T {
        let boxed = Box::new(value);
        let ptr: *const T = &*boxed;
        self.items.borrow_mut().push(boxed);

        // Safety rule:
        // - ptr points into a Box<T>
        // - the Box<T> is stored in self.items
        // - items are only dropped when the arena is dropped
        // - we never move the T out of the Box
        unsafe { &*ptr }
    }
}

This uses unsafe, but the unsafe part is small and easy to reason about.

Unsafe means Rust cannot prove the code is correct, so you must prove it by design.

When to use typed reference arenas

Use a typed reference arena when:

you want AST nodes to contain references
your AST has one clear lifetime
you want simple recursive tree walking
you accept some unsafe inside the arena implementation

Avoid it when:

you want fully safe code
you need to mutate nodes a lot
you want to store nodes across different arena lifetimes
IDs would be easier

For your current Lox interpreter, I still recommend the ID arena first.

Part 7: bump arena

A bump arena is the low-level version.

It has a big byte buffer and a cursor.

Allocation means:

align the cursor
write the value at that memory
move the cursor forward

chunk bytes
╭────────────────────────────────────────────╮
│ used values │ cursor -> free free free free │
╰────────────────────────────────────────────╯

alloc T:
cursor = align(cursor, align_of::<T>())
write T at cursor
cursor += size_of::<T>()

This can be extremely fast. But implementing a general bump arena correctly requires unsafe code, alignment handling, and drop handling.

Alignment means some types must start at memory addresses divisible by 2, 4, 8, or more. If you ignore alignment, the program can be wrong or crash.

A learning bump allocator sketch

This sketch shows the idea. It is not the first implementation I recommend shipping.

use std::alloc::{alloc, dealloc, Layout};
use std::cell::Cell;
use std::mem;
use std::ptr;

pub struct Bump {
    ptr: *mut u8,
    cap: usize,
    pos: Cell<usize>,
}

impl Bump {
    pub fn with_capacity(cap: usize) -> Self {
        let layout = Layout::from_size_align(cap, 16).unwrap();
        let ptr = unsafe { alloc(layout) };
        if ptr.is_null() {
            std::alloc::handle_alloc_error(layout);
        }
        Self { ptr, cap, pos: Cell::new(0) }
    }

    pub fn alloc<T>(&self, value: T) -> &T {
        let align = mem::align_of::<T>();
        let size = mem::size_of::<T>();
        let start = align_up(self.pos.get(), align);
        let end = start.checked_add(size).expect("arena overflow");

        assert!(end <= self.cap, "bump arena out of memory");

        let dst = unsafe { self.ptr.add(start) as *mut T };
        unsafe { ptr::write(dst, value) };
        self.pos.set(end);
        unsafe { &*dst }
    }
}

fn align_up(value: usize, align: usize) -> usize {
    (value + align - 1) & !(align - 1)
}

impl Drop for Bump {
    fn drop(&mut self) {
        let layout = Layout::from_size_align(self.cap, 16).unwrap();
        unsafe { dealloc(self.ptr, layout) };
    }
}

This code has a serious limitation: it does not run destructors for allocated T values.

Destructor means code that runs when a value is dropped. String has a destructor because it must free its internal buffer.

So this bump allocator is only safe for values that do not need Drop, unless you also track destructors.

That is why crates like bumpalo exist. You can use them later, but you should understand the idea first.

Part 8: strings in compiler arenas

Your lexer currently allocates many String values:

pub enum TokenType {
    Identifier(String),
    String(String),
    Number(f32),
}

pub struct Token {
    pub tk_type: TokenType,
    pub lexeme: String,
    pub line: u32,
}

This is easy, but it copies text out of the source string.

A compiler often stores slices into the original source:

pub struct Token<'src> {
    pub kind: TokenKind<'src>,
    pub lexeme: &'src str,
    pub line: u32,
}

pub enum TokenKind<'src> {
    Identifier(&'src str),
    String(&'src str),
    Number(f32),
}

Now identifiers do not allocate a new String during lexing.

The source string must live longer than the tokens and AST that borrow from it.

source String
    ▲
    │ borrowed by
tokens and AST names

For a file, this is easy. read_src_file returns a String, then lexer and parser borrow from it.

For a REPL, each input line has a short lifetime. If runtime values need names or strings after the line ends, you may need to store owned Strings or interned strings.

String interning

String interning means storing each unique string once and using an ID for it.

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(usize);

pub struct Interner {
    strings: Vec<String>,
    map: std::collections::HashMap<String, Symbol>,
}

Then identifier names are Symbol instead of String.

pub struct Identifier {
    pub name: Symbol,
}

This is very common in compilers.

Part 9: typed arena vs index arena

Here is the practical difference.

Index arena

enum Expr {
    Binary { left: ExprId, right: ExprId }
}

fn eval(ast: &Ast, id: ExprId) { ... }

Pros:

safe Rust
easy to store in structs
easy to compare, copy, hash
good for graphs
good for later compiler passes

Cons:

every access needs ast.exprs.get(id)
IDs can be used with the wrong Ast if you are careless

Typed reference arena

enum Expr<'a> {
    Binary { left: &'a Expr<'a>, right: &'a Expr<'a> }
}

Pros:

tree walking looks simple
fewer get calls
direct references

Cons:

lifetimes spread through many types
implementation usually needs unsafe
mutation is harder
graphs can be harder

Recommendation for your Lox project

Use index arenas first.

They will teach you the compiler architecture pattern without fighting lifetimes too early.

Part 10: modern compiler usage

Modern compilers use arenas because compiler data often lives by phase.

Examples:

Rust compiler uses arenas and interners for compiler-owned data such as high-level representations.
Clang stores AST nodes in an ASTContext, which acts like a long-lived arena for the translation unit.
LLVM has bump pointer allocators for many internal compiler data structures.
Many language servers and analyzers use index-based arenas for syntax trees and semantic data.

Translation unit means one compiled source file plus the files it includes or imports.

The common pattern is:

╭──────────────╮
│ parse arena  │ owns syntax tree nodes
╰──────┬───────╯
       ▼
╭──────────────╮
│ type arena   │ owns type objects
╰──────┬───────╯
       ▼
╭──────────────╮
│ IR arena     │ owns intermediate representation nodes
╰──────────────╯

Each arena has a clear lifetime.

Part 11: engineering rules for arenas

Follow these rules:

Give each arena a clear owner.
Do not hide global mutable arenas everywhere.
Prefer typed IDs over plain usize.
Keep IDs small and Copy.
Do not delete from a simple index arena unless you understand invalid IDs.
Pass &Ast to functions that use IDs.
Keep arena lifetimes aligned with compiler phases.
Start safe with index arenas, then learn unsafe bump arenas later.

About deletion

If you remove an item from Vec, all later indexes shift.

items.remove(3); // index 4 becomes index 3

That breaks old IDs.

So a simple arena should not remove nodes.

If you need deletion, use slots:

Vec<Option<T>>

Or use a generational arena.

Generational means an ID stores both index and version, so old IDs can be detected after deletion.

You do not need this for your AST.

Part 12: how to migrate your Lox interpreter step by step

Do not rewrite everything at once.

Step 1: add arena.rs

Create a new module:

// arena.rs
use std::marker::PhantomData;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Id<T> {
    index: usize,
    _marker: PhantomData<fn() -> T>,
}

#[derive(Debug, Default)]
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn alloc(&mut self, value: T) -> Id<T> {
        let id = Id { index: self.items.len(), _marker: PhantomData };
        self.items.push(value);
        id
    }

    pub fn get(&self, id: Id<T>) -> &T {
        &self.items[id.index]
    }

    pub fn get_mut(&mut self, id: Id<T>) -> &mut T {
        &mut self.items[id.index]
    }
}

Then in main.rs:

mod arena;

Step 2: change Expr children from Box<Expr> to ExprId

Do only expressions first.

Change parser expression functions to return ExprId.

Keep statements and declarations normal if needed. You can migrate gradually.

Step 3: add Ast wrapper

pub struct Ast {
    pub exprs: Arena<Expr>,
    pub program: Program,
}

Later add stmts and decls arenas.

Step 4: change interpreter eval_expr

fn eval_expr(&mut self, ast: &Ast, expr: ExprId) -> Result<RuntimeValue, RuntimeError>

Now all recursive expression eval uses IDs.

Step 5: move Stmt and Declaration into arenas

After expression migration works, migrate statements and declarations.

Step 6: remove unnecessary Clone usage

Your interpreter should walk by ID and borrow from Ast. Avoid cloning whole declarations and expressions.

Part 13: best final shape for your Lox project

A clean structure:

// parser.rs
use crate::arena::{Arena, Id};

pub type ExprId = Id<Expr>;
pub type StmtId = Id<Stmt>;
pub type DeclId = Id<Declaration>;

pub struct Ast {
    pub exprs: Arena<Expr>,
    pub stmts: Arena<Stmt>,
    pub decls: Arena<Declaration>,
    pub program: Program,
}

pub struct Program {
    pub declarations: Vec<DeclId>,
}

Interpreter:

impl Interpreter {
    pub fn interpret(&mut self, ast: &Ast) -> Result<(), RuntimeError> {
        for decl in &ast.program.declarations {
            self.eval_decl(ast, *decl)?;
        }
        Ok(())
    }
}

main.rs:

let parser = Parser::new(tokens.into_iter());
let ast = parser.parse().unwrap();
interpreter.interpret(&ast).unwrap();

Part 14: thinking like a compiler engineer

When deciding how to allocate data, ask these questions:

Who owns this data?
How long does it live?
Do I delete items one by one?
Do many things point to it?
Do I need mutation after creation?
Is this phase-local or global?

For AST nodes:

owner: Ast
lifetime: one parse result
deletion: no
many pointers: yes
mutation: little or none after parsing
phase: parse plus later read-only phases

That is arena-friendly.

For runtime variables:

owner: interpreter environment
lifetime: dynamic during program execution
deletion: yes, scopes end
mutation: yes

That is usually not the same arena as AST.

Part 15: common mistakes

Mistake 1: putting everything in one arena

Do not create one giant arena for every object in your interpreter.

Better:

Ast owns AST nodes
Interner owns strings or symbols
Interpreter owns runtime environments
Type checker owns type data

Mistake 2: using arena to avoid understanding ownership

Arena does not remove ownership. It changes ownership.

The arena owns everything.

Mistake 3: deleting from index arena

Do not remove AST nodes from Vec. IDs would break.

Mistake 4: storing references into short-lived source text

If tokens borrow from REPL input, do not store those borrowed strings inside long-lived runtime values.

Mistake 5: starting with unsafe bump allocation too early

Learn index arenas first. They solve most of your current problem.

Part 16: what I would do in your exact interpreter

I would change your project in this order:

Add arena.rs with Arena<T> and Id<T>.
Change Expr to use ExprId instead of Box<Expr>.
Make Parser own Ast.
Make expression parser functions return ExprId.
Make interpreter eval_expr take &Ast and ExprId.
Then move Stmt and Declaration into arenas.
Then reduce clones in interpreter.
Later, consider token string slices or a string interner.

Do not start with a bump allocator. Start with an index arena.

Part 17: mental model summary

An arena is not just a faster allocator. It is a design decision.

You are saying:

These objects belong together. They are born together. They die together.

That statement matches compilers very well.

Without arena
╭─────╮ ╭─────╮ ╭─────╮ ╭─────╮
│ Box │ │ Box │ │ Box │ │ Box │ many small heap objects
╰─────╯ ╰─────╯ ╰─────╯ ╰─────╯

With arena
╭────────────────────────────────╮
│ node node node node node node   │ one owned region
╰────────────────────────────────╯

For Rust, the best first arena is usually not a raw memory allocator. It is a typed index arena built on Vec.

Part 18: a deeper performance model

It is easy to say "arena is faster". But you should know exactly why.

Performance usually improves because of these causes:

Fewer allocator calls.
Fewer pointer indirections.
Better memory locality.
Cheaper cleanup.
Better data layout for compiler passes.

Let us explain each one slowly.

Fewer allocator calls

The global allocator is powerful. It can allocate and free many sizes from many parts of the program. But each call has overhead.

Overhead means extra work around the real work.

When you do this many times:

Box::new(expr)

you ask the allocator for one new heap object each time.

With an index arena:

self.exprs.push(expr)

Vec sometimes grows, but not on every push. Most pushes only write into already reserved memory.

Box design
parse node -> allocator call
parse node -> allocator call
parse node -> allocator call
parse node -> allocator call

Arena Vec design
reserve chunk -> push node
             -> push node
             -> push node
             -> push node

Better memory locality

Locality means related data is near each other in memory.

An AST traversal often visits many neighboring nodes. If the nodes live near each other, the CPU cache helps.

Bad locality
Expr A ───────▶ heap object far away
Expr B ─────────────▶ another far heap object
Expr C ───▶ another random place

Good locality
Expr Arena
╭──────────────────────────────╮
│ Expr A │ Expr B │ Expr C │ ... │
╰──────────────────────────────╯

This does not mean every arena program is automatically faster. It means the memory shape is usually better for compiler workloads.

Cheaper cleanup

With Box, each Box is dropped separately. Dropping one Box may call the allocator to free memory.

With Vec arena, the Vec drops all values and frees its backing buffer.

For AST nodes, that matches the real need:

When done with parse result, drop the whole parse result.

Performance is not only allocation speed

In compilers, speed is often affected by walking data again and again.

Examples:

resolver walks AST
type checker walks AST
interpreter walks AST
optimizer walks IR
code generator walks IR

If the data layout is better, all those passes can benefit.

Pass means one stage that reads or changes compiler data.

Part 19: the complete expression migration shape

Earlier we showed term and primary. Now let us convert the whole expression side of your parser.

The target signatures:

fn expr(&mut self) -> Result<ExprId, ParseError>;
fn assignment(&mut self) -> Result<ExprId, ParseError>;
fn equality(&mut self) -> Result<ExprId, ParseError>;
fn comparision(&mut self) -> Result<ExprId, ParseError>;
fn term(&mut self) -> Result<ExprId, ParseError>;
fn factor(&mut self) -> Result<ExprId, ParseError>;
fn unary(&mut self) -> Result<ExprId, ParseError>;
fn primary(&mut self) -> Result<ExprId, ParseError>;

assignment

Assignment needs one special thing. You parse the left side first, then check if it is an identifier.

With IDs, you cannot directly match the ExprId. You must look it up in the arena.

fn assignment(&mut self) -> Result<ExprId, ParseError> {
    let expr = self.equality()?;

    if let Some(TokenType::Equal) = self.peek_tktype()? {
        self.consume_if_matches(TokenType::Equal)?;
        let value = self.assignment()?;

        let id = match self.ast.exprs.get(expr) {
            Expr::Identifier(id) => id.clone(),
            _ => return Err(ParseError::WrongAssignment),
        };

        return Ok(self.alloc_expr(Expr::Assignment {
            id,
            assign: value,
        }));
    }

    Ok(expr)
}

This is a common arena pattern:

have ID -> inspect node in arena -> allocate new node using old ID

equality

fn equality(&mut self) -> Result<ExprId, ParseError> {
    let mut node = self.comparision()?;

    while let Some(tk_type) = self.peek_tktype()? {
        let op = match tk_type {
            TokenType::BangEqual => BinaryOperator::NotEqual,
            TokenType::EqualEqual => BinaryOperator::DoubleEqual,
            _ => break,
        };

        self.advance()?;
        let right = self.comparision()?;

        node = self.alloc_expr(Expr::BinaryOperation {
            left: node,
            op,
            right,
        });
    }

    Ok(node)
}

comparison

Your code spells it comparision. That is okay for now, but the common spelling is comparison.

fn comparision(&mut self) -> Result<ExprId, ParseError> {
    let mut node = self.term()?;

    while let Some(tk_type) = self.peek_tktype()? {
        let op = match tk_type {
            TokenType::Greater => BinaryOperator::Greater,
            TokenType::GreaterEqual => BinaryOperator::GreaterEqual,
            TokenType::Less => BinaryOperator::Less,
            TokenType::LessEqual => BinaryOperator::LessEqual,
            _ => break,
        };

        self.advance()?;
        let right = self.term()?;

        node = self.alloc_expr(Expr::BinaryOperation {
            left: node,
            op,
            right,
        });
    }

    Ok(node)
}

factor

fn factor(&mut self) -> Result<ExprId, ParseError> {
    let mut node = self.unary()?;

    while let Some(tk_type) = self.peek_tktype()? {
        let op = match tk_type {
            TokenType::Star => BinaryOperator::Mul,
            TokenType::Slash => BinaryOperator::Div,
            _ => break,
        };

        self.advance()?;
        let right = self.unary()?;

        node = self.alloc_expr(Expr::BinaryOperation {
            left: node,
            op,
            right,
        });
    }

    Ok(node)
}

unary

fn unary(&mut self) -> Result<ExprId, ParseError> {
    if let Some(tk_type) = self.peek_tktype()? {
        match tk_type {
            TokenType::Bang => {
                self.advance()?;
                let operand = self.unary()?;
                Ok(self.alloc_expr(Expr::UnaryOperation {
                    op: UnaryOperator::Not,
                    operand,
                }))
            }
            TokenType::Minus => {
                self.advance()?;
                let operand = self.unary()?;
                Ok(self.alloc_expr(Expr::UnaryOperation {
                    op: UnaryOperator::Minus,
                    operand,
                }))
            }
            _ => self.primary(),
        }
    } else {
        Err(ParseError::UnexpectedEof)
    }
}

Notice that recursion still exists. Arena allocation does not remove recursion. It changes where recursive child nodes are stored.

Part 20: full AST walking with IDs

Once you use IDs, you will write many functions like this:

fn walk_expr(ast: &Ast, expr: ExprId) {
    match ast.exprs.get(expr) {
        Expr::Number(_) => {}
        Expr::String(_) => {}
        Expr::Bool(_) => {}
        Expr::Nil => {}
        Expr::Identifier(_) => {}
        Expr::Grouping(inner) => walk_expr(ast, *inner),
        Expr::Assignment { assign, .. } => walk_expr(ast, *assign),
        Expr::UnaryOperation { operand, .. } => walk_expr(ast, *operand),
        Expr::BinaryOperation { left, right, .. } => {
            walk_expr(ast, *left);
            walk_expr(ast, *right);
        }
    }
}

This is the basic visitor pattern.

Visitor means code that walks a tree and does something at each node.

You can make visitors for:

printing AST
resolving variables
type checking
interpreting
compiling
collecting diagnostics

Diagnostic means an error or warning message for the programmer.

Pretty-printing an arena AST

This is useful for debugging.

fn print_expr(ast: &Ast, expr: ExprId) -> String {
    match ast.exprs.get(expr) {
        Expr::Number(n) => n.to_string(),
        Expr::String(s) => format!("\"{}\"", s),
        Expr::Bool(b) => b.to_string(),
        Expr::Nil => "nil".to_string(),
        Expr::Identifier(id) => id.name.clone(),
        Expr::Grouping(inner) => format!("(group {})", print_expr(ast, *inner)),
        Expr::UnaryOperation { op, operand } => {
            format!("({:?} {})", op, print_expr(ast, *operand))
        }
        Expr::BinaryOperation { left, op, right } => {
            format!(
                "({:?} {} {})",
                op,
                print_expr(ast, *left),
                print_expr(ast, *right),
            )
        }
        Expr::Assignment { id, assign } => {
            format!("(assign {} {})", id.name, print_expr(ast, *assign))
        }
    }
}

This is also a good test that your arena AST is linked correctly.

Part 21: spans and source locations

Real compilers need good error messages. For that, AST nodes often store spans.

Span means a range in the source code.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub line: u32,
}

Then your token stores a span:

pub struct Token {
    pub tk_type: TokenType,
    pub lexeme: String,
    pub span: Span,
}

And your AST nodes can store spans too.

pub struct ExprNode {
    pub kind: Expr,
    pub span: Span,
}

pub type ExprId = Id<ExprNode>;

This is often better than putting Span inside every enum variant.

ExprNode
╭─────────────────────╮
│ span: source range   │
│ kind: Expr enum      │
╰─────────────────────╯

Now every expression has a source location.

Why spans matter with arenas

If your AST uses IDs, diagnostics can store IDs or spans.

For example:

pub struct Diagnostic {
    pub span: Span,
    pub message: String,
}

When type checking finds an error, it can use the node span:

let expr = ast.exprs.get(expr_id);
diagnostics.push(Diagnostic {
    span: expr.span,
    message: "expected number".to_string(),
});

This is how compilers connect internal data back to source code.

Part 22: separating node data from node kind

For a serious compiler, this shape is common:

pub struct ExprNode {
    pub kind: ExprKind,
    pub span: Span,
}

pub enum ExprKind {
    Identifier(Identifier),
    Number(f32),
    Binary {
        left: ExprId,
        op: BinaryOperator,
        right: ExprId,
    },
}

Why is this nice?

every node has common metadata
enum stays focused on syntax meaning
passes can access span for any expression
later you can add node IDs, types, or flags

Metadata means data about data. Span is metadata about where the node came from.

You can do the same for statements:

pub struct StmtNode {
    pub kind: StmtKind,
    pub span: Span,
}

pub enum StmtKind {
    Block(Vec<DeclId>),
    Print(ExprId),
    Expr(ExprId),
}

This is usually better than a giant enum with no common wrapper.

Part 23: a string interner with zero dependencies

Earlier we introduced string interning. Now let us implement it.

use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(usize);

#[derive(Debug, Default)]
pub struct Interner {
    strings: Vec<String>,
    map: HashMap<String, Symbol>,
}

impl Interner {
    pub fn intern(&mut self, value: &str) -> Symbol {
        if let Some(symbol) = self.map.get(value) {
            return *symbol;
        }

        let symbol = Symbol(self.strings.len());
        self.strings.push(value.to_string());
        self.map.insert(value.to_string(), symbol);
        symbol
    }

    pub fn resolve(&self, symbol: Symbol) -> &str {
        &self.strings[symbol.0]
    }
}

Now Identifier can be:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Identifier {
    pub name: Symbol,
}

Your runtime environment can use Symbol as the key:

pub struct Interpreter {
    global_scope: HashMap<Symbol, RuntimeValue>,
}

This makes variable lookup cheaper than String lookup in many cases.

Where should the interner live?

For your interpreter, a good owner is a CompileSession.

pub struct CompileSession {
    pub interner: Interner,
}

Then parser can borrow the interner mutably:

pub struct Parser<'a, I> {
    tokens: std::iter::Peekable<I>,
    ast: Ast,
    interner: &'a mut Interner,
}

But be careful. If your lexer already creates String tokens, you can intern in the parser. If your lexer returns borrowed &str tokens, you can intern there or in the parser.

Interning keywords?

Usually keywords are TokenType variants, not interned strings.

TokenType::If
TokenType::While
TokenType::Identifier(Symbol)

Intern user-defined names:

variable names
function names
class names
property names

Part 24: REPL design with arenas

Your main.rs has two modes:

file mode
REPL mode

REPL means read-eval-print-loop. It reads one line, evaluates it, prints result, then loops.

Arena lifetime in file mode is simple:

read source file
    │
    ▼
parse into Ast arena
    │
    ▼
interpret Ast
    │
    ▼
drop Ast at end

REPL mode is more subtle.

input line 1 -> Ast arena 1 -> interpret -> drop arena 1
input line 2 -> Ast arena 2 -> interpret -> drop arena 2
input line 3 -> Ast arena 3 -> interpret -> drop arena 3

The interpreter state lives across inputs:

let mut interpreter = Interpreter::new();

loop {
    let ast = parse_line(...);
    interpreter.interpret(&ast).unwrap();
}

This is okay only if the interpreter does not store references into ast.

Runtime values should own their data or use stable IDs from a long-lived interner.

For example, this is dangerous:

pub enum RuntimeValue<'ast> {
    Function(&'ast FunctionDecl),
}

Because the AST for one REPL input may be dropped, but the function value could remain in the environment.

For a Lox interpreter with functions, you must think carefully about function declarations and closures.

Closure means a function value that remembers variables from its surrounding scope.

A practical REPL rule

For simple expression and variable Lox, drop each input AST after interpreting.

For functions/classes that must live after the input line, store runtime-owned function objects, not references to short-lived AST nodes.

Or use a long-lived REPL arena for declarations that become runtime functions.

Short-lived parse arena
    used for normal statements

Long-lived REPL definition arena
    stores function/class declarations that survive between inputs

This is advanced, but important.

Part 25: environments are not AST arenas

Your Interpreter currently has:

pub struct Interpreter {
    global_scope: HashMap<String, RuntimeValue>,
    local_scopes: Option<Box<Interpreter>>,
}

You probably do not want to put runtime scopes in the same arena as AST nodes.

AST data is mostly immutable after parsing.

Runtime scopes are dynamic:

created when entering a block or function
mutated by variable assignment
dropped when leaving scope

That is a different lifetime.

A better environment shape might be:

pub struct Environment {
    values: HashMap<Symbol, RuntimeValue>,
    parent: Option<EnvId>,
}

pub type EnvId = Id<Environment>;

pub struct EnvArena {
    envs: Arena<Environment>,
}

But this is optional. For a tree-walk interpreter, Rc<RefCell<Environment>> is also common.

No external crate is needed:

use std::cell::RefCell;
use std::rc::Rc;

type EnvRef = Rc<RefCell<Environment>>;

pub struct Environment {
    values: HashMap<Symbol, RuntimeValue>,
    parent: Option<EnvRef>,
}

Rc means reference-counted pointer. RefCell means runtime-checked mutable borrow. These are standard library tools.

For learning compilers, it is okay to use Rc<RefCell> for environments and arena IDs for AST.

Part 26: mutable compiler passes with index arenas

Sometimes a pass needs to mutate AST nodes.

Example: constant folding.

Constant folding means replacing constant expressions with their computed value.

1 + 2  ->  3

With an index arena, you can mutate a node:

fn fold_expr(ast: &mut Ast, expr: ExprId) {
    let replacement = match ast.exprs.get(expr) {
        Expr::BinaryOperation { left, op, right } => {
            let left_id = *left;
            let right_id = *right;
            fold_expr(ast, left_id);
            fold_expr(ast, right_id);

            match (ast.exprs.get(left_id), op, ast.exprs.get(right_id)) {
                (Expr::Number(a), BinaryOperator::Add, Expr::Number(b)) => Some(Expr::Number(a + b)),
                _ => None,
            }
        }
        _ => None,
    };

    if let Some(new_expr) = replacement {
        *ast.exprs.get_mut(expr) = new_expr;
    }
}

This example is simplified, but the pattern matters:

Read child IDs.
Drop immutable borrow.
Recurse with mutable Ast.
Mutate current node.

Rust will force you to avoid holding immutable and mutable borrows at the same time.

That can feel annoying, but it protects you.

Alternative: create a new arena for transformed AST

Many compilers do not mutate old trees. They create a new tree.

AST arena -> lowering pass -> HIR arena -> type pass -> typed HIR arena

Lowering means converting one representation into a simpler or more explicit representation.

Example:

for loop -> while loop
syntactic sugar -> core language

This is often cleaner than mutating the old AST.

Part 27: HIR, MIR, IR and why arenas keep appearing

Compilers often use several internal languages.

AST: close to source syntax
HIR: high-level intermediate representation
MIR: middle-level intermediate representation
IR: intermediate representation, a general term

Intermediate means between source code and final execution/codegen.

Example:

source code
   ▼
AST: keeps syntax shape
   ▼
HIR: names resolved, sugar removed
   ▼
MIR: control flow clearer
   ▼
bytecode or machine code

Each representation can have its own arena.

Why not one arena for everything?

Because each phase has different data and lifetime.

pub struct Compiler {
    source: String,
    interner: Interner,
    ast: Ast,
    hir: Hir,
    types: TypeStore,
}

This is a clean compiler mental model.

Part 28: building a TypeStore arena

When you add static types later, you may store types in an arena too.

pub type TypeId = Id<Type>;

pub enum Type {
    Number,
    String,
    Bool,
    Nil,
    Function {
        params: Vec<TypeId>,
        return_type: TypeId,
    },
    Class {
        name: Symbol,
    },
}

pub struct TypeStore {
    types: Arena<Type>,
}

Then type checking expressions can return TypeId:

fn check_expr(&mut self, ast: &Ast, expr: ExprId) -> Result<TypeId, TypeError> {
    match ast.exprs.get(expr) {
        Expr::Number(_) => Ok(self.types.number()),
        Expr::String(_) => Ok(self.types.string()),
        Expr::Bool(_) => Ok(self.types.bool()),
        _ => todo!(),
    }
}

For built-in types, you may store fixed IDs:

pub struct TypeStore {
    types: Arena<Type>,
    number: TypeId,
    string: TypeId,
    bool_: TypeId,
    nil: TypeId,
}

This is another arena pattern: allocate canonical objects once and reuse their IDs.

Canonical means the one official shared version.

Part 29: generational arenas

A simple index arena cannot safely delete items. But some programs need deletion.

For that, a generational arena stores an index and a generation.

Generation means version number.

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct GenId {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

When you delete an item, you increment the generation.

old id: index 2, generation 0

delete slot 2 -> generation becomes 1

new id: index 2, generation 1

old id no longer matches

Sketch:

pub struct GenArena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> GenArena<T> {
    pub fn get(&self, id: GenId) -> Option<&T> {
        let slot = self.slots.get(id.index)?;
        if slot.generation != id.generation {
            return None;
        }
        slot.value.as_ref()
    }
}

You do not need generational arenas for AST nodes because AST nodes are not deleted.

But it is useful knowledge for runtime objects, editor buffers, ECS systems, and long-lived graphs.

ECS means entity component system, common in game engines.

Part 30: drop handling in bump arenas

The earlier bump allocator did not drop values. That is a big limitation.

If you allocate String in a bump arena and never run its destructor, the String internal heap memory leaks.

Leak means memory is not freed until the program ends.

To support Drop, a bump arena can store cleanup functions.

Conceptually:

struct DropRecord {
    ptr: *mut u8,
    drop_fn: unsafe fn(*mut u8),
}

When allocating T:

unsafe fn drop_value<T>(ptr: *mut u8) {
    std::ptr::drop_in_place(ptr as *mut T);
}

Store:

self.drops.push(DropRecord {
    ptr: dst as *mut u8,
    drop_fn: drop_value::<T>,
});

When the arena drops:

for record in self.drops.iter().rev() {
    unsafe { (record.drop_fn)(record.ptr) };
}

Then deallocate the raw byte chunks.

This is why production bump allocators are non-trivial.

Non-trivial means not simple.

Why reverse drop order?

Rust usually drops values in reverse creation order. If later objects reference earlier objects, reverse drop is often safer.

But with arena references, you should avoid destructors that depend on other arena objects still being alive. Keep arena-stored compiler nodes simple when possible.

Part 31: chunked bump arenas

A fixed-size bump arena can run out of memory.

A production bump arena usually stores chunks.

Chunk 1: full
╭────────────────────╮
│ used used used used │
╰────────────────────╯

Chunk 2: active
╭────────────────────╮
│ used used free free │
╰────────────────────╯

Chunk 3: allocated later if needed

When the active chunk is full, allocate a new chunk.

Large objects may get their own chunk.

This design avoids moving old allocations, so old references stay valid.

That is one reason bump arenas can return references safely from an API, even though the internal implementation uses unsafe.

Part 32: comparing allocation choices for your Lox AST

Here is a practical table.

╭─────────────────┬────────────┬───────────────┬────────────────────╮
│ Design          │ Safe Rust? │ Easy walking? │ Best for           │
├─────────────────┼────────────┼───────────────┼────────────────────┤
│ Box AST         │ yes        │ yes           │ learning first AST │
│ Rc AST          │ yes        │ medium        │ shared ownership   │
│ Index arena     │ yes        │ medium        │ compilers in Rust  │
│ Ref arena       │ unsafe core│ yes           │ phase trees        │
│ Raw bump arena  │ unsafe core│ advanced      │ low-level speed    │
╰─────────────────┴────────────┴───────────────┴────────────────────╯

For your current level and project, index arena is the sweet spot.

Sweet spot means best balance.

Part 33: a realistic module layout

As your interpreter grows, organize like this:

src/
├─ main.rs
├─ arena.rs          // Id<T>, Arena<T>
├─ interner.rs       // Symbol, Interner
├─ span.rs           // Span, source locations
├─ lexer.rs          // tokens
├─ parser.rs         // Ast, ExprId, StmtId, parser
├─ interpreter.rs    // runtime execution
└─ error.rs          // errors and diagnostics

This separates the concepts.

arena.rs should not know about Expr.

parser.rs should use Arena<ExprNode>.

interpreter.rs should not allocate AST nodes.

This separation makes the project easier to reason about.

Part 34: a more complete arena.rs

Here is a better zero-dependency arena module.

use std::marker::PhantomData;
use std::ops::{Index, IndexMut};

#[derive(Debug, PartialEq, Eq, Hash)]
pub struct Id<T> {
    index: u32,
    _marker: PhantomData<fn() -> T>,
}

impl<T> Copy for Id<T> {}

impl<T> Clone for Id<T> {
    fn clone(&self) -> Self {
        *self
    }
}

impl<T> Id<T> {
    pub fn new(index: usize) -> Self {
        assert!(index <= u32::MAX as usize, "too many arena items");
        Self {
            index: index as u32,
            _marker: PhantomData,
        }
    }

    pub fn index(self) -> usize {
        self.index as usize
    }
}

#[derive(Debug, Default)]
pub struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    pub fn new() -> Self {
        Self { items: Vec::new() }
    }

    pub fn with_capacity(capacity: usize) -> Self {
        Self { items: Vec::with_capacity(capacity) }
    }

    pub fn alloc(&mut self, value: T) -> Id<T> {
        let id = Id::new(self.items.len());
        self.items.push(value);
        id
    }

    pub fn get(&self, id: Id<T>) -> &T {
        &self.items[id.index()]
    }

    pub fn get_mut(&mut self, id: Id<T>) -> &mut T {
        &mut self.items[id.index()]
    }

    pub fn iter(&self) -> impl Iterator<Item = (Id<T>, &T)> {
        self.items
            .iter()
            .enumerate()
            .map(|(index, value)| (Id::new(index), value))
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    pub fn is_empty(&self) -> bool {
        self.items.is_empty()
    }
}

impl<T> Index<Id<T>> for Arena<T> {
    type Output = T;

    fn index(&self, id: Id<T>) -> &Self::Output {
        self.get(id)
    }
}

impl<T> IndexMut<Id<T>> for Arena<T> {
    fn index_mut(&mut self, id: Id<T>) -> &mut Self::Output {
        self.get_mut(id)
    }
}

Now you can write:

let expr = &ast.exprs[expr_id];

That is more readable.

Why u32 instead of usize? Many compilers use smaller IDs to reduce memory. u32 is enough for about 4 billion nodes. Your interpreter will not hit that.

Part 35: common Rust borrow problems and fixes

With arenas, you may hit borrow checker errors like:

cannot borrow ast.exprs as mutable because it is also borrowed as immutable

This often happens when you do:

let expr = ast.exprs.get(expr_id);
let new_id = ast.exprs.alloc(...); // mutable borrow while expr is alive
println!("{:?}", expr);

Fix it by copying out the IDs or data you need first:

let (left, right) = match ast.exprs.get(expr_id) {
    Expr::BinaryOperation { left, right, .. } => (*left, *right),
    _ => return,
};

let new_id = ast.exprs.alloc(...);

The immutable borrow ended before the mutable borrow started.

This style is common in Rust compiler code.

Part 36: arena allocation and error recovery

Parser error recovery means continuing after a parse error to find more errors.

Recovery can create partial AST nodes.

Partial means incomplete.

In an arena parser, this is okay. You can allocate error nodes.

pub enum ExprKind {
    Error,
    Number(f32),
    Binary { left: ExprId, op: BinaryOperator, right: ExprId },
}

Then parser can return an ExprId even after an error:

fn error_expr(&mut self, span: Span) -> ExprId {
    self.ast.exprs.alloc(ExprNode {
        kind: ExprKind::Error,
        span,
    })
}

This makes later passes simpler because they can keep walking the tree.

Type checker can say:

ExprKind::Error => TypeId::ERROR

This is a real compiler technique.

Part 37: real compiler patterns in simple words

Clang style

Clang has an ASTContext. You can think of it as a large owner for AST-related data for one translation unit.

Simple mental model:

ASTContext
├─ owns declarations
├─ owns statements
├─ owns expressions
├─ owns types
└─ frees them together

This is arena thinking.

LLVM style

LLVM has bump pointer allocators in many places. A bump pointer allocator is a bump arena.

It is used when allocation is frequent and individual free is not needed.

Rust compiler style

The Rust compiler uses arenas, IDs, interners, and context objects. The exact internals are complex, but the mental model is the same:

store compiler data in long-lived contexts
refer to many things by IDs or interned references
avoid allocating and freeing tiny objects one by one
organize data by compiler phase

You do not need to copy rustc design exactly. You should copy the thinking model.

Part 38: benchmarking your arena migration

Do not guess performance. Measure it.

Zero-dependency simple timing:

use std::time::Instant;

let start = Instant::now();
let ast = parser.parse().unwrap();
let parse_time = start.elapsed();

println!("parse: {:?}", parse_time);
println!("expr nodes: {}", ast.exprs.len());

Things to measure:

parse time
number of expression nodes
number of statement nodes
number of declaration nodes
interpreter time
total allocations if you use an allocator profiler later

Run many times. One run can lie because the OS and CPU are noisy.

Noisy means affected by random outside things.

For a small Lox interpreter, the arena version may not look much faster at first. That is normal. The design becomes more valuable as programs and compiler phases grow.

Part 39: exercises to become confident

Do these in order.

Exercise 1

Implement Arena<T> and Id<T>. Write a test:

let mut arena = Arena::new();
let a = arena.alloc(10);
let b = arena.alloc(20);
assert_eq!(*arena.get(a), 10);
assert_eq!(*arena.get(b), 20);

Exercise 2

Convert only Expr to ExprId. Keep Stmt and Declaration unchanged.

Goal: make expression parsing and evaluation work.

Exercise 3

Convert Stmt to StmtId.

Goal: blocks store Vec<DeclId> or Vec<Declaration> temporarily.

Exercise 4

Convert Declaration to DeclId.

Goal: Program stores Vec<DeclId>.

Exercise 5

Add Span to ExprNode.

Goal: runtime and parse errors can point to source locations.

Exercise 6

Add an Interner and change Identifier name from String to Symbol.

Goal: variable lookup uses Symbol.

Exercise 7

Write a pretty-printer that prints any ExprId.

Goal: debug the arena AST.

Exercise 8

Add constant folding as a compiler pass.

Goal: learn mutable arena passes.

Part 40: what expert-level understanding feels like

You are becoming expert when you stop asking only:

How do I allocate this object?

And start asking:

What region owns this data, what phase uses it, and when does the whole region die?

That is the arena mindset.

Compiler engineers think in regions:

source region
token region
AST region
type region
IR region
runtime region

Each region has a purpose and a lifetime.

╭──────────────╮
│ Source text  │ lives for parse/check
╰──────┬───────╯
       ▼
╭──────────────╮
│ AST arena    │ lives for semantic passes
╰──────┬───────╯
       ▼
╭──────────────╮
│ HIR arena    │ lives for type checking
╰──────┬───────╯
       ▼
╭──────────────╮
│ Runtime env  │ lives during execution
╰──────────────╯

When this picture feels natural, you understand arena allocation at a compiler-design level.

Part 41: mastery track

This section turns the guide into a learning path. Reading gives you the map. These exercises make the knowledge real.

The goal is not only to know what an arena is. The goal is this:

I can design the memory layout of a compiler phase, explain why I chose it, implement it in Rust, debug it, and improve it when the compiler grows.

That is the level where you can give a talk and also build with confidence.

How to use this track

Do not do everything in one day. Arena allocation looks simple, but the real skill is design judgement.

Use this order:

implement the tiny examples from memory
rewrite one small part of your Lox interpreter
add IDs and spans
add a symbol interner
add semantic passes
measure and explain the result
prepare a talk from your own code

If you can do those steps without copying the guide, you are no longer only reading. You are building the mental model.

Stage 1: explain the idea without code

Before touching Rust, you should be able to explain arenas in simple words.

Try to answer these out loud:

What problem does an arena solve?
Why does an AST often have one shared lifetime?
Why is freeing every node one by one often wasted work?
Why can bulk free be safe for compiler data?
Why are arenas not always the right choice?
What is the difference between a general heap allocator and an arena?
What does phase-based memory mean?

Good answer shape:

A compiler creates many small objects in phases. Many of those objects die together. An arena stores those objects in one owner and frees them together. That can reduce allocation overhead, make memory more compact, and make ownership easier to reason about.

Bad answer shape:

Arena is faster.

That answer is too small. You must explain why it is faster, when it is faster, and what tradeoff you accepted.

Stage 2: implement a tiny typed arena from memory

Write this without looking at the guide:

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct Id(usize);

struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Self { items: Vec::new() }
    }

    fn alloc(&mut self, value: T) -> Id {
        let id = Id(self.items.len());
        self.items.push(value);
        id
    }

    fn get(&self, id: Id) -> &T {
        &self.items[id.0]
    }

    fn get_mut(&mut self, id: Id) -> &mut T {
        &mut self.items[id.0]
    }
}

Then explain the hidden design choices:

Id is Copy because it is just a small handle.
The arena owns the actual values.
The caller never owns the node directly.
The ID is stable as long as we only push and never remove.
Dropping the arena drops all values.

Then answer this:

Why does Vec moving its internal buffer not break Id?

The answer: Id is an index, not a pointer. If Vec moves its buffer, index 3 is still index 3.

Stage 3: make IDs type-safe

The first Id type is too weak. It lets you accidentally use an expression ID inside a statement arena.

A stronger design uses separate ID types:

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct ExprId(usize);

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct StmtId(usize);

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
struct DeclId(usize);

Your exercise:

create ExprArena around Vec<ExprNode>
create StmtArena around Vec<StmtNode>
create DeclArena around Vec<DeclNode>
make each arena return only its own ID type

This feels repetitive. That is okay. In compiler code, type clarity often matters more than removing five lines of repetition.

Talk explanation:

I use separate ID types because an ID is not just a number. It means a specific kind of compiler object. The Rust type system can stop me from mixing them.

Stage 4: convert your Lox Expr first

Do not convert the whole interpreter at once. Convert Expr first.

Current style:

pub enum Expr {
    Grouping(Box<Expr>),
    BinaryOperation {
        left: Box<Expr>,
        op: BinaryOperator,
        right: Box<Expr>,
    },
}

Arena index style:

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct ExprId(usize);

pub enum ExprNode {
    Grouping(ExprId),
    BinaryOperation {
        left: ExprId,
        op: BinaryOperator,
        right: ExprId,
    },
}

The important mental shift:

ExprId is the handle.
ExprNode is the stored data.
Ast or Parser owns the arena.
Parser functions return ExprId instead of Expr.

So this:

fn expr(&mut self) -> Result<Expr, ParseError>

becomes this:

fn expr(&mut self) -> Result<ExprId, ParseError>

Your first migration milestone:

primary returns ExprId
unary returns ExprId
factor returns ExprId
term returns ExprId
comparison returns ExprId
equality returns ExprId
assignment returns ExprId

This is the exact place where arenas become real in your own code.

Stage 5: add a real Ast owner

A production-style interpreter should not make the parser own random global state. Make one owner for the parsed program data.

Example shape:

pub struct Ast {
    pub exprs: ExprArena,
    pub stmts: StmtArena,
    pub decls: DeclArena,
}

impl Ast {
    pub fn new() -> Self {
        Self {
            exprs: ExprArena::new(),
            stmts: StmtArena::new(),
            decls: DeclArena::new(),
        }
    }
}

Then your Program becomes:

pub struct Program {
    pub declarations: Vec<DeclId>,
}

And parse returns both:

pub struct ParsedProgram {
    pub ast: Ast,
    pub program: Program,
}

Now the ownership story is clear:

╭──────────────╮
│ ParsedProgram│
╰──────┬───────╯
       │ owns
       ▼
╭──────────────╮      ╭──────────────╮
│ Ast arenas   │◀─────│ Program IDs  │
╰──────┬───────╯      ╰──────────────╯
       │
       ▼
╭──────────────╮
│ Expr/Stmt/...│
╰──────────────╯

This is a talk-worthy design because the diagram tells the ownership story.

Stage 6: make the interpreter read through IDs

Your evaluator changes from owning cloned nodes to borrowing nodes from Ast.

Old shape:

fn eval_expr(&mut self, expr: Expr) -> Result<RuntimeValue, RuntimeError>

Better arena shape:

fn eval_expr(&mut self, ast: &Ast, expr: ExprId) -> Result<RuntimeValue, RuntimeError> {
    match ast.exprs.get(expr) {
        ExprNode::Number(value) => Ok(RuntimeValue::Number(*value)),
        ExprNode::Bool(value) => Ok(RuntimeValue::Bool(*value)),
        ExprNode::BinaryOperation { left, op, right } => {
            let left = self.eval_expr(ast, *left)?;
            let right = self.eval_expr(ast, *right)?;
            self.eval_binary(left, op.clone(), right)
        }
        _ => todo!(),
    }
}

The important result:

no AST clone for evaluation
recursive structure is represented by IDs
the arena owns all nodes
interpreter only reads the AST

This is closer to real compiler engineering.

Stage 7: add Span everywhere

A compiler without source locations is hard to debug.

Add this:

#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct Span {
    pub start: usize,
    pub end: usize,
    pub line: u32,
}

Then wrap nodes:

pub struct ExprData {
    pub kind: ExprNode,
    pub span: Span,
}

Your arena stores ExprData, not only ExprNode.

Why this matters:

parse errors point to source
runtime errors point to source
semantic errors point to source
later compiler passes can keep useful diagnostics

Talk sentence:

Arena IDs are not only for memory. They also become stable keys that let later compiler passes attach extra information to syntax nodes.

Stage 8: add side tables

Modern compilers often avoid putting every piece of data directly inside the AST node.

Instead, they use side tables.

Side table means: a separate storage indexed by the same ID.

Example:

struct TypeTable {
    expr_types: Vec<Option<TypeId>>,
}

If ExprId(10) exists, expr_types[10] stores the type for that expression.

Diagram:

ExprId(10)
   │
   ├── ast.exprs[10]        -> BinaryOperation
   ├── type_table[10]       -> Number
   ├── const_table[10]      -> maybe constant value
   ╰── diagnostics[10]      -> maybe warning

This is one of the most important production compiler ideas.

Why it is useful:

AST remains simple
each pass owns its own computed facts
you can throw away pass data without destroying AST
you can recompute one table later
IDs connect everything safely

Your exercise:

create Type enum with Number, String, Bool, Nil, Unknown
create Vec<Option<Type>> same length as expr arena
write a type_check_expr function
store the result in the side table
make binary + check number plus number or string plus string

Stage 9: add a symbol interner

Right now your lexer creates many String values.

For identifiers, compilers usually intern strings.

Interning means:

Store one copy of a string and use a small ID to refer to it.

Example:

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct Symbol(usize);

pub struct Interner {
    strings: Vec<String>,
}

Simple version:

impl Interner {
    pub fn intern(&mut self, name: &str) -> Symbol {
        for (index, existing) in self.strings.iter().enumerate() {
            if existing == name {
                return Symbol(index);
            }
        }

        let symbol = Symbol(self.strings.len());
        self.strings.push(name.to_string());
        symbol
    }

    pub fn get(&self, symbol: Symbol) -> &str {
        &self.strings[symbol.0]
    }
}

This simple version is not the fastest because it scans linearly. Later you can add HashMap<String, Symbol>. But first learn the model.

Your migration:

Identifier name changes from String to Symbol
TokenType::Identifier can store Symbol if lexing owns an interner
variable environments can use Symbol as key

Production idea:

Compiler data often becomes IDs pointing into arenas and interners. IDs are the compiler's internal language.

Stage 10: build a resolver pass

A resolver connects variable uses to variable declarations.

Your current interpreter looks up variables by name at runtime. That is okay for a small interpreter. But compilers often resolve names before execution.

Add this idea:

#[derive(Copy, Clone, Debug, PartialEq, Eq, Hash)]
pub struct BindingId(usize);

pub struct ResolverTables {
    pub expr_binding: Vec<Option<BindingId>>,
}

For each identifier expression:

find the declaration in scope
create or reuse a BindingId
store binding in expr_binding[expr_id]

Diagram:

source: print name;

ExprId(22) Identifier(name)
     │
     ▼
ResolverTables.expr_binding[22]
     │
     ▼
BindingId(3) -> var name declared at line 1

Why this matters for arena mastery:

the AST is stable
the resolver produces side data
the interpreter can use resolved bindings
you start thinking like a compiler engineer

Stage 11: implement compiler passes as separate structs

Do not put every pass inside Parser or Interpreter.

Use shapes like this:

pub struct Resolver<'a> {
    ast: &'a Ast,
    scopes: Vec<Scope>,
    tables: ResolverTables,
}

pub struct TypeChecker<'a> {
    ast: &'a Ast,
    resolver: &'a ResolverTables,
    types: TypeTable,
}

This makes phase ownership clear:

Parser ───────▶ Ast
                 │
                 ▼
Resolver ─────▶ ResolverTables
                 │
                 ▼
TypeChecker ───▶ TypeTable

Each pass reads earlier data and writes its own data.

That is a production compiler habit.

Stage 12: add tests that prove the design

Arena bugs are often design bugs, not only syntax bugs.

Write tests for behavior like:

allocating two expressions gives different IDs
retrieving an ID returns the right node
BinaryOperation stores child IDs correctly
parser returns a Program with declaration IDs
pretty-printer can walk the AST from root IDs
resolver maps identifier expressions to bindings
type checker writes type data for every expression

Example test:

#[test]
fn binary_expression_points_to_children() {
    let mut ast = Ast::new();

    let left = ast.exprs.alloc(ExprNode::Number(1.0));
    let right = ast.exprs.alloc(ExprNode::Number(2.0));
    let root = ast.exprs.alloc(ExprNode::BinaryOperation {
        left,
        op: BinaryOperator::Add,
        right,
    });

    match ast.exprs.get(root) {
        ExprNode::BinaryOperation { left: l, right: r, .. } => {
            assert_eq!(*l, left);
            assert_eq!(*r, right);
        }
        _ => panic!("expected binary expression"),
    }
}

If your tests can walk IDs through many passes, your design is becoming solid.

Stage 13: measure the result

Do not say arena is faster without measuring.

For your Lox interpreter, create a benchmark-like command manually with std::time::Instant.

You can measure:

parse time
number of Expr nodes
number of Stmt nodes
number of Decl nodes
max scope depth
total source size
time to interpret

Simple measurement:

let start = std::time::Instant::now();
let parsed = parser.parse().unwrap();
let parse_time = start.elapsed();

println!("parse time: {:?}", parse_time);
println!("expr nodes: {}", parsed.ast.exprs.len());

Add len to each arena:

fn len(&self) -> usize {
    self.items.len()
}

Talk rule:

I do not claim speed only because I used an arena. I claim a better allocation model, then I measure whether it improves this program.

That sentence sounds like a serious engineer.

Stage 14: prepare a stage talk outline

If you want to give a talk, use this structure.

Title:

Arena allocation in Rust compilers: from Box trees to ID-based ASTs

Talk flow:

show the beginner Box AST
explain the problem with many small allocations
explain compiler phase lifetimes
introduce arena as bulk-owned memory
show typed arena with ExprId
show parser returning ExprId
show interpreter reading through IDs
show side tables for types and resolver data
explain tradeoffs and failure modes
show the final Lox architecture

Simple stage diagram:

Before:

Expr owns Box<Expr>
Stmt owns Expr
Interpreter clones/walks owned nodes

After:

Ast owns arenas
Program stores root IDs
Passes read IDs and write side tables

The best talk is not only theory. Use your own Lox interpreter as the running example. That makes it real.

Stage 15: final project

Final project name:

Lox with arena-backed AST and compiler-style side tables

Build these features:

ExprId arena
StmtId arena
DeclId arena
Ast owner
Program root IDs
Span on every AST node
Symbol interner for identifiers
resolver pass
type or value classification table
pretty printer using IDs
interpreter that reads AST by IDs
error messages that use Span
simple measurements for parse time and node counts

When you finish this, you can honestly say:

I implemented a compiler-style arena architecture in Rust. I understand typed IDs, AST ownership, side tables, and phase-based memory.

That is a strong claim. It is not fake confidence. It is based on working code.

Production-readiness checklist

Before calling your design production-ready, ask these questions:

Can every AST node be reached from a Program root or pass root?
Are IDs type-safe, or can ExprId be mixed with StmtId?
Do error messages still have source spans?
Can later passes attach data without mutating every AST node?
Is there a clear owner for AST, symbols, scopes, types, and IR?
Can you drop temporary pass data when no longer needed?
Are there tests for parser, resolver, type data, and interpreter walking IDs?
Can you print/debug the AST by ID?
Can you measure node counts and parse time?
Do you know which arenas live for the whole program and which are scratch arenas?
Can you explain why an index arena was chosen instead of references?
Can you explain why an unsafe bump arena was not needed yet?

If you cannot answer one of these, that is not failure. It is the next thing to learn.

Self-interview questions

Use these to test whether you really understand the topic.

Beginner questions:

What is an arena allocator?
Why does an AST fit arena allocation well?
What does bulk free mean?
Why can many Box allocations be slower than arena allocation?

Intermediate questions:

Why are IDs often easier than references in Rust compiler data structures?
What is the difference between typed arena and index arena?
Why should Parser return ExprId instead of Expr?
Why should Program store DeclId roots?
What is a side table?

Advanced questions:

How would you add spans without making every pass messy?
How would you resolve variables using side tables?
How would you store inferred types for expressions?
When would you choose generational IDs?
When would a scratch arena be useful?
What can go wrong if you delete from an index arena?
Why is unsafe bump allocation harder to expose safely?

Expert questions:

What are the lifetime boundaries of each compiler phase?
Which data should live for the whole compilation?
Which data should die after one pass?
Which IDs are stable across passes?
Which IDs should never escape a pass?
How would this design change for incremental compilation?
How would this design change for multi-file compilation?
How would this design change if you add bytecode or IR?

If you can answer these with examples from your own interpreter, you can explain arena allocation on stage.

Implementation milestones for your current Lox code

Use this exact order. Do not skip ahead.

Milestone 1:

add ExprId
add ExprArena
add Ast with exprs only
parser expression methods return ExprId
interpreter can evaluate expression IDs

Milestone 2:

add StmtId
add StmtArena
statement parser methods return StmtId
interpreter evaluates StmtId

Milestone 3:

add DeclId
add DeclArena
Program stores Vec<DeclId>
interpreter evaluates DeclId

Milestone 4:

add Span
store Span on tokens
store Span on AST nodes
show line information in errors

Milestone 5:

add Symbol
add Interner
convert Identifier from String to Symbol
environment uses Symbol keys

Milestone 6:

add ResolverTables
resolve variables before interpretation
report undefined variable before runtime when possible

Milestone 7:

add TypeTable or ValueKindTable
check binary operations before runtime when possible
keep RuntimeError for true runtime-only errors

Milestone 8:

add pretty-printer
add node count stats
add parse time stats
compare old design and new design on a larger Lox file

After milestone 8, your interpreter will not just use an arena. It will have the shape of a real compiler front end.

What confidence should feel like

Real confidence is not this:

I read one article, so I know production compilers.

Real confidence is this:

I built the core design. I know why each owner exists. I know what IDs point to. I know where side data lives. I know what gets dropped and when. I can debug it. I can explain the tradeoffs.

That is the target.

Final checklist

You understand arena allocation when you can answer these:

What owns the nodes? The arena or Ast.
What points to nodes? IDs or arena references.
When are nodes freed? When the arena drops.
Why is it fast? Fewer allocator calls, compact data, bulk free.
Why is it good for compilers? Compiler data is phase-based.
What is the safest Rust version? Index arena.
What is the lower-level fast version? Bump arena.
What should your Lox interpreter use first? Index arena for AST nodes.

If you implement the index arena version for your Lox AST, you will already understand the most useful arena pattern for real interpreters and compilers.