A fully featured compiler for C0, a safe subset of C used for teaching at CMU. The compiler targets x86-64 directly and can also emit LLVM IR, enabling compilation to ARM and other architectures. With all optimizations enabled, it achieves an average 1.74x speedup on benchmark test cases.
Pipeline Overview
The compiler supports two backends: a native x86_64 backend and an LLVM IR backend for cross-platform compilation.
```mermaid
flowchart LR
    subgraph FE[Frontend]
        A[C0 Source] --> B[Lexer]
        B --> C[Parser]
        C --> D[Elaborator]
        D --> E[Type Checker]
    end
    subgraph ME[Middle-end]
        F[IR Translation] --> G{Target?}
        G -->|x86-64| H[IR Opts]
        H --> I[Abstract Assembly]
        G -->|LLVM| J[LLVM IR]
    end
    subgraph BE[x86 Backend]
        K[Asm Opts] --> L[Reg Alloc]
        L --> M[x86-64]
    end
    subgraph LLVM[LLVM Backend]
        N[llc] --> O[x86_64 / ARM / etc.]
    end
    FE --> ME
    I --> BE
    J --> LLVM
```
Implementation
Frontend
- Lexer using the classic "lexer hack" to distinguish type identifiers from ordinary identifiers
- Parser built using Menhir
- Elaboration pass for semantic analysis and AST transformation
Middle-end
- IR translation to a lower-level intermediate representation
- SSA construction for enabling dataflow optimizations
Optimizations
The compiler implements optimizations at two layers: the intermediate representation (IR) and abstract assembly.
```mermaid
flowchart LR
    subgraph IR[IR Optimizations]
        A[Function Inlining] --> B[Tail Call Opt]
        B --> C[Reachability]
    end
    subgraph SSA[SSA Optimizations]
        D[CCP] --> E[ADCE]
        E --> F[Copy Prop]
        F -.->|repeat| D
    end
    subgraph Loop[Loop Optimizations]
        G[Loop-nest Tree] --> H[Invariant Hoisting]
        H --> I[Alignment]
    end
    subgraph x86[x86 Peephole]
        J[Redundant Moves]
        K[Unnecessary LEA]
        L[Unnecessary JMP]
    end
    IR --> SSA --> Loop --> x86
```
IR Optimizations
Tail Call Optimization
For functions that end in recursive tail calls (not mutual recursion), we remove the tail call and instead move the arguments into the correct argument temps, then jump to a point at the beginning of the function body. This eliminates function calling stack overhead and makes control-flow analysis easier. This optimization was critical for preventing stack overflows in deeply recursive programs.
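As a toy sketch (not the compiler's IR, and using invented names), the rewrite turns a self tail call into argument moves plus a jump back to the start of the function body, modeled here as a loop:

```python
# Tail-recursive original:
#   def sum_to(n, acc): return acc if n == 0 else sum_to(n - 1, acc + n)
# After the rewrite, the call becomes argument moves and a jump:
def sum_to(n, acc=0):
    while True:                  # label: function entry
        if n == 0:
            return acc
        n, acc = n - 1, acc + n  # move new values into the argument temps
        # (implicit) jump back to the entry label

print(sum_to(100000))            # runs in constant stack space
```

Because the recursion has become iteration, the deep call no longer risks a stack overflow.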
Function Inlining
We perform function inlining for non-recursive calls up to a call depth of 3, only inlining functions with at most 100 lines in the body. For example, if function f calls g, g calls h, and h calls i, we inline g into f, h into g, and i into h, but we don't inline beyond that depth.
Inlining was particularly effective when combined with constant propagation, as it allows constants to propagate across function boundaries. This effectively removed many small arithmetic helper functions entirely.
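The interaction can be illustrated on a toy expression language (not the compiler's actual IR; all names here are invented): once a call with a constant argument is inlined, the entire callee body folds away to a constant.

```python
# Expressions: ('const', n) | ('arg',) | ('add'|'mul', l, r) | ('call', body, arg)

def subst(e, arg):
    """Replace the formal parameter with the actual argument."""
    if e[0] == 'arg':
        return arg
    if e[0] in ('add', 'mul'):
        return (e[0], subst(e[1], arg), subst(e[2], arg))
    return e

def inline(e):
    """Replace each call with the callee body, argument substituted in."""
    if e[0] == 'call':
        return subst(inline(e[1]), inline(e[2]))
    if e[0] in ('add', 'mul'):
        return (e[0], inline(e[1]), inline(e[2]))
    return e

def fold(e):
    """Constant folding, as constant propagation would do it."""
    if e[0] in ('add', 'mul'):
        l, r = fold(e[1]), fold(e[2])
        if l[0] == r[0] == 'const':
            return ('const', l[1] + r[1] if e[0] == 'add' else l[1] * r[1])
        return (e[0], l, r)
    return e

double_plus_one = ('add', ('mul', ('arg',), ('const', 2)), ('const', 1))  # f(x) = 2x + 1
print(fold(inline(('call', double_plus_one, ('const', 3)))))  # ('const', 7)
```

The call `f(3)` disappears entirely, which is exactly how small arithmetic helpers were eliminated.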
Reachability Analysis
We transform the IR into basic blocks and prune any blocks that are unreachable from the function entry point.
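A minimal sketch of the pruning step, assuming a CFG represented as a map from block labels to successor labels (the representation here is illustrative, not the compiler's):

```python
def prune_unreachable(cfg, entry):
    """Keep only blocks reachable from the entry block via DFS."""
    seen, stack = set(), [entry]
    while stack:
        b = stack.pop()
        if b in seen:
            continue
        seen.add(b)
        stack.extend(cfg[b])
    return {b: succs for b, succs in cfg.items() if b in seen}

cfg = {'entry': ['loop'], 'loop': ['loop', 'exit'], 'exit': [], 'dead': ['exit']}
print(prune_unreachable(cfg, 'entry'))  # the 'dead' block is dropped
```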
Abstract Assembly Optimizations
Strength Reductions
Standard strength reduction pass replacing expensive operations (like multiplication by powers of two) with cheaper equivalents (like left shifts).
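The power-of-two case can be sketched as a single rewrite rule (the tuple instruction format is invented for illustration):

```python
def strength_reduce(instr):
    """Rewrite ('mul', dst, src, k) to a left shift when k is a power of two."""
    op, dst, src, k = instr
    if op == 'mul' and k > 0 and k & (k - 1) == 0:   # power-of-two test
        return ('shl', dst, src, k.bit_length() - 1)  # log2(k)
    return instr

print(strength_reduce(('mul', 't0', 't1', 8)))  # ('shl', 't0', 't1', 3)
print(strength_reduce(('mul', 't0', 't1', 6)))  # unchanged
```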
Conditional Constant Propagation (CCP)
Adapted from the Appel textbook, this algorithm performs constant propagation while simultaneously pruning branches that would not be taken based on constant values. Combined with function inlining, CCP can propagate constants across function boundaries, enabling significant simplifications.
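At the heart of the algorithm is a three-level lattice: "top" (value not yet known), a concrete constant, and "bottom" (known non-constant). The "conditional" part comes from only meeting values along edges marked executable, so branches on known constants never pollute the result. A minimal sketch of the lattice operations (simplified from the full worklist algorithm):

```python
TOP, BOT = 'top', 'bot'   # not-yet-known / known-non-constant

def meet(a, b):
    """Combine facts arriving along two executable edges (e.g. at a phi)."""
    if a == TOP:
        return b
    if b == TOP:
        return a
    return a if a == b else BOT

def eval_add(a, b):
    """Transfer function for an add instruction over the lattice."""
    if BOT in (a, b):
        return BOT
    if TOP in (a, b):
        return TOP
    return a + b

print(meet(3, 3), meet(3, 4), eval_add(2, 3))  # 3 bot 5
```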
Aggressive Dead Code Elimination (ADCE)
Adapted from the Cooper textbook, this removes branches and instructions within the control flow graph that are not necessary for effectful operations such as returns, memory accesses, or function calls.
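The core is a mark-and-sweep over def-use chains: seed the worklist with effectful instructions, mark everything they transitively depend on, and delete the rest. A toy sketch (instruction format invented for illustration):

```python
def adce(instrs):
    """instrs: list of (dst, op, uses). Keep only instructions needed,
    transitively, by effectful ops (return, store, call)."""
    defs = {i[0]: i for i in instrs if i[0] is not None}
    live, work = set(), [i for i in instrs if i[1] in ('ret', 'store', 'call')]
    while work:
        ins = work.pop()
        if id(ins) in live:
            continue
        live.add(id(ins))
        for u in ins[2]:                 # mark everything this uses
            if u in defs:
                work.append(defs[u])
    return [i for i in instrs if id(i) in live]

prog = [('a', 'const', []), ('b', 'const', []), ('c', 'add', ['a', 'b']),
        ('d', 'add', ['a', 'a']),        # dead: no effectful op needs d
        (None, 'ret', ['c'])]
print([i[0] for i in adce(prog)])        # ['a', 'b', 'c', None]
```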
Loop Invariant Hoisting
Using a Loop-nest tree construction, we identify instructions within loops that can be hoisted to loop preheaders. The algorithm works recursively from the deepest nested loops outward, using hoisted invariants from inner loops when computing invariants for parent loops.
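The invariance test itself is a fixed-point computation: an instruction is invariant if every operand is defined outside the loop or by an already-invariant instruction. A simplified sketch for a single loop (the representation is invented; the real pass works over the loop-nest tree):

```python
def loop_invariants(loop_instrs, defined_in_loop):
    """loop_instrs: list of (dst, uses). Returns the set of invariant dsts."""
    invariant, changed = set(), True
    while changed:
        changed = False
        for dst, uses in loop_instrs:
            if dst in invariant:
                continue
            if all(u not in defined_in_loop or u in invariant for u in uses):
                invariant.add(dst)
                changed = True
    return invariant

# x depends only on values from outside the loop, y only on x and outside
# values; z is loop-carried (uses its own previous value), so it stays put.
instrs = [('x', ['n']), ('y', ['x', 'm']), ('z', ['z', 'y'])]
print(loop_invariants(instrs, {'x', 'y', 'z'}))  # {'x', 'y'}
```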
Register Coalescing
Using a union-find data structure, we coalesce moves between temps that are not live at the same time. This is a fast single-pass optimization that runs at all optimization levels.
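A sketch of the pass under simplified assumptions (interference is given as an adjacency map; the real compiler derives it from liveness):

```python
class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def coalesce(moves, interference):
    """moves: (src, dst) pairs; interference: temp -> temps live with it.
    Merge move-related classes only when they do not interfere."""
    uf = UnionFind()
    adj = {t: set(s) for t, s in interference.items()}
    for a, b in moves:
        ra, rb = uf.find(a), uf.find(b)
        if ra == rb or rb in adj.get(ra, set()):
            continue                       # already merged, or live together
        uf.union(ra, rb)                   # rb becomes the representative
        adj[rb] = adj.get(ra, set()) | adj.get(rb, set())
        for n in adj[rb]:                  # repoint neighbors at the new rep
            adj.setdefault(n, set()).difference_update({ra})
            adj[n].add(rb)

    return uf

# t1 and t2 can share a register; t3 interferes with t1's class, so it can't.
uf = coalesce([('t1', 't2'), ('t2', 't3')], {'t1': {'t3'}, 't3': {'t1'}})
print(uf.find('t1') == uf.find('t2'), uf.find('t1') == uf.find('t3'))
```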
x86 Optimizations
We eliminate self-move operations (moves of the form `%rax <- %rax`). This optimization became especially valuable after register coalescing, since coalesced temps are assigned the same register, making their original move instructions redundant.
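The peephole itself is a one-line filter, sketched here over an invented tuple form of abstract assembly:

```python
def drop_self_moves(asm):
    """Remove mov r, r instructions left behind once coalesced temps
    were assigned the same physical register."""
    return [(op, dst, src) for (op, dst, src) in asm
            if not (op == 'mov' and dst == src)]

asm = [('mov', '%rax', '%rax'),   # redundant after coalescing
       ('add', '%rax', '%rbx'),
       ('mov', '%rcx', '%rax')]
print(drop_self_moves(asm))       # only the two real instructions remain
```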
Backend
- Instruction selection for x86_64
- Register allocation via chordal graph coloring
- Peephole optimizations (redundant moves, unnecessary `lea` and `jmp` removal)
LLVM Backend
The compiler also supports emitting LLVM IR, which can then be compiled via llc to target multiple architectures including ARM.
Safety Checks
To match C0's dynamic semantics in safe mode, we inject control flow checks for:
- Division/modulus by zero
- Division of `INT_MIN` by `-1`
- Shift amounts outside the range `[0, 32)`
These checks raise SIGFPE when an illegal operation is attempted.
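The semantics of the injected checks can be modeled in a few lines (here a Python exception stands in for the SIGFPE the compiled code raises; the 32-bit wrapping helper is illustrative):

```python
INT_MIN = -2**31

def wrap32(x):
    """Wrap to a signed 32-bit integer, as the hardware register would."""
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

def checked_div(a, b):
    if b == 0 or (a == INT_MIN and b == -1):
        raise ZeroDivisionError("SIGFPE")
    return int(a / b)                    # C0 division truncates toward zero

def checked_shl(a, k):
    if not 0 <= k < 32:
        raise ValueError("SIGFPE")
    return wrap32(a << k)

print(checked_div(-7, 2))                # -3 (truncation, not floor)
print(checked_shl(1, 31) == INT_MIN)     # True
```

Note the `INT_MIN / -1` case: the mathematical result `2^31` is unrepresentable in 32 bits, and the x86 `idiv` instruction faults on it, so it is checked alongside division by zero.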
SSA Adaptations
LLVM IR requires phi nodes to contain values for every incoming edge to a basic block. We adapted our SSA representation by adding dummy values (0 for integers, null for pointers) on edges where a temp has not yet been defined.
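The completion step can be sketched as filling in missing incoming edges (the dictionary representation of a phi node is invented for illustration):

```python
def complete_phi(phi_args, preds, is_pointer=False):
    """phi_args: predecessor label -> incoming value, for edges where the
    temp is defined. LLVM requires an entry for every predecessor, so
    undefined edges get a dummy: 0 for integers, 'null' for pointers."""
    dummy = 'null' if is_pointer else 0
    return {p: phi_args.get(p, dummy) for p in preds}

# %t1 is only defined along the 'then' edge; the 'else' edge gets a dummy.
print(complete_phi({'then': '%t1'}, ['then', 'else']))
```

The dummy values are safe because a temp is never read along an edge where it was undefined in the source program.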
Performance
When targeting ARM via LLVM on Apple Silicon, native ARM code showed significant speedups over x86-64 code running through Rosetta translation.