Why is the Rust Compiler So SLOW?
Based on ThePrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Naïve Docker builds for Rust often rebuild the entire dependency graph on every change; separating dependency compilation from application compilation is essential for fast iteration.
Briefing
Rust compiler slowness in this case traces less to “lifetimes” and more to release-time optimization work—especially LLVM’s LTO and inlining—magnified by a dependency-heavy, async-heavy codebase. The practical pain shows up when containerizing a Rust web server: a naïve Dockerfile rebuilds the entire dependency graph every change, turning iteration into a multi-minute cycle.
The workflow starts with a common deployment pattern: compile a single static Rust binary, copy it into a Linux container, and restart. That approach works, but it's slow under Docker because any code change invalidates the layer cache at the COPY step, forcing a full recompile of dependencies and application alike. In the author's measurements, a clean release build took about four minutes, with roughly ten seconds of network time and the rest spent compiling and optimizing.
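A minimal sketch of that naïve setup, assuming a binary named app (the names and base image are illustrative, not the author's actual Dockerfile):

```dockerfile
# Single-stage build: COPY . . sits directly before the compile step,
# so any source edit invalidates the Docker cache here and forces
# cargo to recompile the entire dependency graph from scratch.
FROM rust:1
WORKDIR /app
COPY . .
RUN cargo build --release
CMD ["./target/release/app"]
```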
To fix the iteration loop, the discussion pivots to Docker build caching for Rust: using cargo chef to pre-build dependencies as a separate Docker layer. cargo chef generates a “recipe” for the workspace, caches dependency compilation, and ensures code edits only trigger recompilation of the application crate rather than hundreds of dependencies.
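The pattern, roughly as documented in cargo chef's README (the binary name app and the Debian runtime image are assumptions for this sketch):

```dockerfile
# Base stage with cargo-chef installed
FROM rust:1 AS chef
RUN cargo install cargo-chef
WORKDIR /app

# Planner: compute the dependency "recipe" from the manifests
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Builder: compile dependencies as their own cached layer, then copy
# the source so edits only rebuild the application crate below
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release --bin app

# Runtime: slim image containing just the compiled binary
FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/target/release/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Because the recipe changes only when the manifests change, the cook layer stays cached across ordinary source edits.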
Once caching makes rebuilds tolerable, the next question becomes: why does the final release binary still take so long? Profiling with cargo's --timings report and rustc's self-profiler (the -Z self-profile flag) points to the final crate's optimization pipeline. The biggest chunk is LLVM codegen and, within that, LTO-related optimization. Flame graphs and icicle/stalactite-style views don't reveal much visually, but LTO and LLVM module optimization dominate the wall time.
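A sketch of those profiling steps: the --timings report is stable cargo, while self-profiling requires a nightly toolchain plus the measureme tools (summarize, crox) installed separately.

```sh
# Per-crate compile timeline; writes an HTML report to target/cargo-timings/
cargo build --release --timings

# rustc self-profiling (nightly): emits a {crate}-{pid}.mm_profdata trace per crate
RUSTFLAGS="-Z self-profile" cargo +nightly build --release

# Summarize where rustc spent its time (app-<pid> is a placeholder file name)
summarize summarize app-<pid>.mm_profdata

# Or convert the trace to Chrome-tracing JSON for a flame/icicle view
crox app-<pid>.mm_profdata
```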
A deeper dive into LLVM pass timing shows that optimization passes like inlining and per-function optimization are the heavy hitters. The analysis then turns up a surprising culprit: expensive compilation is tied to closures generated by async functions. Rust's async lowering produces state machines that are represented internally with nested closures; when the profiler symbols are improved (switching symbol mangling to v0), the trace becomes readable enough to identify which async-related closures and generic instantiations consume the most optimization time.
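The mangling switch itself is a single flag; a sketch combining it with self-profiling (nightly toolchain assumed):

```sh
# v0 mangling encodes generic parameters and closure context into symbol
# names, so async state-machine closures show up in traces as readable
# paths rather than opaque {closure} entries
RUSTFLAGS="-C symbol-mangling-version=v0 -Z self-profile" cargo +nightly build --release
```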
Armed with that, the author experiments with restructuring async code: splitting or refactoring large async blocks, changing how futures are pinned/boxed, and adjusting inlining behavior. One targeted change to a photo-processing async path drops an individual function's optimization time dramatically, from multiple seconds down to about two seconds, though rerunning the full build shows only modest overall gains at first.
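A sketch of that kind of restructuring, with hypothetical names (load_photo, process_photo, handle_request are illustrations, not the author's code). Boxing an inner future erases its concrete type, so the caller's generated state machine stores a pointer instead of absorbing the whole nested body:

```rust
use std::future::Future;
use std::pin::Pin;

// Hypothetical I/O step; body elided.
async fn load_photo(id: u64) -> Vec<u8> {
    let _ = id;
    vec![0u8; 1024]
}

// Returning a boxed future caps the size and complexity of the caller's
// async state machine: the heavy body becomes its own optimization unit
// instead of being inlined into every awaiting caller.
fn process_photo(bytes: Vec<u8>) -> Pin<Box<dyn Future<Output = Vec<u8>> + Send>> {
    Box::pin(async move {
        // CPU-heavy transform elided.
        bytes
    })
}

async fn handle_request(id: u64) -> Vec<u8> {
    let bytes = load_photo(id).await;
    process_photo(bytes).await
}
```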
The broader optimization strategy becomes a mix of compiler knobs and build architecture: reducing inlining thresholds, lowering the optimization level for the final binary, and, most dramatically, changing dependency and build environment choices. Community follow-ups add further levers: enabling shared generics (with trade-offs), switching Docker base images away from Alpine (allocator and toolchain differences can swing compile times), and tuning LTO/debug settings.
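A sketch of the profile-level knobs mentioned; the values are illustrative starting points rather than the author's final settings, and shared generics is a nightly rustc flag (-Z share-generics=y) passed via RUSTFLAGS rather than Cargo.toml:

```toml
# Cargo.toml
[profile.release]
lto = "thin"       # ThinLTO: much cheaper to run than full ("fat") LTO
opt-level = 2      # backing off from 3 can cut LLVM time when codegen dominates
debug = false      # debug info inflates LLVM's input and the final artifacts
codegen-units = 16 # the release default; raising it trades runtime speed for build speed
```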
The takeaway is not a single magic flag. It’s a chain: container rebuild strategy, then LLVM’s LTO/inlining costs, then Rust’s async-generated closure complexity and generic instantiation. The result is a roadmap for making Rust builds faster without abandoning Rust’s safety model—while also highlighting that the “cost” of deep optimization can dwarf dependency compilation in real projects.
Cornell Notes
Rust build slowness here is pinned on release-time optimization work, especially LLVM LTO and inlining, rather than on lifetimes. Naïve Docker builds rebuild everything on each change, but cargo chef can cache dependency compilation so edits only rebuild the application crate. Profiling with cargo's --timings report and rustc self-profiling shows LLVM module optimization and LTO passes dominate the final binary's compile time. After switching symbol mangling to v0, the profiler reveals that async functions generate nested closures and state-machine machinery that become expensive to optimize. Refactoring async-heavy code and tuning inlining/LTO settings can reduce optimization time, and environment choices (e.g., moving off Alpine) can further cut build times.
Why does containerizing a Rust app often make builds feel dramatically slower?
What profiling signals point to LLVM/LTO/inlining as the main bottleneck?
What unexpected code pattern shows up as a major compilation cost?
How does improving symbol mangling change what the profiler can tell you?
What kinds of code changes are tried once async closures are identified?
What build-environment and dependency-level changes can also swing compile times?
Review Questions
- When using Docker for Rust, what caching strategy prevents dependency recompilation on every code change, and why does it matter for iteration speed?
- Which profiling outputs (cargo timings vs self-profiling vs LLVM pass timing) are most useful for identifying whether the bottleneck is dependencies, LTO, inlining, or async-generated closures?
- Why does switching symbol mangling to v0 make it easier to act on profiling data, and what kinds of functions become identifiable once symbols are clearer?
Key Points
1. Naïve Docker builds for Rust often rebuild the entire dependency graph on every change; separating dependency compilation from application compilation is essential for fast iteration.
2. cargo chef is a practical way to cache Rust dependencies as a Docker layer so code edits only trigger recompilation of the workspace crate(s), not hundreds of dependencies.
3. Release build slowness can be dominated by LLVM optimization work, especially LTO and inlining, rather than by dependency compilation alone.
4. Rust async lowering can generate nested closures/state-machine machinery that becomes expensive for LLVM to optimize; profiling with better symbols can reveal this directly.
5. Switching symbol mangling to v0 can turn opaque closure names into actionable, context-rich symbols that map optimization time back to specific async code paths.
6. Tuning compiler knobs (LTO/debug settings, inlining thresholds/opt levels) can reduce optimization time, but the biggest gains often come from identifying and refactoring the specific expensive constructs.
7. Build environment choices (e.g., Alpine vs Debian base images and allocator/toolchain differences) can materially change compile times, sometimes more than small code tweaks.