Why is the Rust Compiler So SLOW?
Based on ThePrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Naïve Docker builds for Rust often rebuild the entire dependency graph on every change; separating dependency compilation from application compilation is essential for fast iteration.
Briefing
Rust compiler slowness in this case traces less to “lifetimes” and more to release-time optimization work—especially LLVM’s LTO and inlining—magnified by a dependency-heavy, async-heavy codebase. The practical pain shows up when containerizing a Rust web server: a naïve Dockerfile rebuilds the entire dependency graph every change, turning iteration into a multi-minute cycle.
The workflow starts with a common deployment pattern: compile a single static Rust binary, copy it into a Linux container, and restart. That approach works, but it's slow under Docker because any code change invalidates the layer cache at the COPY step, forcing a full recompile of dependencies and application alike. In the author's measurements, a clean release build took about four minutes, with roughly ten seconds of network time and the rest spent compiling and optimizing.
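A minimal sketch of that naïve setup, assuming a binary named app (the names and base image are illustrative, not the author's actual Dockerfile):

```dockerfile
# Single-stage build: COPY . . sits directly before the compile step,
# so any source edit invalidates the Docker cache here and forces
# cargo to recompile the entire dependency graph from scratch.
FROM rust:1
WORKDIR /app
COPY . .
RUN cargo build --release
CMD ["./target/release/app"]
```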
To fix the iteration loop, the discussion pivots to Docker build caching for Rust: using cargo chef to pre-build dependencies as a separate Docker layer. cargo chef generates a “recipe” for the workspace, caches dependency compilation, and ensures code edits only trigger recompilation of the application crate rather than hundreds of dependencies.
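The pattern, roughly as documented in cargo chef's README (the binary name app and the Debian runtime image are assumptions for this sketch):

```dockerfile
# Base stage with cargo-chef installed
FROM rust:1 AS chef
RUN cargo install cargo-chef
WORKDIR /app

# Planner: compute the dependency "recipe" from the manifests
FROM chef AS planner
COPY . .
RUN cargo chef prepare --recipe-path recipe.json

# Builder: compile dependencies as their own cached layer, then copy
# the source so edits only rebuild the application crate below
FROM chef AS builder
COPY --from=planner /app/recipe.json recipe.json
RUN cargo chef cook --release --recipe-path recipe.json
COPY . .
RUN cargo build --release --bin app

# Runtime: slim image containing just the compiled binary
FROM debian:bookworm-slim AS runtime
COPY --from=builder /app/target/release/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Because the recipe changes only when the manifests change, the cook layer stays cached across ordinary source edits.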
Once caching makes rebuilds tolerable, the next question becomes: why does the final release binary still take so long? Profiling with cargo's --timings report and rustc's self-profiler (the -Z self-profile flag) points to the final crate's optimization pipeline. The biggest chunk is LLVM codegen and, within that, LTO-related optimization. Flame graphs and icicle/stalactite-style views don't reveal much visually, but LTO and LLVM module optimization dominate the wall time.
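A sketch of those profiling steps: the --timings report is stable cargo, while self-profiling requires a nightly toolchain plus the measureme tools (summarize, crox) installed separately.

```sh
# Per-crate compile timeline; writes an HTML report to target/cargo-timings/
cargo build --release --timings

# rustc self-profiling (nightly): emits a {crate}-{pid}.mm_profdata trace per crate
RUSTFLAGS="-Z self-profile" cargo +nightly build --release

# Summarize where rustc spent its time (app-<pid> is a placeholder file name)
summarize summarize app-<pid>.mm_profdata

# Or convert the trace to Chrome-tracing JSON for a flame/icicle view
crox app-<pid>.mm_profdata
```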
A deeper dive into LLVM pass timing shows that optimization passes like inlining and per-function optimization are the heavy hitters. The analysis then turns up a surprising culprit: expensive compilation is tied to closures generated by async functions. Rust's async lowering produces state machines that are represented internally with nested closures; when the profiler symbols are improved (switching symbol mangling to v0), the trace becomes readable enough to identify which async-related closures and generic instantiations consume the most optimization time.
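The mangling switch itself is a single flag; a sketch combining it with self-profiling (nightly toolchain assumed):

```sh
# v0 mangling encodes generic parameters and closure context into symbol
# names, so async state-machine closures show up in traces as readable
# paths rather than opaque {closure} entries
RUSTFLAGS="-C symbol-mangling-version=v0 -Z self-profile" cargo +nightly build --release
```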
Armed with that, the author experiments with restructuring async code: splitting or refactoring large async blocks, changing how futures are pinned/boxed, and adjusting inlining behavior. One targeted change to a photo-processing async path drops an individual function's optimization time dramatically, from multiple seconds down to about two seconds, though rerunning the full build shows only modest overall gains at first.
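A sketch of that kind of restructuring, with hypothetical names (load_photo, process_photo, handle_request are illustrations, not the author's code). Boxing an inner future erases its concrete type, so the caller's generated state machine stores a pointer instead of absorbing the whole nested body:

```rust
use std::future::Future;
use std::pin::Pin;

// Hypothetical I/O step; body elided.
async fn load_photo(id: u64) -> Vec<u8> {
    let _ = id;
    vec![0u8; 1024]
}

// Returning a boxed future caps the size and complexity of the caller's
// async state machine: the heavy body becomes its own optimization unit
// instead of being inlined into every awaiting caller.
fn process_photo(bytes: Vec<u8>) -> Pin<Box<dyn Future<Output = Vec<u8>> + Send>> {
    Box::pin(async move {
        // CPU-heavy transform elided.
        bytes
    })
}

async fn handle_request(id: u64) -> Vec<u8> {
    let bytes = load_photo(id).await;
    process_photo(bytes).await
}
```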
The broader optimization strategy becomes a mix of compiler knobs and build architecture: reducing inlining thresholds, lowering the optimization level for the final binary, and, most dramatically, changing dependency and build environment choices. Community follow-ups add further levers: enabling shared generics (with trade-offs), switching Docker base images away from Alpine (allocator and toolchain differences can swing compile times), and tuning LTO/debug settings.
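A sketch of the profile-level knobs mentioned; the values are illustrative starting points rather than the author's final settings, and shared generics is a nightly rustc flag (-Z share-generics=y) passed via RUSTFLAGS rather than Cargo.toml:

```toml
# Cargo.toml
[profile.release]
lto = "thin"       # ThinLTO: much cheaper to run than full ("fat") LTO
opt-level = 2      # backing off from 3 can cut LLVM time when codegen dominates
debug = false      # debug info inflates LLVM's input and the final artifacts
codegen-units = 16 # the release default; raising it trades runtime speed for build speed
```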
The takeaway is not a single magic flag. It’s a chain: container rebuild strategy, then LLVM’s LTO/inlining costs, then Rust’s async-generated closure complexity and generic instantiation. The result is a roadmap for making Rust builds faster without abandoning Rust’s safety model—while also highlighting that the “cost” of deep optimization can dwarf dependency compilation in real projects.
Cornell Notes
Rust build slowness here is pinned on release-time optimization work, especially LLVM LTO and inlining, rather than on lifetimes. Naïve Docker builds rebuild everything on each change, but cargo chef can cache dependency compilation so edits only rebuild the application crate. Profiling with cargo's --timings report and rustc self-profiling shows LLVM module optimization and LTO passes dominate the final binary's compile time. After switching symbol mangling to v0, the profiler reveals that async functions generate nested closures and state-machine machinery that become expensive to optimize. Refactoring async-heavy code and tuning inlining/LTO settings can reduce optimization time, and environment choices (e.g., moving off Alpine) can further cut build times.
Why does containerizing a Rust app often make builds feel dramatically slower?
What profiling signals point to LLVM/LTO/inlining as the main bottleneck?
What unexpected code pattern shows up as a major compilation cost?
How does improving symbol mangling change what the profiler can tell you?
What kinds of code changes are tried once async closures are identified?
What build-environment and dependency-level changes can also swing compile times?
Review Questions
- When using Docker for Rust, what caching strategy prevents dependency recompilation on every code change, and why does it matter for iteration speed?
- Which profiling outputs (cargo timings vs self-profiling vs LLVM pass timing) are most useful for identifying whether the bottleneck is dependencies, LTO, inlining, or async-generated closures?
- Why does switching symbol mangling to v0 make it easier to act on profiling data, and what kinds of functions become identifiable once symbols are clearer?
Key Points
1. Naïve Docker builds for Rust often rebuild the entire dependency graph on every change; separating dependency compilation from application compilation is essential for fast iteration.
2. cargo chef is a practical way to cache Rust dependencies as a Docker layer so code edits only trigger recompilation of the workspace crate(s), not hundreds of dependencies.
3. Release build slowness can be dominated by LLVM optimization work, especially LTO and inlining, rather than by dependency compilation alone.
4. Rust async lowering can generate nested closures/state-machine machinery that becomes expensive for LLVM to optimize; profiling with better symbols can reveal this directly.
5. Switching symbol mangling to v0 can turn opaque closure names into actionable, context-rich symbols that map optimization time back to specific async code paths.
6. Tuning compiler knobs (LTO/debug settings, inlining thresholds/opt levels) can reduce optimization time, but the biggest gains often come from identifying and refactoring the specific expensive constructs.
7. Build environment choices (e.g., Alpine vs Debian base images and allocator/toolchain differences) can materially change compile times, sometimes more than small code tweaks.