The Magic Of ARM w/ Casey Muratori
Based on ThePrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
ARM and x86/x64 aren’t fundamentally different “instruction sets for power” so much as they differ in how their machine code is encoded and decoded—and that decoding difference can matter for chip design. The most concrete hardware-level contrast comes from instruction encoding: x86/x64 uses variable-length instructions, while ARM (especially ARM64) uses fixed-size instruction encodings (with some ARM variants like Thumb introducing 16-bit forms). Variable-length encoding forces CPUs to do more complex decoding work to figure out where the next instruction begins, which complicates scaling to wider, multi-instruction-per-cycle decoders.
To make the comparison tangible, the discussion uses Compiler Explorer (godbolt.org), a tool that compiles the same C code and shows the resulting assembly and even the exact machine-code bytes. With a trivial function that just returns a value, the assembly looks nearly identical across architectures; the differences mostly come from ABI conventions for where parameters and return values live. For example, under the x86-64 System V ABI (as assumed by the setup), the first integer parameter arrives in EDI while the return value must be in EAX, so returning the first parameter requires a move. On ARM64, the first parameter and the return value share the same register (X0, or W0 for 32-bit values), so the compiler can avoid that move in simple cases; when returning a different parameter, ARM must move the value into the return register too. The takeaway: the "ARM is better for low power" claim can't be justified by instruction semantics alone, because the ISA-level differences barely show up in these tiny examples.
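The trivial functions described above can be reproduced as a minimal pair to paste into godbolt.org. The function names are illustrative; the assembly in the comments is what a typical optimizing build emits under the x86-64 System V and AArch64 ABIs, though exact output depends on compiler and flags:

```c
/* Returning the first parameter: on x86-64 System V the argument
 * arrives in EDI but the result must be in EAX, so the compiler
 * emits a move (typically "mov eax, edi" then "ret").  On ARM64 the
 * first argument and the result both live in W0, so the whole body
 * can be a bare "ret". */
int return_first(int a, int b) {
    (void)b;          /* unused, kept to match the two-argument shape */
    return a;
}

/* Returning the second parameter forces a move on both targets:
 * "mov eax, esi" on x86-64, "mov w0, w1" on ARM64. */
int return_second(int a, int b) {
    (void)a;
    return b;
}
```

Comparing the two functions side by side in Compiler Explorer makes the ABI point visible: the apparent "ARM saves an instruction" advantage in the first case disappears as soon as the value being returned is not already in the return register.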
The hardware-level difference appears when looking at the actual bytes. x86/x64 instructions can be compact (e.g., a one-byte return) or longer (e.g., multi-byte encodings for other operations), so instruction boundaries aren’t known until decoding. ARM’s encodings are more regular: instructions are typically a consistent size, making it easier for the CPU to decode multiple instructions in parallel by fixed offsets. The practical implication is that x86 decoders often rely on “guessing” instruction boundaries and then falling back when guesses fail, while ARM can feed a straightforward fixed-offset decoding pipeline. That extra decoding complexity translates into real engineering cost—more circuitry, more power, and more difficulty sustaining high decode throughput.
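The decoding contrast above can be sketched as arithmetic. This is an illustration, not real encodings: with fixed-size instructions an instruction's byte offset is a pure function of its index, while with variable-length instructions it depends on every earlier instruction's length:

```c
#include <stddef.h>

/* Fixed 4-byte encodings (as on ARM64): instruction i starts at
 * i * 4, so a wide decoder can begin decoding many instructions in
 * parallel at known offsets. */
size_t fixed_offset(size_t i) {
    return i * 4;
}

/* Variable-length encodings (as on x86/x64, where instructions run
 * from 1 to 15 bytes): instruction i's offset depends on the length
 * of every earlier instruction, so boundaries must be found
 * sequentially -- or guessed speculatively, with a recovery path
 * when a guess turns out wrong. */
size_t variable_offset(const unsigned char *lengths, size_t i) {
    size_t off = 0;
    for (size_t k = 0; k < i; k++)
        off += lengths[k];   /* must know every prior length first */
    return off;
}
```

For example, with hypothetical instruction lengths {1, 3, 7, 2} (a one-byte `ret` followed by longer encodings), the fourth instruction's offset is only known after summing the first three lengths, whereas in the fixed-size case it is always index times four, independent of what came before. That independence is what lets a fixed-size decoder scale out cheaply.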
Still, the conversation pushes back on attributing ARM’s historical low-power advantage purely to ISA design. Modern ARM chips (including Snapdragon and Apple M-series) execute complex instruction sets via micro-ops, similar to x86; the idea that ARM is “reduced” in the meaningful sense doesn’t hold up. Instead, the stronger explanation is market and ecosystem inertia: ARM grew up in low-power niches, with many licensees optimizing for battery life and thermal limits, while x86/x64 grew up in higher-power performance markets. Even when x86/x64 later pivoted toward efficiency, hardware design can’t change overnight. The openness of ARM licensing is also framed as a structural advantage—more designers can build ARM-based low-power systems, increasing the volume of innovation in that direction.
In the end, the most defensible “why ARM can be lower power” story is narrower: fixed-length decoding can simplify and reduce the cost of scaling instruction decode, which may help at the margins. But the biggest historical gap is portrayed as business dynamics and design lineage rather than a single magic ISA feature.
Cornell Notes
ARM’s advantage in low power is often overstated as an ISA-level truth, but the discussion identifies a real hardware-relevant difference: ARM’s instruction encodings are more regular (often fixed-size), while x86/x64 uses variable-length instructions. Variable-length encoding forces CPUs to decode more intelligently to find instruction boundaries, which complicates scaling to wide, multi-instruction-per-cycle decoders. ABI conventions explain why simple assembly listings can look similar across ARM and x86/x64 even when the machine-code bytes differ. Historically, ARM’s low-power dominance is attributed less to ISA “magic” and more to market dynamics: ARM’s ecosystem evolved around low-power design, while x86/x64 evolved around performance and only later shifted toward efficiency.
- Why do ARM and x86/x64 assembly listings look almost the same in simple examples, even though the underlying machine code differs a lot?
- What is the biggest hardware-level ISA difference highlighted between x86/x64 and ARM?
- How does variable-length decoding affect CPU performance and power?
- Does the discussion claim ARM is always more power efficient because of the ISA?
- Why do “hundreds of instructions” exist in modern ARM, and does that contradict the idea of ARM being “reduced”?
- What role does Compiler Explorer (godbolt.org) play in the comparison?
Review Questions
- What specific mechanism makes x86/x64 decoding harder than ARM according to the discussion, and how does that relate to multi-instruction-per-cycle performance?
- How do ABI conventions explain differences in whether a compiler needs to move values into the return register on ARM versus x86-64?
- Why does the discussion argue that ARM’s historical low-power advantage can’t be fully explained by ISA differences alone?
Key Points
- 1
Instruction-set semantics alone don’t justify “ARM is intrinsically lower power”; ABI and compiler choices can make assembly look similar across ARM and x86/x64 for simple code.
- 2
The most direct hardware-relevant ISA contrast is instruction encoding regularity: ARM’s fixed-size encodings make instruction boundaries predictable, while x86/x64 variable-length encodings require more complex decoding.
- 3
Variable-length decoding complicates scaling to wide decoders because the CPU must determine where each instruction ends and the next begins, often involving speculative boundary handling.
- 4
ARM’s historical low-power dominance is attributed more to market and ecosystem dynamics—ARM’s licensing and early focus on low-power designs—than to a single ISA feature.
- 5
Modern ARM and x86/x64 both execute complex ISAs via micro-ops, so “reduced instruction set” is not a reliable explanation for power differences today.
- 6
Binary size and instruction-cache behavior can still matter: larger, variable-length encodings can contribute to instruction-cache pressure and more frequent instruction-cache misses.
- 7
Having many instructions is largely a byproduct of performance targets (e.g., SIMD/vector) and system/security requirements, with hardware cost managed through micro-op translation rather than dedicated circuitry per instruction.