
C Must Die

The PrimeTime · 5 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

C’s portability depends on staying within defined behavior; undefined behavior lets compilers change semantics across builds and targets.

Briefing

C’s rise is inseparable from Unix’s early need to move across hardware, but its modern “portability” bargain comes with a darker catch: undefined behavior gives compilers permission to break code in ways that vary by optimization level, compiler version, architecture, and even the order of internal passes. That combination—portable syntax paired with non-portable semantics—turns C into a language where “it works on my machine” can become a security risk, not just a debugging headache.

The story begins at Bell Labs, where Ken Thompson and Dennis Ritchie built early Unix tooling for PDP systems in assembly, an approach that was labor-intensive and tied every program to a single machine. C emerged as a replacement that let Unix kernels and utilities be rewritten with low-level control while avoiding per-architecture rewrites. Porting Unix to new hardware then became easier, since C source could be recompiled rather than manually reassembled for each instruction set.

But as Unix spread, performance and correctness problems appeared. C programs often ran slower than expected when the target hardware diverged from the original PDP environment. Compiler developers responded with increasingly aggressive optimizations, which gradually moved C further from “transparent” low-level behavior. The language standardization effort—culminating in the 1989 C standard (often referred to as C89)—tried to formalize portability using an “abstract machine.” That abstract machine made it possible to write programs that behave consistently across platforms, while still letting compilers optimize.

The mechanism that enables both portability and optimization is undefined behavior. When a program uses constructs the standard doesn’t define—such as shifting by too many bits, dereferencing invalid pointers, relying on signed overflow, or violating strict aliasing rules—the standard imposes no requirements. Compilers can treat such code as impossible, ignore it, reorder logic, or generate different results across builds. A concrete example shows a left shift that seems like it should yield a predictable value; instead, the outcome depends on how the compiler reasons about the undefined case. Another example demonstrates null-pointer checks being optimized away because the compiler can assume the “impossible” dereference never occurs, changing control flow.
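
The transcript’s exact numbers aren’t reproduced here; the sketch below is a minimal illustration of the shift case, assuming a target where int is 32 bits wide. Because the shift count equals the type’s width, the operation is undefined, and an unoptimized build, a constant-folding build, and a different architecture can all print different values.

```c
#include <stdio.h>

int main(void) {
    int n = 32;           /* assumes a 32-bit int on this target */
    unsigned int x = 1;

    /* Undefined behavior: the shift count is not less than the width
       of the promoted operand. On x86 the hardware masks the count to
       5 bits, so an unoptimized build often prints 1, while constant
       folding may yield 0; the standard imposes no requirement. */
    unsigned int y = x << n;

    printf("%u\n", y);
    return 0;
}
```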

The transcript then broadens the consequences: undefined behavior can linger silently until a change in compiler, flags, or architecture triggers a failure. It can also undermine security practices. Clearing sensitive data with memset may be removed if the compiler decides the memory is no longer used, leaving passwords or secrets in registers or stack memory. Strict aliasing further complicates low-level programming by allowing compilers to assume differently typed pointers don’t refer to the same memory—an assumption that breaks common “type punning” tricks.
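
The aliasing point is easiest to see in the classic two-pointer example; this is an illustrative sketch, not code from the transcript. Because the two parameters have incompatible types, an optimizing compiler may assume they never refer to the same memory and return a value computed before the second store.

```c
#include <stdio.h>

/* Strict aliasing lets the compiler assume ip and fp never point at
   the same bytes, so it may return the constant 1 without re-reading
   *ip. If the caller passes aliasing pointers, the dereferences are
   undefined behavior. */
int set_then_read(int *ip, float *fp) {
    *ip = 1;
    *fp = 0.0f;    /* if this aliases *ip, behavior is undefined */
    return *ip;
}

int main(void) {
    int x = 0;
    /* Deliberately aliasing call: may print 1 under -O2 and 0 without
       optimization; the standard guarantees neither. */
    printf("%d\n", set_then_read(&x, (float *)&x));
    return 0;
}
```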

A major flashpoint is signed integer overflow. In the C standard, signed overflow is undefined behavior, so compilers may delete overflow checks entirely. The transcript recounts a long-running GCC controversy where a developer’s overflow assertion disappeared under certain conditions, prompting debate over whether compilers should prioritize standards-based optimization or preserve safety checks for existing code. The broader conclusion is blunt: C’s combination of low-level power and undefined behavior makes it unreliable as a general development tool, pushing programmers toward languages like Rust or Zig that aim to reduce or eliminate these “time-bomb” failure modes.
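
The bug-report code itself isn’t quoted in this summary, but the pattern at the heart of the GCC debate looks roughly like the hedged sketch below: a check written in terms of the already-overflowed result. Since signed overflow is undefined, the optimizer may conclude the branch can never be taken and delete it.

```c
#include <limits.h>
#include <stdio.h>

/* Post-hoc overflow check: it inspects a result that could only be
   produced by signed overflow. Because that overflow is undefined
   behavior, the compiler may assume it never happens and remove the
   entire if-branch. */
int add_checked(int a) {
    int b = a + 100;
    if (b < a) {                      /* may be optimized away */
        fprintf(stderr, "overflow detected\n");
        return INT_MAX;
    }
    return b;
}

int main(void) {
    printf("%d\n", add_checked(INT_MAX - 5));   /* a + 100 overflows */
    return 0;
}
```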

Cornell Notes

C’s portability promise is undercut by undefined behavior: when code hits cases the C standard doesn’t define, compilers are free to optimize in ways that can change results across architectures, compiler versions, and optimization flags. Standardization (C89) introduced an abstract machine to define “normal” behavior, but it also created room for compilers to treat undefined constructs as impossible. The transcript highlights how undefined behavior can be exploited accidentally—shifts past bit-width, null-pointer dereferences, dead-code elimination around checks, strict aliasing violations, and signed overflow. The practical impact is security and reliability failures, including optimizations that remove attempts to clear secrets from memory. The takeaway: writing portable, correct C requires avoiding undefined behavior entirely, because the compiler may turn latent bugs into unpredictable outcomes at the worst possible time.

Why did C become central to Unix, and what problem did it solve compared with assembly?

C was developed as a replacement for assembly because writing machine code was too labor-intensive. When Unix needed to run on new hardware, C made porting easier: instead of rewriting the OS and utilities in assembly for each architecture, developers could recompile C source. The transcript notes that if assembly had been used, the operating system would have required rewriting for each computer architecture.

What is the “abstract machine” introduced in the C standard trying to achieve?

The C89 standard aimed to break the tight coupling between C and the original PDP-11 architecture by defining behavior in terms of an imaginary executor (the abstract machine). The goal was twofold: (1) make it possible to write portable C programs, and (2) still allow compilers freedom to optimize, as long as the program stays within the standard’s defined behavior.

How does undefined behavior turn into real-world unpredictability?

Undefined behavior is behavior on which the standard imposes no requirements, so compilers can handle it in any way: ignoring it, producing unpredictable results, or even terminating compilation or execution. The transcript gives examples where a shift by an amount that’s too large becomes undefined, and the observed value differs depending on compilation target. It also describes how compilers can remove null checks when an earlier (undefined) dereference makes the check logically unnecessary under optimization.
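
A hedged reconstruction of the null-check pattern (names and values are invented, not taken from the transcript): the early dereference licenses the compiler to assume the pointer is non-null, so the later guard becomes dead code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Dereferencing p before checking it tells the optimizer that p is
   non-null (a null dereference would be undefined behavior), so the
   subsequent NULL test can be removed as unreachable. */
void process(int *p) {
    int first = *p;            /* undefined if p is NULL */

    if (p == NULL) {           /* may be optimized away entirely */
        fprintf(stderr, "null pointer\n");
        exit(EXIT_FAILURE);
    }
    printf("%d\n", first);
}

int main(void) {
    int value = 42;
    process(&value);
    return 0;
}
```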

Why can memset-based “secret wiping” fail even when the code looks correct?

Because optimizations may treat the clearing call as unnecessary if the cleared memory is never used afterward. The transcript describes a password-handling scenario where a compiler can remove the memset call, and in worse cases keep the password in registers—making it easier for memory disclosure vulnerabilities to steal it. The key point is that the compiler optimizes based on observable program behavior, not on programmer intent to erase secrets.
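
A minimal sketch of that scenario, with invented names (check_password, pw) rather than the transcript’s code: once the buffer is no longer read, the memset is a dead store and may be dropped. Where available, alternatives such as C11’s optional memset_s or glibc/BSD explicit_bzero are designed to resist this elimination.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical authentication helper: after the comparison, pw is
   never read again, so the compiler may treat the memset as a dead
   store and remove it, leaving the secret on the stack or in
   registers. */
int check_password(const char *input) {
    char pw[64];
    strncpy(pw, input, sizeof pw - 1);
    pw[sizeof pw - 1] = '\0';

    int ok = (strcmp(pw, "hunter2") == 0);   /* illustrative secret */

    memset(pw, 0, sizeof pw);                /* may be optimized away */
    return ok;
}

int main(void) {
    printf("%s\n", check_password("hunter2") ? "ok" : "denied");
    return 0;
}
```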

What is strict aliasing, and why does it matter for low-level code?

Strict aliasing restricts how an object’s value may be accessed through different pointer types. The transcript explains that accessing memory through an incompatible type can be undefined behavior, letting compilers assume pointers of different types don’t alias. This breaks common “type punning” patterns (e.g., reading bytes of one type as another). Workarounds mentioned include using character types (which are exempt) or unions, though unions add complexity and can still be tricky.
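
One conforming workaround, sketched under the assumption of a 32-bit IEEE-754 float: copy the bytes with memcpy (or read them through unsigned char), which expresses the same type pun without violating the aliasing rule; compilers typically optimize the small copy away.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Well-defined type punning: memcpy copies the object representation
   instead of reinterpreting it through an incompatible pointer type. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* assumes float is 32 bits wide */
    return bits;
}

int main(void) {
    printf("0x%08" PRIx32 "\n", float_bits(1.0f));   /* 0x3f800000 on IEEE-754 */
    return 0;
}
```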

Why did GCC remove an overflow check, and what does that reveal about signed overflow in C?

The transcript recounts a GCC case where an overflow assertion disappeared under certain conditions, even without special flags. The underlying reason is that signed overflow is undefined behavior in C, so the compiler may assume overflow cannot happen and optimize away checks. The debate centers on whether compilers should prioritize standards-based optimization or preserve safety checks for existing code.
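
For contrast, a check that stays within defined behavior (a generic sketch, not from the transcript): test against INT_MAX and INT_MIN before adding, so no overflow ever occurs and the compiler has nothing to assume away.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Pre-condition check: decide whether a + b would overflow before
   performing the addition. No undefined behavior is executed, so the
   optimizer has no license to delete the test. */
bool add_would_overflow(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return true;   /* would wrap upward   */
    if (b < 0 && a < INT_MIN - b) return true;   /* would wrap downward */
    return false;
}

int main(void) {
    int a = INT_MAX - 5, b = 100;
    if (add_would_overflow(a, b))
        printf("would overflow\n");
    else
        printf("%d\n", a + b);
    return 0;
}
```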

Review Questions

  1. Which categories of operations in C are treated as undefined behavior in the transcript, and how do compilers typically exploit that freedom?
  2. Explain how optimization can remove a null-pointer check in the transcript’s example—what assumption makes the check “dead”?
  3. What security failure can occur when clearing memory with memset, and why might the compiler decide the clearing is unnecessary?

Key Points

  1. C’s portability depends on staying within defined behavior; undefined behavior lets compilers change semantics across builds and targets.

  2. C89’s abstract machine formalized portability, but undefined behavior was intentionally left unspecified to enable optimization.

  3. Shifts that exceed the bit-width, null-pointer dereferences, and other invalid constructs can produce different results depending on compilation target and optimization.

  4. Compilers can reorder or eliminate logic around undefined behavior, such as removing null checks when an earlier dereference makes the check redundant under optimization.

  5. Attempts to wipe secrets with memset can be optimized away if the cleared memory is not observed later, leaving passwords in stack or registers.

  6. Strict aliasing lets compilers assume differently typed pointers don’t refer to the same memory, breaking common low-level type-punning tricks.

  7. Signed integer overflow is undefined behavior in C, enabling compilers to remove overflow checks, driving long-running GCC debates about safety versus optimization.

Highlights

  • Undefined behavior is a “time bomb”: it may sit unnoticed until a compiler update, architecture change, or optimization flag triggers a failure.
  • Null-pointer checks can vanish under optimization because the compiler treats earlier undefined dereferences as impossible.
  • memset can be optimized out, undermining password-clearing efforts and leaving sensitive data in registers or stack.
  • Strict aliasing rules allow aggressive compiler optimizations that break type-punning patterns used in low-level code.
  • Signed overflow being undefined behavior can cause overflow checks to be deleted, even when developers expect them to work.

Topics

  • C Language History
  • Unix Portability
  • Undefined Behavior
  • Compiler Optimizations
  • Strict Aliasing
