Software Is Changing (Again)

TL;DR

Karpathy frames software evolution as three layers: explicit code (Software 1.0), neural-network weights that replace logic (Software 2.0), and prompts that act like programs for LLMs (Software 3.0).

Briefing Cornell Notes

Briefing

Software is changing again—this time less by rewriting programs and more by rewriting what “software” means. Andrej Karpathy frames three eras: Software 1.0 is explicit code that runs on computers; Software 2.0 is neural-network weights that replace large portions of hand-written logic; and Software 3.0 is prompts, where natural-language instructions function like programs that steer large language models. The practical implication is that modern development increasingly looks like tuning and orchestrating models and interfaces, not just authoring deterministic logic.

Karpathy argues that the shift is visible in how codebases evolve. In Tesla’s autopilot work, C++ logic gradually gave way to neural networks that absorbed capabilities—especially “stitching” information across camera views and time—until the neural stack “ate through” the software stack. The same pattern is now repeating in other domains: as models grow more capable, they replace brittle, fixed-function components with learned behavior. He also emphasizes that these paradigms have different strengths and failure modes, so fluency across them matters rather than treating one approach as universally superior.

A central theme is that LLMs behave like a new kind of computing layer—sometimes compared to utilities, fabs, or operating systems—but with a key twist: they’re distributed and accessed via time-sharing. Training requires massive upfront capital (capex), while serving is metered through APIs (opex), creating “utility-like” expectations such as low latency and high uptime. When major models go down, users experience an “intelligence brownout,” a reminder that people increasingly outsource not just facts but reasoning workflows.

Karpathy then zooms in on what makes LLMs distinct as software components. Prompts are written in English, a “programming language” that is flexible but ambiguous compared with traditional languages. He describes LLMs as stochastic simulators—autoregressive transformers trained on internet-scale text—capable of encyclopedic recall yet prone to hallucinations, jagged intelligence, and context-window limits. Unlike humans, they don’t naturally consolidate knowledge over time; working memory is constrained by context length, and long-term learning requires explicit mechanisms.

From there, the talk shifts to how LLMs are already reshaping software engineering through “partial autonomy” apps. Instead of chatting in a generic interface, tools like Cursor and Perplexity package model calls, context management, and verification into workflows with GUIs and an “autonomy slider.” The goal is fast generation plus human supervision: small diffs, rapid review loops, and mechanisms to keep models “on the leash” when prompts are vague or verification fails. Karpathy stresses that security and correctness can’t be reduced to quick code review; subtle system-level risks and cascading failures make automation dangerous without strong guardrails.

Finally, he argues that LLM adoption is also changing the digital information ecosystem. Documentation and interfaces are being adapted for machine consumption—moving toward LLM-readable formats, agent-friendly endpoints, and protocols that reduce friction (e.g., replacing “click here” instructions with actionable commands). The broader forecast is that the industry is entering a 1960s-like phase for a new computing paradigm: early, messy, and still constrained by cost and efficiency, but already powerful enough to trigger a new software stack and a new way of building.

Cornell Notes

Karpathy’s core claim is that “software” is evolving in three steps: explicit code (1.0), neural-network weights that replace hand-written logic (2.0), and prompts that act like programs for large language models (3.0). He connects the theory to real engineering change, describing how neural nets in Tesla’s autopilot absorbed capabilities and reduced C++ logic over time. LLMs are framed as stochastic simulators with strong recall but practical limits: hallucinations, context-window constraints, and security risks like prompt injection. The talk then explains why modern LLM apps focus on partial autonomy—fast generation plus human verification—using GUIs, orchestration, and “autonomy sliders.” This matters because it changes how developers build systems, how documentation must be structured, and how much reasoning people will outsource to models.

What does “Software 3.0” mean in practice, and why does it matter for developers?

Software 3.0 is the idea that prompts function like programs that steer a large language model. Instead of writing deterministic logic, developers write natural-language instructions that compile into model behavior. Karpathy argues this is closer to Software 1.0 than Software 2.0 because prompts translate intent into computation, with the LLM acting like a stochastic compiler. The practical effect is that development shifts toward prompt design, orchestration, and verification loops—especially in apps that wrap model calls with GUIs and context management.

How does Software 2.0 “eat through” Software 1.0, according to the autopilot example?

In Tesla’s autopilot work, Karpathy describes a gradual replacement process: C++ code handled many functions early, then neural networks expanded in capability and size, while C++ logic was deleted or reduced. A key example was learned “stitching” of information across multiple camera views and over time. As the neural stack covered more of the original hand-coded behavior, it absorbed more of the system’s responsibilities, shrinking the explicit code footprint.

Why does Karpathy compare LLMs to utilities or operating systems, and what are the limits of those analogies?

He uses the utility analogy because LLM labs invest heavily to train models (capex) and then serve intelligence over APIs with metered usage (opex), creating expectations like low latency and high uptime. He also compares LLM ecosystems to operating systems because they coordinate memory/compute via context windows and orchestrate problem-solving across tools. But he acknowledges the analogy can mislead: LLMs are software and are malleable, and the “OS” framing may not map cleanly to how traditional operating systems evolved under hardware constraints.

What are the main failure modes Karpathy highlights for LLMs?

He emphasizes hallucinations and “jagged intelligence” (superhuman performance in some domains, surprising mistakes in others). He also notes context-window limits—working memory must be managed explicitly—and “entrograde amnesia,” meaning LLMs don’t naturally consolidate knowledge over time the way humans do. Security concerns are also central: LLMs can be gullible to prompt injection and may leak data depending on how systems are built.

What does “partial autonomy” mean, and how do GUIs and verification fit in?

Partial autonomy apps generate or execute parts of tasks while keeping humans in the loop. Karpathy points to tools like Cursor and Perplexity as examples that orchestrate multiple model calls, manage context, and provide GUIs so users can audit changes. The “autonomy slider” concept captures varying levels of automation—from mostly user-controlled edits to more agentic actions—while verification and small diffs reduce the risk of large, incorrect changes.

How is the ecosystem adapting so LLMs can use information more reliably?

He argues that documentation and interfaces are being reshaped for machine consumption. Instead of relying on human-oriented instructions like “click here,” services are moving toward LLM-readable formats (e.g., markdown) and agent-friendly protocols that translate actions into machine-executable steps. He also mentions the idea of LLM-specific “site maps” (like an lm.txt concept) to guide model access, and tools that convert GitHub-style human interfaces into LLM-ingestible text.

Review Questions

Karpathy’s three software eras (1.0, 2.0, 3.0) are presented as a progression. Which era best describes: (a) hand-written business logic, (b) a fine-tuned model’s weights, and (c) a prompt that triggers tool use?
What practical engineering mechanisms does Karpathy say help keep LLM apps safe and useful—especially when models are fallible?
Why does context-window limitation change how developers design long-running or memory-dependent applications?

Key Points

1
Karpathy frames software evolution as three layers: explicit code (Software 1.0), neural-network weights that replace logic (Software 2.0), and prompts that act like programs for LLMs (Software 3.0).
2
Neural networks can progressively absorb capabilities from hand-written systems, shrinking traditional codebases—illustrated by Tesla autopilot’s shift from C++ toward learned components.
3
LLM services behave “utility-like” in how they’re trained and served, but they also create new operational risks such as “intelligence brownouts” when models go down.
4
LLMs are stochastic simulators: they can recall widely yet hallucinate, show jagged competence, and lack human-like knowledge consolidation over time.
5
Modern LLM apps emphasize partial autonomy: orchestration, GUIs for auditability, and an “autonomy slider” to balance speed with human verification.
6
Security can’t be treated as simple code review; system-level cascades and subtle vulnerabilities make automation risky without strong guardrails.
7
Documentation and digital interfaces are increasingly being reformatted for LLM consumption, replacing human-only instructions with machine-actionable structure.

Highlights

Software 2.0 can “eat through” Software 1.0: in autopilot development, learned models absorbed functions that were previously implemented in C++.

Prompts are treated as programs—Software 3.0—where English instructions steer LLM behavior like a stochastic compiler.

When LLMs fail, users experience an “intelligence brownout,” showing how much reasoning workflows are already outsourced.

Partial autonomy apps (e.g., Cursor-style workflows) combine orchestration with human auditability via GUIs and autonomy controls.

LLM adoption is driving a shift in documentation and interface design toward LLM-readable formats and agent-friendly protocols.

Topics

Software Eras
LLM Ecosystems
Partial Autonomy
Prompt Programming
LLM Security

Mentioned

Tesla
OpenAI
Gemini
Anthropic
Hugging Face
Cursor
Perplexity
Clerk
Vercel
Stripe
Llama
Andrej Karpathy
Ken Thompson
Dustin Hoffman
Anduring
LLM
API
capex
opex
GPU
CPU
MLP
DDoS
JWT
MCP
TPU

Software Is Changing (Again) - Andrej Karpathy