Linus On LLMs For Coding
Based on ThePrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Large language models are likely to become a routine part of coding—first as assistants that help generate or check code, and eventually as tools that contribute to code review and maintenance—but their usefulness depends heavily on human understanding and strong testing, not on “autopilot” trust. The conversation centers on a practical question: will LLM-written code be submitted as a pull request? The answer lands on “yes,” and it is already happening in smaller ways, because automation has steadily moved closer to the developer workflow for decades.
A key tension runs through the discussion: optimism about LLM capability versus skepticism about reliability. One side points to the near-term value of catching “obvious stupid bugs” and flagging patterns that deviate from expected norms—similar to what compilers and linters do, but potentially at a higher level of nuance. The other side highlights a hard limit: LLMs can hallucinate, invent code paths, or produce confident-sounding mistakes. That risk becomes more serious when models are allowed to act without a human catching errors, especially in security-sensitive contexts.
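As a rough illustration of that near-term value, the sketch below shows the kind of “obvious stupid bug” a compiler warning, a linter, or an LLM reviewer might all be expected to flag. The snippet and its off-by-one are hypothetical examples, not code from the video or from any project discussed in it.

```c
#include <stdio.h>

#define NVALS 4

int main(void)
{
    int vals[NVALS] = {1, 2, 3, 4};
    int sum = 0;

    /* BUG: `<=` reads one element past the end of vals; the fix is
     * `i < NVALS`. Compilers and static analyzers already flag this
     * class of mistake; the hope in the discussion is that LLM review
     * catches the same kind of error with more contextual nuance. */
    for (int i = 0; i <= NVALS; i++)
        sum += vals[i];

    printf("sum = %d\n", sum);
    return 0;
}
```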
The most concrete example involves security bug reporting around curl. A reported issue allegedly stems from a buffer-related error path that, in practice, does not match the repository’s actual code. The current maintainer pushes back, arguing the problematic snippet isn’t present, and describes an interaction where the person using the LLM believed the model’s output rather than verifying it against the codebase. The takeaway isn’t that LLMs can’t help; it’s that they can generate plausible but wrong details, and those errors can waste time or even misdirect security work.
Another thread argues that LLMs are best when they support—not replace—developer judgment. Manual translation between languages (cited in the discussion as an approach where a team outperformed “automagic” conversion) is used to illustrate why nuance matters: understanding edge cases and the real problem domain can outperform automation that merely “gets the syntax right.” The same logic extends to testing. The conversation repeatedly returns to the idea that good automated tests are the gatekeeper for safe adoption, yet there’s skepticism that “good enough” coverage exists broadly enough to make fully automated LLM-driven changes dependable.
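To make the “tests as gatekeeper” point concrete, here is a minimal sketch with a hypothetical helper and hypothetical edge cases (none of it comes from the video): before an assistant-generated or machine-translated function is accepted, the edge cases that matter in the real problem domain are pinned down as assertions the change must pass.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Candidate implementation, e.g. produced by an assistant or by an
 * "automagic" translation: add two ints and clamp the result to [lo, hi]. */
static int clamp_add(int a, int b, int lo, int hi)
{
    long long sum = (long long)a + (long long)b;  /* avoid int overflow */
    if (sum < lo)
        return lo;
    if (sum > hi)
        return hi;
    return (int)sum;
}

int main(void)
{
    /* Ordinary case. */
    assert(clamp_add(2, 3, 0, 10) == 5);

    /* Edge cases that "getting the syntax right" can easily miss:
     * overflow near INT_MAX and results below the lower bound. */
    assert(clamp_add(INT_MAX, 1, 0, INT_MAX) == INT_MAX);
    assert(clamp_add(-5, -5, 0, 10) == 0);

    printf("all gatekeeper tests passed\n");
    return 0;
}
```

The point of the sketch is that the assertions encode domain knowledge the generator does not have; where coverage like this is thin, the discussion argues, fully automated LLM-driven changes stop being dependable.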
The discussion also challenges how people talk about LLMs. Claims of being “10x better” after minimal use are treated as misleading, because they may only reflect improvement from a low baseline rather than mastery of what “good” looks like. There’s also a call for longer, deliberate experimentation—trying tools like Claude and other assistants over months rather than minutes.
Overall, the core insight is conditional: LLMs can be powerful accelerators for developers who understand the nuance of the system and can validate outputs with tests and review. Without that grounding, hallucinations and subtle misunderstandings can turn productivity gains into costly bugs—especially when teams scale usage faster than their verification practices.
Cornell Notes
Large language models are expected to become a normal part of coding, moving beyond autocomplete into assistance for writing, reviewing, and maintaining code. The promise is strongest for catching obvious mistakes and flagging suspicious patterns, but reliability is constrained by hallucinations and confident wrong outputs. The curl security example highlights how LLM-generated snippets can be plausible yet not present in the real codebase, wasting time and misdirecting fixes. Adoption is safest when developers retain domain understanding and enforce verification through strong automated tests and human review. Short trials and hype-driven claims are treated as unreliable; meaningful evaluation requires sustained use and careful comparison against real engineering standards.
- Why does the conversation treat LLM coding as “automation,” not a revolutionary leap?
- What’s the strongest argument for LLMs helping with code review and maintenance?
- How does the curl example illustrate the main failure mode?
- Why does “manual translation” beat “automagic translation” in the cited discussion?
- What role do tests play in making LLM-assisted coding safe?
- Why does the discussion criticize quick, shallow evaluations of LLMs?
Review Questions
- What specific verification step does the curl example imply is necessary before acting on LLM-generated code or security claims?
- How does the discussion connect LLM usefulness to developer domain understanding and the quality of automated tests?
- What reasons are given for why short LLM trials can lead to misleading conclusions about productivity gains?
Key Points
1. LLMs are expected to become a standard coding aid, with some workflows already using them to generate code that developers submit as pull requests.
2. The most credible near-term value is catching obvious bugs and flagging suspicious deviations from expected code patterns.
3. Hallucinations remain a central risk: LLM outputs can include plausible but nonexistent code paths, as illustrated by the curl security dispute.
4. Safe use depends on human domain understanding plus verification—especially strong automated tests and careful review.
5. Automation that replaces nuance (e.g., “automagic” translation) can underperform approaches that preserve deep problem understanding.
6. Claims of dramatic productivity gains after minimal LLM use are treated as unreliable without sustained evaluation and baseline comparison.