Cursor, Claude Code and Codex all have a BIG problem
Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
AI coding assistants inherit and amplify underlying engineering quality problems, not just model variability.
Briefing
AI coding assistants like Cursor, Claude Code, and Codex are failing in a way that's easy to miss: they don't just rely on today's models; they inherit the same engineering shortcuts, UI churn, and "ship fast" code quality that those models were trained and built around. The result is a growing mismatch between what developers need (stable UX, predictable behavior, maintainable codebases) and what these tools deliver (non-deterministic actions, inconsistent interfaces, and brittle underlying implementations). The practical takeaway is blunt: betting on AI to code early can accelerate sloppiness, and sloppiness compounds.
A major thread runs through the complaints: these tools feel unreliable not only because model outputs vary, but because the surrounding software ecosystem changes constantly and often poorly. Cursor's interface is used as an example of how product decisions can break workflows, specifically the removal of an "agent/editor toggle" that users relied on to switch modes quickly. After that change, layout and sidebar behavior became inconsistent, hotkeys shifted, and even basic safety concerns surfaced, such as an email "leak" button being reachable in the editor. The frustration isn't just annoyance; it's workflow disruption: opening the app can trigger UI shifts, and small interaction problems can become recurring hazards.
Claude Code is framed as worse because the failures are deeper and more chaotic. Image pasting in a terminal/CLI context can involve local compression and asynchronous upload, but input isn't blocked while attachments process. That leads to messages being submitted without the image, and to later attachments appearing in unexpected ways, including race conditions where a still-processing image silently attaches to a later message. In one described sequence, repeated attachment failures and compression errors eventually hit a context limit and kill the entire thread: work gone with no recovery. The pattern is described as a "slop fest": it doesn't fail the same way twice, which makes debugging and trust nearly impossible.
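The attachment race described above can be sketched as a toy model. This is a hypothetical simulation, not Claude Code's actual implementation: all class and method names here are invented for illustration. The core bug is that `submit()` does not wait for in-flight uploads, so the first message ships without its image and the stale upload attaches to the next message; awaiting pending uploads before sending fixes it.

```python
import asyncio

class ChatClient:
    """Toy model of a non-blocking attachment pipeline (hypothetical)."""

    def __init__(self, block_on_pending: bool):
        self.block_on_pending = block_on_pending
        self.pending = []   # upload tasks still in flight
        self.ready = []     # attachment names that finished uploading
        self.sent = []      # (text, attachments) pairs actually submitted

    async def _upload(self, name: str):
        await asyncio.sleep(0.05)   # simulated upload latency
        self.ready.append(name)

    def paste_image(self, name: str):
        # Upload starts in the background; user input is NOT blocked.
        self.pending.append(asyncio.ensure_future(self._upload(name)))

    async def submit(self, text: str):
        if self.block_on_pending and self.pending:
            # The fix: wait for in-flight uploads before sending.
            await asyncio.gather(*self.pending)
        self.pending = [t for t in self.pending if not t.done()]
        self.sent.append((text, list(self.ready)))
        self.ready = []

async def demo(block: bool):
    c = ChatClient(block_on_pending=block)
    c.paste_image("screenshot.png")
    await c.submit("here is the bug")   # image may still be uploading
    await asyncio.sleep(0.1)            # upload completes afterwards
    await c.submit("any ideas?")        # stale image attaches here
    return c.sent

buggy = asyncio.run(demo(block=False))
fixed = asyncio.run(demo(block=True))
```

In the buggy run, the first message carries no attachment and the screenshot silently rides along with the unrelated second message, which is exactly the mis-association pattern complained about; blocking submission on pending uploads trades a small delay for deterministic behavior.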
Underneath the UX breakdown is a structural claim about codebases over time. Code quality tends to improve early, then hits a plateau around 3–6 months; after that, it’s downhill unless teams actively prevent decay. AI coding tools accelerate both the good and the bad, but the bad expands faster. As codebases grow, the easiest patterns to copy are often the worst ones, and AI systems can replicate those patterns across the repository—especially when they’re encouraged to reference existing code. The speaker argues that no model can outperform the quality of what it starts with, and many of these systems start from “garbage” code written earlier under the same vibe-coding incentives.
The proposed fix is less about better models and more about stricter engineering discipline: optimize for clarity and speed of change so small edits touch few files; tolerate zero bad patterns because bad code multiplies; and treat “smells” as urgent—remove them immediately rather than waiting for deadlines that never arrive. There’s also an operational suggestion: use planning-heavy workflows with agents, read the plan, and upgrade to newer model capabilities. Finally, the most radical idea is to separate prototyping from production—maintain a “slopfest” version for experimentation and a clean, reliable version for shipping, similar to how Vampire Survivors reportedly evolved from browser Phaser.js to a C++ console version while iterating in the easier environment.
In short: these tools aren’t doomed because AI is inherently bad; they’re doomed because AI accelerates the maintenance and quality problems already embedded in the software they’re built on. The path forward is to build codebases that can survive faster iteration—by preventing slop from entering, and by rebuilding when slop has already won.
Cornell Notes
AI coding tools like Cursor, Claude Code, and Codex are portrayed as unreliable because they inherit the same engineering sloppiness and churn as the systems around them, not just because model outputs vary. The core mechanism is "codebase inertia": after a few months, code quality plateaus, and without strict prevention, bad patterns multiply faster than good ones. Non-deterministic UX failures (like asynchronous image attachment race conditions) make this decay feel catastrophic because work can be lost and bugs don't reproduce consistently. The proposed remedy is engineering discipline: make codebases easy to change, tolerate zero bad patterns, kill "smells" immediately, and use planning-heavy agent workflows. In extreme cases, maintain a separate prototyping "slop" branch/version and port only validated parts into a clean production codebase.
- Why does the speaker claim AI coding assistants can make codebases worse even when models improve?
- What concrete UX/behavior failures are used to illustrate non-determinism and brittleness?
- What is "codebase inertia," and how does it connect to the "6-month mark" idea?
- How does the speaker propose preventing bad patterns from multiplying?
- What does "plan mode" and "read the plan" mean in the context of using agents?
- Why does the speaker suggest maintaining two versions of a codebase (slop vs production)?
Review Questions
- What mechanisms make bad code patterns spread faster than good ones when AI agents are involved?
- How do asynchronous UI/CLI behaviors (like non-blocking image uploads) turn non-determinism into lost work?
- Which engineering practices does the speaker recommend to keep a codebase maintainable for both humans and agents?
Key Points
1. AI coding assistants inherit and amplify underlying engineering quality problems, not just model variability.
2. Cursor's workflow breakage is tied to interface changes that remove reliable mode switching and introduce risky UI behavior.
3. Claude Code failures are described as race-condition-driven and non-reproducible, including attachment processing that can silently mis-associate images and kill threads.
4. Codebase quality typically plateaus after roughly 3–6 months; if the codebase is already "vibe-coded slop" at that point, later improvement is unlikely without major intervention.
5. Bad patterns multiply faster than good ones because agents copy convenient existing code, and the easiest patterns to replicate are often the worst.
6. Prevention strategy centers on strictness: optimize for small, localized changes; tolerate zero bad patterns; and remove "smells" immediately rather than deferring fixes.
7. In extreme cases, separate prototyping from production by maintaining a rapid "slop" version for experimentation and porting validated parts into a clean production codebase.