
Cursor, Claude Code and Codex all have a BIG problem

Theo - t3.gg · 6 min read

Based on Theo - t3.gg's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

AI coding assistants inherit and amplify underlying engineering quality problems, not just model variability.

Briefing

AI coding assistants like Cursor, Claude Code, and Codex are failing in a way that's easy to miss: they don't just rely on today's models; they inherit the same engineering shortcuts, UI churn, and "ship fast" code quality that those models were trained and built around. The result is a growing mismatch between what developers need (stable UX, predictable behavior, maintainable codebases) and what these tools deliver (non-deterministic actions, inconsistent interfaces, and brittle underlying implementations). The practical takeaway is blunt: betting on AI to code early can accelerate sloppiness, and sloppiness compounds.

A major thread runs through the complaints: these tools feel unreliable not only because model outputs vary, but because the surrounding software ecosystem changes constantly and often poorly. Cursor's interface is used as an example of how product decisions can break workflows, specifically the removal of an "agent/editor toggle" that users relied on to switch modes quickly. After the change, layout and sidebar behavior became inconsistent, hotkeys shifted, and even basic safety concerns showed up, such as an email "leak" button being reachable in the editor. The frustration isn't just annoyance; it's workflow disruption. Opening the app can trigger UI shifts, and small interaction problems can become recurring hazards.

Claude Code is framed as worse because the failures are deeper and more chaotic. Image pasting in a terminal/CLI context can involve local compression and asynchronous upload, but input isn't blocked while attachments process. That leads to messages being submitted without the image, then attachments later appearing in unexpected ways, including race conditions where a still-processing image silently attaches to a later message. In one described sequence, repeated attachment failures and compression errors eventually hit a context limit and kill the entire thread, losing the work with no recovery. The pattern is described as a "slop fest": it doesn't fail the same way twice, which makes debugging and trust nearly impossible.
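The failure mode described here, submitting a message while an attachment is still uploading, is a classic async race. The sketch below is purely illustrative (a toy chat client, not Claude Code's actual implementation): pasting kicks off a background upload without blocking input, so an immediate send goes out image-less, and the stale attachment later latches onto the next message. Gating the send on in-flight uploads closes the race.

```python
import asyncio

class Chat:
    """Toy chat client illustrating the attachment race (hypothetical, not real code)."""

    def __init__(self):
        self.pending = set()      # uploads still in flight
        self.attachments = []     # uploads that finished
        self.sent = []            # (text, attachments) pairs actually submitted

    def paste_image(self, name, upload_secs):
        # Pasting starts a background compress+upload task; input is NOT blocked.
        task = asyncio.ensure_future(self._upload(name, upload_secs))
        self.pending.add(task)
        return task

    async def _upload(self, name, upload_secs):
        await asyncio.sleep(upload_secs)   # simulate compression + upload latency
        self.attachments.append(name)

    def send_unsafe(self, text):
        # Buggy: submits immediately, grabbing only attachments that happen to be done.
        self.sent.append((text, tuple(self.attachments)))
        self.attachments.clear()

    async def send_safe(self, text):
        # Fix: hold the send until every in-flight upload has settled.
        if self.pending:
            await asyncio.gather(*self.pending)
            self.pending.clear()
        self.sent.append((text, tuple(self.attachments)))
        self.attachments.clear()

async def demo():
    chat = Chat()
    chat.paste_image("screenshot.png", upload_secs=0.05)
    chat.send_unsafe("here is the bug")    # races: image not attached yet
    await asyncio.sleep(0.1)               # upload finishes after the send
    await chat.send_safe("second try")     # the stale image attaches to THIS message
    return chat.sent

print(asyncio.run(demo()))
# → [('here is the bug', ()), ('second try', ('screenshot.png',))]
```

The unsafe send reproduces both symptoms from the talk: the first message goes out without its image, and the orphaned upload silently attaches to a later, unrelated message.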

Underneath the UX breakdown is a structural claim about codebases over time. Code quality tends to improve early, then hits a plateau around 3–6 months; after that, it’s downhill unless teams actively prevent decay. AI coding tools accelerate both the good and the bad, but the bad expands faster. As codebases grow, the easiest patterns to copy are often the worst ones, and AI systems can replicate those patterns across the repository—especially when they’re encouraged to reference existing code. The speaker argues that no model can outperform the quality of what it starts with, and many of these systems start from “garbage” code written earlier under the same vibe-coding incentives.

The proposed fix is less about better models and more about stricter engineering discipline: optimize for clarity and speed of change so small edits touch few files; tolerate zero bad patterns because bad code multiplies; and treat “smells” as urgent—remove them immediately rather than waiting for deadlines that never arrive. There’s also an operational suggestion: use planning-heavy workflows with agents, read the plan, and upgrade to newer model capabilities. Finally, the most radical idea is to separate prototyping from production—maintain a “slopfest” version for experimentation and a clean, reliable version for shipping, similar to how Vampire Survivors reportedly evolved from browser Phaser.js to a C++ console version while iterating in the easier environment.

In short: these tools aren’t doomed because AI is inherently bad; they’re doomed because AI accelerates the maintenance and quality problems already embedded in the software they’re built on. The path forward is to build codebases that can survive faster iteration—by preventing slop from entering, and by rebuilding when slop has already won.

Cornell Notes

AI coding tools like Cursor, Claude Code, and Codex are portrayed as unreliable because they inherit the same engineering sloppiness and churn as the systems around them—not just because model outputs vary. The core mechanism is "codebase inertia": after a few months, code quality plateaus, and without strict prevention, bad patterns multiply faster than good ones. Non-deterministic UX failures (like asynchronous image attachment race conditions) make this decay feel catastrophic because work can be lost and bugs don't reproduce consistently. The proposed remedy is engineering discipline: make codebases easy to change, tolerate zero bad patterns, kill "smells" immediately, and use planning-heavy agent workflows. In extreme cases, maintain a separate prototyping "slop" branch/version and port only validated parts into a clean production codebase.

Why does the speaker claim AI coding assistants can make codebases worse even when models improve?

The argument is that model quality can’t exceed the quality of the starting code. Early AI-assisted development tends to introduce “vibe-coded” patterns—convenient shortcuts that get copied quickly. Over time, codebases reach a plateau (roughly 3–6 months) where quality stops improving; after that, bad patterns expand faster than good ones. Agents and tools accelerate replication: they may copy patterns from existing code because those patterns are present and “seem to work,” even if they’re flawed. The result is a compounding “slopfest,” where the easiest-to-copy patterns are rarely the best ones.

What concrete UX/behavior failures are used to illustrate non-determinism and brittleness?

Cursor is used to show interface instability: removing an agent/editor toggle breaks workflows, changes sidebar placement when switching modes, and even exposes a one-click email leak button in the editor. Claude Code is used to show deeper asynchronous failure modes: image pasting triggers local compression/upload, but input isn't blocked, so messages can be submitted without attachments. Worse, race conditions can silently attach a still-processing image to a later message, and repeated failures can eventually hit a context limit and kill the thread with no recovery.

What is “codebase inertia,” and how does it connect to the “6-month mark” idea?

Codebase inertia is the claim that after an initial period of focused improvement, code quality stops getting better and the code becomes harder to change. The speaker describes a typical lifecycle: early on, the codebase finds a workable structure; then quality plateaus, and after about 3–6 months it is as good as it will ever get. If the codebase at that point is already messy, built from early AI-assisted "slop", then later changes don't reverse the trajectory. Instead, the repository drifts into slower, buggier behavior as bad patterns accumulate.

How does the speaker propose preventing bad patterns from multiplying?

The prescription is strictness plus architecture. First, optimize for ease, clarity, and speed: small changes should touch few files; big changes can touch more. Second, tolerate nothing: if a bad pattern enters, it multiplies, so deadlines aren’t an excuse to defer fixes. If something smells, remove it immediately—“murder it with intensity”—and don’t wait for later. A practical litmus test is asking an agent how a feature works: if it can’t answer quickly without extensive searching, the codebase likely isn’t maintainable for humans or agents.

What does “plan mode” and “read the plan” mean in the context of using agents?

The speaker argues that planning reduces mistakes. Spend more time in plan mode with the agent, ensure the plan sounds correct, and actually read it—especially for larger changes. The workflow is: iterate on the plan through back-and-forth, produce a markdown document describing the approach, decide whether it’s acceptable, then instruct the agent to implement. The claim is that models are better at planning and conversation than at cleaning up existing mess, so the plan becomes the guardrail that prevents slop from being reproduced.
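The plan-first workflow described above amounts to a gate: the agent must produce a plan, the plan must be read and approved, and only then does implementation run. A minimal illustrative sketch of that gate follows; every function name and prompt here is hypothetical, not any tool's actual API.

```python
def make_plan(request, agent):
    """Ask the agent for a markdown plan before any code is written (hypothetical helper)."""
    return agent(f"Write a step-by-step markdown plan for: {request}. Do not write code yet.")

def plan_approved(plan, reviewer):
    """Gate: a human (or a strict automated check) must actually read and accept the plan."""
    return reviewer(plan)

def implement(request, agent, reviewer, max_revisions=3):
    """Iterate on the plan; only hand over to implementation once the plan passes review."""
    for _ in range(max_revisions):
        plan = make_plan(request, agent)
        if plan_approved(plan, reviewer):
            return agent(f"Implement exactly this plan:\n{plan}")
        request = f"{request}\n(Revise: the previous plan was rejected.)"
    raise RuntimeError("No acceptable plan after revisions; do not ship unplanned changes.")
```

The design choice is that rejection loops back to planning rather than to code: the claim in the talk is that models handle planning conversations better than they handle cleaning up an existing mess, so the approved plan becomes the guardrail.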

Why does the speaker suggest maintaining two versions of a codebase (slop vs production)?

The idea is to separate experimentation from reliability. Using Vampire Survivors as an analogy, the game reportedly iterates in a browser Phaser.js version (where the creator can quickly test ideas) and then ports validated behavior into a polished C++ production version for consoles/Steam. Applied to AI-assisted development, the speaker suggests using a “slopfest” environment for rapid prototyping—possibly even with AI—then rebuilding or porting only the parts that prove valuable into a clean, reliable codebase. This preserves iteration speed without letting prototype slop infect production.

Review Questions

  1. What mechanisms make bad code patterns spread faster than good ones when AI agents are involved?
  2. How do asynchronous UI/CLI behaviors (like non-blocking image uploads) turn non-determinism into lost work?
  3. Which engineering practices does the speaker recommend to keep a codebase maintainable for both humans and agents?

Key Points

  1. AI coding assistants inherit and amplify underlying engineering quality problems, not just model variability.

  2. Cursor’s workflow breakage is tied to interface changes that remove reliable mode switching and introduce risky UI behavior.

  3. Claude Code failures are described as race-condition-driven and non-reproducible, including attachment processing that can silently mis-associate images and kill threads.

  4. Codebase quality typically plateaus after roughly 3–6 months; if the codebase is already “vibe-coded slop” at that point, later improvement is unlikely without major intervention.

  5. Bad patterns multiply faster than good ones because agents copy convenient existing code, and the easiest patterns to replicate are often the worst.

  6. Prevention strategy centers on strictness: optimize for small, localized changes; tolerate zero bad patterns; and remove “smells” immediately rather than deferring fixes.

  7. In extreme cases, separate prototyping from production by maintaining a rapid “slop” version for experimentation and porting validated parts into a clean production codebase.

Highlights

Non-determinism isn’t just the model’s output—UI and CLI behavior can be asynchronous in ways that cause silent attachment errors and irreversible thread loss.
Codebase inertia is the core lifecycle problem: after a few months, quality stops improving, so early slop becomes the baseline for years of decay.
Agents can accelerate slop replication because they copy patterns from the codebase that are easy to find, even when those patterns are bad.
The proposed antidote is strict engineering hygiene: make changes cheap, tolerate nothing, and kill bad patterns immediately.
A radical escape hatch is to maintain a “slopfest” prototyping version and a separate reliable production version, porting only what works.

Topics

  • AI Coding Tools
  • Codebase Inertia
  • Non-Deterministic UX
  • Agent Workflow Planning
  • Slopfest Refactoring