How BAD Is Test Driven Development? - The Standup #6

The PrimeTime · 6 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

TDD can distort design when the interface optimized for tests doesn’t match the interface needed for real usage.

Briefing

Test-driven development (TDD) drew heavy skepticism in a standup-style debate, with the core complaint landing on a simple trade-off: forcing development to revolve around writing tests can distort design toward testability rather than real-world use. Multiple participants said they like testing, but dislike “test-first” as a primary workflow, arguing that it can produce interfaces and APIs that work under the test harness yet feel awkward or even unusable in practice.

One developer described a recent TDD attempt while building a card-draw mechanic for a game (“The Towers of Mordoria”). The test-driven approach felt great in isolation: the API was designed in the test, tests were set up to fail, and then the implementation was made to pass. The trouble came when the code was integrated into the actual game. The interface that was ideal for testing turned out to be a poor fit for real usage, requiring a redesign. He generalized the experience into a broader pattern: TDD works best when the “testing interface” and the “usage interface” naturally align, but it often pushes developers toward inversion-of-control abstractions and overly modular interfaces that don’t match how the system is meant to be used.
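To make the loop concrete, here is a minimal red-green sketch in Python. The `Deck` class, its API, and the pytest-style tests are illustrative assumptions, not code from the video:

```python
# A minimal red-green sketch of test-first design for a card-draw
# mechanic. Deck and its API are hypothetical, not the actual code
# discussed in the video.
import random

import pytest


class Deck:
    """Toy deck whose interface was 'designed in the test' first."""

    def __init__(self, cards, rng=None):
        # Injecting the RNG makes draws deterministic under test:
        # a test-friendly seam the real game may never want.
        self._cards = list(cards)
        self._rng = rng or random.Random()

    def draw(self, n=1):
        if n > len(self._cards):
            raise ValueError("not enough cards left")
        return [self._cards.pop(self._rng.randrange(len(self._cards)))
                for _ in range(n)]


# Step 1 ("red"): these tests were written before Deck existed and failed.
def test_draw_removes_cards():
    deck = Deck(["ace", "king", "queen"], rng=random.Random(42))
    assert len(deck.draw(2)) == 2


def test_overdraw_raises():
    deck = Deck(["ace"])
    with pytest.raises(ValueError):
        deck.draw(2)
# Step 2 ("green"): Deck was implemented until both tests passed.
```

The injected `rng` seam is exactly the kind of test-friendly surface the speaker found awkward once the code had to live inside the real game: handy under the harness, extra baggage in actual usage.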

Another participant echoed the concern, framing it as a zero-sum problem. Time spent constructing tests and shaping architecture around them is time not spent building for the primary use case. He argued that the emphasis on tests can lead to the same failure mode seen in other “optimize for a metric” cultures: lots of effort goes into satisfying the testing ritual, while the resulting system still needs fixing. He also pointed to real-world examples of testing that didn’t prevent user-facing defects, using YouTube’s play/pause indicator inconsistency as a cautionary tale.

Still, the discussion wasn’t anti-testing. Several people described pragmatic testing strategies that deliver confidence without the rigid TDD loop. One approach favored granular tests for stable components (like reusable parsers) and targeted multi-step tests when the system behavior is hard to validate otherwise. Another leaned into snapshot or “golden” testing—asserting that a diff-friendly printed representation matches a stored expected output—arguing it’s powerful when the representation is stable and changes are meaningful. But snapshot testing drew its own caveats: if requirements churn or the underlying structure changes frequently, snapshots become noisy, hard to trust, and expensive to update.
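As a rough illustration of the golden-file pattern, here is a sketch in which the `render` helper, the `check_golden` utility, and the file layout are all invented for the example:

```python
# A minimal golden/snapshot test sketch. The parser output, its printed
# representation, and the golden-file layout are hypothetical.
from pathlib import Path

GOLDEN_DIR = Path("tests/golden")


def render(tree) -> str:
    """Print the structure in a stable, diff-friendly form."""
    return "\n".join(f"{'  ' * depth}{label}" for depth, label in tree)


def check_golden(name: str, actual: str, update: bool = False):
    golden = GOLDEN_DIR / f"{name}.txt"
    if update or not golden.exists():
        # 'Update' mode: record the current output as the new truth.
        # This is exactly where churn erodes trust if it happens often.
        golden.parent.mkdir(parents=True, exist_ok=True)
        golden.write_text(actual)
        return
    expected = golden.read_text()
    # A plain string compare keeps failures readable as a diff.
    assert actual == expected, f"snapshot mismatch for {name!r}"


def test_parse_tree_snapshot():
    tree = [(0, "module"), (1, "def draw"), (2, "return card")]
    check_golden("parse_tree", render(tree))
```

The update path is where the caveat bites: if golden files are regenerated every time requirements move, they stop encoding intent and become the “update-and-forget” noise the participants warned about.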

The most constructive “TDD” defense came from reframing it as a tool for specific situations. One participant suggested that driving development with tests can be valuable when a task lacks feedback—especially low-level components with no direct visual output—because tests can provide a measurable target (e.g., performance counters like cache misses or cycles). Others argued that mature teams often add tests after discovering bugs (“test-driven debugging”), turning recurring failure modes into regression nets rather than treating tests as the starting point.
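A sketch of the tests-as-targets idea follows. Reading real cache-miss or cycle counters requires platform tooling (e.g., Linux perf), so this stand-in asserts a wall-clock budget instead; the function and the budget are invented:

```python
# A sketch of using a test as a measurable performance target. The
# video mentioned counters like cache misses or cycles; those need
# platform tooling, so this stand-in asserts a time budget instead.
import time


def hot_path(data):
    # Hypothetical routine being tuned.
    return sum(x * x for x in data)


def test_hot_path_meets_budget():
    data = list(range(100_000))
    start = time.perf_counter()
    hot_path(data)
    elapsed = time.perf_counter() - start
    # The failing assertion is the 'target': optimize until it passes.
    assert elapsed < 0.05, f"too slow: {elapsed:.4f}s"
```

In a genuine low-level workflow the assertion would read hardware counters, but the shape is the same: a failing test that doubles as the measurement the developer needs anyway.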

By the end, the group converged on a shared message: testing is essential, but the workflow should match the problem. Use tests to catch hard-to-find or catastrophic failures, prefer stable assertions (including snapshots when appropriate), and rely on incremental releases and real-world feedback when the system is still in flux. The debate also included a cautionary anecdote about production incidents and the limits of relying on unit-level confidence alone—reinforcing the idea that testing strategy, not test volume, determines whether software actually improves.

Cornell Notes

The discussion draws a sharp line between “testing” and “test-driven development.” Participants generally agree that tests are valuable, but forcing development to be driven by tests can distort architecture—especially when the interface optimized for tests doesn’t match real usage. Several people prefer targeted testing: granular unit tests for stable components, end-to-end tests for integration confidence, and snapshot/golden tests when outputs can be represented stably and diffed. Others argue that TDD can be useful in narrow cases, such as low-level work with no natural feedback or performance tuning where tests provide measurable targets. The takeaway is to choose the right testing strategy for the system’s stability and risk profile rather than treating TDD as a universal rule.

Why did one developer end up redoing a TDD-designed interface after it “worked” in tests?

He built a card-draw mechanic using TDD and felt the workflow was smooth: the API was designed around tests, tests were initially failing, then the implementation was made to pass. But once the code was integrated into the actual game, the interface that was ideal for testing proved awkward for real gameplay usage. The testing-friendly design and the usage-friendly design diverged, forcing a redesign—an example of how test-first constraints can optimize for the harness instead of the product.

What’s the central critique of TDD that treats it as a zero-sum trade-off?

One participant argued that time spent writing tests and shaping architecture around them is time not spent building for the primary use case. If developers orient interfaces toward being easy to test, they may sacrifice usability and real-world design quality. He also suggested that cultures obsessed with a testing ritual can end up with lots of tests that don’t prevent meaningful user-facing failures, because the emphasis shifts from solving the problem to satisfying the testing process.

How do snapshot/golden tests fit into the debate, and when do they work best?

Snapshot tests were described as powerful when the output representation is stable and human-readable—like printed diffs of parse trees, parse structures, type system states, or complex internal data structures. The confidence comes from asserting the entire state at once, making regressions obvious in code review. The main failure mode is churn: if requirements or internal representations change frequently, snapshots break constantly and become “update-and-forget,” reducing trust and increasing maintenance cost.

What alternative to TDD did participants describe for building confidence?

Several described a “test as the system takes shape” approach: wait until the behavior is understood, then add tests for parts likely to fail or be catastrophic. Others emphasized end-to-end tests for refactoring confidence at the system level, while using unit tests selectively for stable, isolated components. A related idea was “test-driven debugging,” where teams add tests after bugs are found to prevent repeats, rather than writing tests before any implementation exists.
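A small sketch of the test-driven-debugging pattern, with the bug and function invented for illustration; the point is that the test arrives after the incident and pins it down:

```python
# Test-driven debugging sketch: the bug and function are invented.
# The test is added *after* the incident, so the failure mode can
# never silently return.
import pytest


def deal(cards, hand_size):
    # Production bug: dealing more cards than the deck held returned
    # a short hand instead of failing loudly. Fixed by the guard below.
    if hand_size > len(cards):
        raise ValueError("hand_size exceeds deck size")
    return cards[:hand_size]


def test_regression_overdraw_from_small_deck():
    # Regression net pinned to the original bug report.
    with pytest.raises(ValueError):
        deal(["ace"], 5)
```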

When did someone argue that test-driven development can still be justified?

He suggested TDD can help when a task has low feedback—especially low-level components with no visual/auditory output—because tests can turn progress into measurable signals. Performance work was given as an example: tests can assert metrics like cache misses or cycles, giving developers a concrete target and motivation while still producing the measurement they ultimately need.

What did the group imply about relying on unit tests alone during refactors?

They argued that unit tests can give misleading confidence if they don’t cover integration edge cases. Refactoring can change the “shape” of edge conditions, so tests written for one internal structure may not catch new interactions. The consensus leaned toward combining selective unit tests with end-to-end or high-level regression checks, plus real-world feedback (like staged rollouts) when systems are changing quickly.

Review Questions

  1. What specific failure mode did the game developer experience when TDD produced a testing-optimized interface that didn’t translate to real usage?
  2. Under what conditions do snapshot/golden tests become high-confidence tools versus maintenance burdens?
  3. Why did participants argue that “testing for refactoring confidence” can be overstated when integration edge cases are the real risk?

Key Points

  1. TDD can distort design when the interface optimized for tests doesn’t match the interface needed for real usage.
  2. Testing is broadly valued, but participants object to treating “test-first” as a universal development workflow.
  3. Time spent building tests and test-friendly abstractions is time not spent designing for the primary use case (a zero-sum framing).
  4. Snapshot/golden testing can be extremely effective when the asserted representation is stable and diffable, but it becomes noisy when requirements or internal structures churn.
  5. A pragmatic strategy often combines granular unit tests for stable components, end-to-end tests for integration confidence, and targeted regression tests for high-risk areas.
  6. Test-driven debugging (adding tests after discovering bugs) can produce better regression nets than strict red-green TDD loops.
  7. Driving development with tests can be useful for low-feedback tasks and performance tuning when tests provide measurable targets.

Highlights

A TDD-built API for a game mechanic passed tests cleanly, then broke down in real integration because the testing interface didn’t fit actual gameplay needs.
Snapshot/golden tests were defended as powerful when outputs are stable and human-readable, but criticized when updates become constant and the expected output becomes guesswork.
Multiple participants separated “testing” from “test-driven development,” arguing that forcing architecture around tests can lead to unusable APIs.
TDD was reframed as most useful when developers need feedback—like performance counters—rather than as a blanket rule for all work.
The group repeatedly returned to the idea that unit-level confidence can miss integration edge cases, so strategy must match risk and system stability.

Topics

Mentioned

  • TDD
  • OOP
  • OKRs