
Stop Being So Dumb On Twitter

The PrimeTime · 5 min read

Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Basecamp’s reported peak throughput (~5,250 RPS) and ~90 ms median response time are treated as plausible when latency is measured within server-controlled boundaries rather than end-to-end internet round trips.

Briefing

A widely shared performance claim—Basecamp handling roughly 5,000 requests per second with about 90 milliseconds median response time—sparks a broader argument about whether modern web apps should chase “infinite scale” or accept a more predictable, capacity-planned setup. The practical takeaway is that, for many real businesses, high throughput can be achieved with surprisingly modest infrastructure when the system is engineered for the workload rather than benchmarked in isolation.

The discussion starts with the numbers: Basecamp reportedly peaked at about 5,250 requests per second, with response time around 90 milliseconds. That timing is framed as internal server performance (not end-to-end “round trip” latency across the internet), which makes the figure more plausible. From there, the conversation turns into back-of-the-envelope capacity planning: if the workload truly fits within those constraints, the peak might be supported by around 500 CPU cores, and—if redundancy were skipped—could potentially be handled by only a few machines. Even with redundancy and operational realities like databases and background jobs, the argument is that the compute bill may be far lower than critics assume, especially for B2B SaaS companies that don’t need extreme traffic spikes.
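As a quick sketch of that arithmetic (the throughput and latency figures are the transcript's; the one-request-per-core simplification and the machine size are assumptions for illustration): by Little's law, the number of requests in flight is throughput times latency, so ~5,250 RPS at ~90 ms implies roughly 470 concurrent requests, which is where a "~500 cores" estimate comes from if each core serves about one request at a time.

```python
import math

# Back-of-the-envelope capacity estimate via Little's law:
#   requests in flight = throughput (req/s) * latency (s)
# Assumes each request occupies roughly one core while it runs --
# a simplification; real CPU time per request varies by workload.

peak_rps = 5_250          # reported peak throughput
median_latency_s = 0.090  # ~90 ms median response time

in_flight = peak_rps * median_latency_s
print(f"Requests in flight at peak: ~{in_flight:.0f}")  # ~472

# With hypothetical 64-core machines and no redundancy, that is
# only a handful of boxes (machine size assumed for illustration):
cores_per_machine = 64
machines = math.ceil(in_flight / cores_per_machine)
print(f"Machines needed (no redundancy): {machines}")   # 8
```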

That leads to a critique of the internet's two camps. One side emphasizes the capacity-planned approach associated with "levels" and DHH-style thinking: more engineering effort up front in exchange for predictable costs and controlled reliability. The other side points to Vercel and similar platforms that enable elastic scaling but can become expensive at very high usage. The pushback is that neither camp is automatically "right" or "wrong": most apps don't require infinite scale, and even "platform simplicity" still carries costs, whether in developer time spent managing infrastructure or in operational overhead outsourced to the cloud.

A major theme is how people evaluate performance. The transcript repeatedly targets “benchmark jockeying,” especially “hello world” tests that ignore garbage collection, database behavior, event-loop congestion, and the tail latencies that matter in production (p95/p99). Rendering-heavy requests—like calendars or large to-do lists—are cited as a more realistic stressor than synthetic RPS numbers. The speaker also argues that median response time can look excellent even when worst-case behavior isn’t measured, so critics who dismiss the result as impossible may be missing context.
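A minimal sketch of why median alone can mislead (all latency numbers below are synthetic, invented for illustration): a service where most requests are fast but a small fraction stall on a slow path such as a GC pause or a heavy render can show an excellent p50 while p99 is an order of magnitude worse.

```python
import math
import random

random.seed(42)

# Synthetic latencies: 98% of requests complete in ~80-100 ms,
# 2% hit a slow path (GC pause, cold cache, heavy render) at 1-2 s.
samples = [
    random.uniform(0.080, 0.100) if random.random() < 0.98
    else random.uniform(1.0, 2.0)
    for _ in range(100_000)
]

def percentile(values, p):
    """Nearest-rank percentile of a list of latencies (seconds)."""
    ordered = sorted(values)
    rank = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[rank]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(samples, p) * 1000:.0f} ms")
# p50 and p95 look great (~90-100 ms); p99 lands squarely in the
# slow path, at well over 1,000 ms.
```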

The conversation then broadens into programming-language tradeoffs. Hardware and modern CPUs make high RPS more attainable, but language choice still matters for developer productivity and operational fit. JavaScript and Lua are defended for their dynamic, script-updating strengths in UI and embedded contexts, while the broader "rewrite everything in Rust/Go/Zig" mindset is treated as optional and often driven more by emotion than by practical necessity. Ultimately, the message is less about declaring a winner and more about stopping the reflexive hostility: if a team can build a fast, reliable product that earns revenue, the market will respond. Critics are urged to either build something better, or accept that different engineering choices can still produce successful outcomes.

Cornell Notes

Basecamp’s reported peak—about 5,000+ requests per second with ~90 ms median response time—becomes a springboard for a debate about how web apps should scale and how performance should be measured. The core claim is that many real B2B SaaS workloads don’t need “infinite scale,” and that capacity-planned systems can deliver strong latency with manageable infrastructure costs. The transcript also argues that synthetic “hello world” benchmarks are misleading because they ignore database work, garbage collection, event-loop saturation, and tail latency (p95/p99). Language and platform choices are framed as tradeoffs between developer effort, operational complexity, and runtime efficiency rather than moral victories. The practical lesson: judge systems by production-relevant behavior, not by isolated RPS bragging.

Why does ~90 ms median response time matter more than raw requests-per-second bragging?

Because median latency reflects how quickly typical requests complete under load. The transcript stresses that the 90 ms figure is likely measured within the server’s control (not full internet round-trip), making it a meaningful indicator of internal responsiveness. It also contrasts median with tail metrics like p95/p99, warning that a system can look great on average while still having problematic worst cases.

What’s the argument behind estimating CPU needs from peak RPS?

The discussion uses the reported peak (around 5,250 RPS) and a rough assumption of about one in-flight request per core to suggest that the workload might fit within ~500 cores at max load. It then notes that redundancy, databases, and background jobs would increase real requirements, but still may not push costs into the "impossibly expensive" range critics assume. The point is that compute can be "thrown at" problems more effectively than many online commenters expect.
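Extending that estimate (every multiplier below is an assumption for illustration; the transcript gives no specific headroom figures): redundancy and non-request work can be folded in as factors on the base core count, and the total still lands well short of "impossibly expensive."

```python
import math

# Base estimate from peak throughput and median latency (Little's law):
base_cores = 5_250 * 0.090          # ~472 cores busy at peak

# Illustrative headroom factors -- assumed, not from the transcript:
redundancy_factor = 2.0             # two full replicas of capacity
background_factor = 1.3             # background jobs, queues, cron
db_cores = 64                       # dedicated database cores

total = base_cores * redundancy_factor * background_factor + db_cores
print(f"Total cores with headroom: ~{math.ceil(total)}")
# -> ~1,293 cores: larger than the bare ~500, but still a modest
#    fleet of commodity machines rather than hyperscale territory.
```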

How does the transcript criticize common performance evaluation methods?

It calls out “benchmark jockeying,” especially “hello world” tests that can hit huge RPS without representing real application behavior. Real apps include database calls, rendering work (e.g., calendars/to-do lists), garbage collection effects, and event-loop congestion in Node. Those factors strongly influence tail latency (p95/p99), so focusing only on short, synthetic RPS snapshots can produce a misleading sense of performance.

What’s the “two camps” debate, and why does the transcript reject a simple winner/loser framing?

One camp favors capacity-planned scaling (associated with “levels”/DHH-style thinking), aiming for predictable costs and engineering control. The other camp favors elastic “infinite scale” (associated with Vercel), which can be cheap at moderate loads but may become costly at high scale. The transcript rejects the idea that one approach is inherently foolish: most apps don’t need infinite scale, and both paths trade off developer time, operational complexity, and reliability engineering.

How does language choice fit into the scaling/performance argument?

The transcript treats language as a tradeoff between runtime efficiency and developer productivity. It defends dynamic scripting languages like JavaScript and Lua for UI and embedded contexts because they can be updated by shipping scripts without full recompiles or heavy deployment cycles. At the same time, it acknowledges that rewriting everything in Rust/Go/Zig to chase efficiency is not always practical—sometimes the “rewrite” impulse is driven by emotion or unrealistic expectations rather than measurable necessity.

What does the transcript suggest about how to respond to online criticism?

It argues that reflexive “anti” negativity is unproductive. Instead of arguing about who’s “dumb,” the transcript urges focusing on outcomes: if a team builds a fast, reliable product that earns revenue, that’s evidence the engineering choices work. Critics should either build a better alternative or accept that different constraints lead to different architectures.

Review Questions

  1. What production factors (beyond simple request handling) can cause tail latency to worsen even when median response time looks strong?
  2. How do redundancy, databases, and background jobs change the meaning of back-of-the-envelope core estimates from peak RPS?
  3. Why might “infinite scale” still be unnecessary for most SaaS businesses, according to the transcript’s reasoning?

Key Points

  1. Basecamp’s reported peak throughput (~5,250 RPS) and ~90 ms median response time are treated as plausible when latency is measured within server-controlled boundaries rather than end-to-end internet round trips.

  2. Compute requirements can be estimated from peak RPS, but real systems require extra headroom for redundancy, databases, and background work.

  3. Synthetic “hello world” benchmarks are criticized as misleading because they omit database behavior, garbage collection, event-loop saturation, and tail-latency effects (p95/p99).

  4. The scaling debate is framed as a tradeoff between predictable capacity planning (uniform bills, more engineering effort) and elastic platforms (potentially higher costs at extreme scale).

  5. Most apps don’t need “infinite scale,” so platform choice should match actual workload and reliability requirements rather than ideology.

  6. Language and deployment strategy are presented as practical tradeoffs: dynamic scripting can simplify UI updates, while low-level rewrites for performance may be unnecessary or unrealistic.

  7. Online hostility (“anti” behavior) is portrayed as less useful than focusing on measurable outcomes like reliability, latency, and business success.

Highlights

  • A peak of roughly 5,000+ RPS with ~90 ms median response time is presented as evidence that strong performance can come from capacity-planned engineering, not only from elastic “infinite scale.”
  • The transcript repeatedly warns that benchmark jockeying—especially hello-world RPS—can hide the real bottlenecks that show up in p95/p99 under database and rendering load.
  • The scaling argument rejects binary thinking: most SaaS workloads don’t require infinite scale, and both platform simplicity and capacity planning come with tradeoffs in cost and engineering effort.
  • Dynamic scripting languages like JavaScript and Lua are defended for UI and embedded use because they enable fast updates without heavy recompilation cycles.

Topics

  • Basecamp Performance
  • Requests Per Second
  • Tail Latency
  • Scaling Tradeoffs
  • Language Choice