Making Postgres 42,000x slower
Based on The PrimeTime's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Postgres can be driven to extreme slowdown—about 42,000× slower than a default setup—by tuning only configuration parameters, while still keeping enough transactional throughput to avoid a total shutdown. The exercise matters because it demonstrates how fragile “performance” can be: small, well-intentioned changes to caching, maintenance behavior, and write-ahead logging can compound into a system that spends most of its time doing expensive work instead of serving queries.
The benchmark starts with a baseline using TPC-C (via benchbase) at 128 warehouses, 100 connections, and a target of 10,000 TPS, running on Linux on a Ryzen 7950X with 32 GB RAM and a 2 TB SSD. With Postgres 19 (as described), default settings are adjusted only for a few standard performance knobs (shared buffers, work memory, and worker processes), yielding roughly 7,082 TPS. From there, the goal flips: force Postgres to read and write as inefficiently as possible, without deleting indexes or otherwise "cheating" by changing the schema.
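The summary names the knobs but not their values, so the following is only a sketch of what such a baseline might look like; every number here is an illustrative assumption, not the video's actual configuration:

```sql
-- Hypothetical baseline tuning for a 32 GB machine; the summary names
-- shared buffers, work memory, and worker processes but not the values.
ALTER SYSTEM SET shared_buffers = '10GB';    -- the 10 GB starting point cited below
ALTER SYSTEM SET work_mem = '64MB';          -- per-operation sort/hash memory (assumed value)
ALTER SYSTEM SET max_worker_processes = 16;  -- background/parallel workers (assumed value)
-- shared_buffers and max_worker_processes take effect only after a restart.
```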
The first major lever is cache starvation. Postgres' shared buffers and related caching behavior are central to avoiding disk reads; shrinking shared buffers forces more page requests to miss Postgres' own cache and fall through to the operating system page cache, and ultimately the disk. The tuning sequence pushes shared buffers down aggressively: from 10 GB to 8 MB, then toward ~2 MB, where throughput collapses to under 500 TPS and later to roughly 200–300 TPS. The hit rate drops sharply (from ~99.9% to around the 70% range at one point), which drives a surge in read system calls.
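A minimal sketch of that lever, paired with a standard pg_stat_database query to watch the hit rate fall:

```sql
-- Shrink the buffer cache toward its floor (128 kB is the hard minimum,
-- so ~2 MB is close to as low as a real workload can go).
-- Takes effect only after a server restart.
ALTER SYSTEM SET shared_buffers = '2MB';

-- Observe the damage: buffer cache hit ratio for the current database.
SELECT datname,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname = current_database();
```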
Next comes maintenance sabotage. Autovacuum and autoanalyze are reconfigured to run far more frequently, with vacuum cost limits set so vacuuming rarely pauses, and maintenance memory/logging adjusted to make vacuum work heavier. The result is that vacuum and analysis repeatedly touch “hot” tables, and because the cache is already starved, each run forces significant disk reads. Logs are used to confirm the performance hit lines up with frequent vacuum/analyze activity.
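The exact values aren't given, but the described direction maps onto well-known autovacuum settings; a hedged sketch, with all numbers illustrative:

```sql
-- Make autovacuum/autoanalyze fire near-constantly and disable the
-- cost-based throttling that would normally make vacuum pause.
ALTER SYSTEM SET autovacuum_naptime = '1s';             -- minimum allowed wake interval
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.0;  -- ignore table size...
ALTER SYSTEM SET autovacuum_vacuum_threshold = 1;       -- ...vacuum after a single dead row
ALTER SYSTEM SET autovacuum_analyze_scale_factor = 0.0;
ALTER SYSTEM SET autovacuum_analyze_threshold = 1;      -- analyze just as eagerly
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = 0;      -- never sleep for cost limiting
ALTER SYSTEM SET log_autovacuum_min_duration = 0;       -- log every run to confirm the hit
SELECT pg_reload_conf();  -- all of these apply without a restart
```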
Then the write path is tuned to be as slow as possible. The write-ahead log (WAL) and checkpointer are configured to flush and checkpoint constantly: WAL flush delays are minimized, checkpoint frequency is maximized, and flush/checkpoint I/O is forced to happen in the most punishing way (including an open-datasync flush method and full-page-write behavior). Checkpoints that would normally be rare begin occurring back to back, and throughput drops further, first to roughly 98× slower than baseline and then past ~170× slower.
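A sketch in the same spirit, again with assumed values; "open data sync" in the description reads as the wal_sync_method = 'open_datasync' option:

```sql
-- Illustrative settings: flush WAL eagerly, checkpoint as often as the
-- server allows, and make each flush as expensive as possible.
ALTER SYSTEM SET wal_writer_delay = '1ms';           -- minimum; flush WAL almost continuously
ALTER SYSTEM SET checkpoint_timeout = '30s';         -- minimum allowed checkpoint interval
ALTER SYSTEM SET max_wal_size = '32MB';              -- tiny WAL ceiling triggers extra checkpoints
ALTER SYSTEM SET checkpoint_completion_target = 0.0; -- burst checkpoint I/O instead of spreading it
ALTER SYSTEM SET wal_sync_method = 'open_datasync';  -- the "open data sync" behavior mentioned
ALTER SYSTEM SET full_page_writes = on;              -- whole-page images after every checkpoint
ALTER SYSTEM SET log_checkpoints = on;               -- confirm the constant checkpointing in logs
SELECT pg_reload_conf();
```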
Finally, index usage is discouraged without removing indexes. By increasing the relative cost of random page access (random_page_cost) and adjusting CPU-related index costs, the planner is pushed toward sequential scans, which are slower under the cache-starved conditions. Throughput falls again to around 87 TPS, then below 1 TPS after additional tuning.
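A minimal sketch of that planner sabotage; the magnitudes are assumptions, and the table and column in the check are placeholders:

```sql
-- Make random (index) page access and per-index-tuple CPU work look
-- absurdly expensive so the planner prefers sequential scans everywhere.
ALTER SYSTEM SET random_page_cost = 1e6;
ALTER SYSTEM SET cpu_index_tuple_cost = 1e6;
SELECT pg_reload_conf();

-- Sanity check on any indexed table (orders/order_id are placeholder names):
EXPLAIN SELECT * FROM orders WHERE order_id = 42;  -- should now show a Seq Scan
```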
A last step uses Postgres 18's I/O controls to force all I/O through a single worker (via the io_method and io_workers settings). With I/O effectively serialized across the workload, the system reaches the headline outcome: well below 0.1 TPS, with only 11 transactions completing successfully across 100 connections and 120 seconds, plus further failures due to deadlocks. The takeaway is blunt: configuration-only changes can turn Postgres into a near-dead system, and the same knobs that help performance can be weaponized against it.
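Reading "the IO method knob and related worker settings" as PostgreSQL 18's asynchronous I/O GUCs, a minimal sketch of the final step:

```sql
-- Route I/O through background I/O workers, then allow exactly one,
-- serializing every read behind a single worker. Both settings require
-- a server restart to take effect.
ALTER SYSTEM SET io_method = 'worker';
ALTER SYSTEM SET io_workers = 1;
```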
Cornell Notes
A default-tuned Postgres setup at ~7,082 TPS (TPC-C, 128 warehouses, 100 connections) can be pushed to roughly 42,000× slower—well under 0.1 TPS—using configuration parameters alone. The slowdown comes from chaining three effects: starving shared buffers to force disk reads, making autovacuum/autoanalyze run constantly so maintenance repeatedly hits disk, and degrading WAL/checkpoint behavior so commits and checkpoints flush far more aggressively. Indexes aren't removed; planner costs are adjusted so sequential scans become preferable under the cache-starved conditions. The final step serializes I/O using Postgres 18's io_method controls, making throughput collapse even further, with many transactions failing due to deadlocks.
How does shrinking shared buffers translate into a large TPS drop?
Why does reconfiguring autovacuum and autoanalyze hurt performance so much in this setup?
What role do WAL and checkpoints play in the slowdown?
How can indexes be effectively “disabled” without deleting them?
Why does forcing I/O into one thread matter when there are 100 connections?
Review Questions
- Which tuning change first causes the biggest shift from memory-resident reads to disk reads, and how is that reflected in hit rate or read system calls?
- Explain how autovacuum frequency and vacuum cost limits interact with a tiny shared buffers setting to amplify disk I/O.
- What combination of planner cost changes and I/O serialization ultimately pushes the system from “hundreds of TPS” to “well under 0.1 TPS”?
Key Points
1. Start with a measurable baseline (TPC-C via benchbase, 128 warehouses, 100 connections) to quantify how far performance can be pushed in either direction.
2. Starving shared buffers forces page misses, increasing read system calls and collapsing TPS even before maintenance or WAL changes.
3. Making autovacuum/autoanalyze run almost continuously can dominate workload time when cache hit rates are already low.
4. Aggressive WAL flush and checkpoint tuning increases commit and durability overhead, with logs showing frequent checkpoint cycles.
5. Index usage can be discouraged without dropping indexes by raising random_page_cost and related planner costs until sequential scans look cheaper to the planner.
6. Postgres 18's io_method and io_workers settings can serialize I/O, preventing overlap and driving throughput toward near-zero under disk-heavy conditions.
7. Even when some transactions complete, deadlocks rise as the system becomes overloaded and maintenance/write behavior worsens.