The medical test paradox, and redesigning Bayes' rule

TL;DR

High sensitivity and specificity do not directly translate into a high probability of disease after a positive result when prevalence is low.

Briefing

An accurate medical test can still leave a surprisingly low chance that a positive result is correct, because disease prevalence and the test’s false positives determine what “accuracy” means for an individual. Take a breast-cancer screening example with 1% prevalence, 90% sensitivity, and 91% specificity. In a sample of 1,000 women, the test correctly flags 9 of the 10 women with cancer but also produces about 89 false positives among the 990 women without cancer. When a woman tests positive, the probability she actually has cancer is therefore 9/(9+89) ≈ 1 in 11. The paradox is that the test is “over 90% accurate” in the usual sense, yet its positive predictive value (PPV) can be arbitrarily low when the disease is rare.
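The population count above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not something from the original material; the function name and parameters are made up for this example. Note that it keeps the unrounded 89.1 false positives, so the result differs slightly from 9/98.

```python
def positive_predictive_value(population, prevalence, sensitivity, specificity):
    """Count-based PPV: true positives divided by all positives."""
    with_disease = population * prevalence
    without_disease = population - with_disease
    true_positives = with_disease * sensitivity            # 10 * 0.90 = 9
    false_positives = without_disease * (1 - specificity)  # 990 * 0.09 = 89.1
    return true_positives / (true_positives + false_positives)


# Numbers from the breast-cancer example in the text.
ppv = positive_predictive_value(1000, 0.01, 0.90, 0.91)
print(f"PPV = {ppv:.3f}")  # roughly 1 in 11
```

Changing the prevalence argument while holding sensitivity and specificity fixed shows how strongly the PPV depends on how rare the disease is.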

The counterintuitive gap shows up in real-world reasoning. In 2006–2007, psychologist Gerd Gigerenzer ran statistics seminars for practicing gynecologists using the same numbers as the breast-cancer scenario: prevalence around 1%, sensitivity 90%, specificity 91%. Many doctors answered that a positive result implies something like a 9-in-10 chance of cancer—far off the correct 1-in-11. The mismatch isn’t a logical contradiction; it’s a “veridical paradox,” meaning the facts are provably true but feel wrong when people treat test accuracy as if it directly converts into personal risk.

The fix starts with reframing what tests do. Rather than delivering a final probability, a test updates a prior—the baseline chance of disease before seeing results. In the example, the prior is 1 in 100. The test doesn’t replace that with 90% certainty; it shifts it to about 1 in 11, roughly an order-of-magnitude change. That updating strength can be summarized by a Bayes factor (also called a likelihood ratio), computed for a positive result as sensitivity divided by the false positive rate. Here, 0.9 / 0.09 = 10. A practical rule follows: multiply the prior odds by the Bayes factor.

Odds make the mathematics feel less like a trap. Prior odds are the number with cancer divided by the number without it; after a positive test, those odds are scaled by the Bayes factor. With 1% prevalence, prior odds are 1:99; multiplying by 10 yields 10:99, which converts back to 10/109, or about 1 in 11. If prevalence rises to 10%, prior odds become 1:9; after multiplying by 10, the result is 10:9, or 10/19 ≈ 53%, matching what a concrete population count predicts. The same logic works for negative results too, using a different factor: false negative rate divided by specificity (0.10/0.91, or about 1/9, in the example), which reduces prior odds by about an order of magnitude.
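A minimal sketch of the odds update for a positive result, using exact fractions so the 10:99 and 10:9 figures appear without rounding. The helper name `update_odds` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def update_odds(prior_probability, bayes_factor):
    """Convert a prior probability to odds, scale by the Bayes factor,
    and convert the posterior odds back to a probability."""
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * bayes_factor
    return posterior_odds / (1 + posterior_odds)


# Bayes factor for a positive result: sensitivity / false positive rate.
bayes_factor_positive = Fraction(90, 100) / Fraction(9, 100)  # exactly 10

rare = update_odds(Fraction(1, 100), bayes_factor_positive)   # 1% prevalence
common = update_odds(Fraction(1, 10), bayes_factor_positive)  # 10% prevalence

print(rare, float(rare))      # 10/109, about 1 in 11
print(common, float(common))  # 10/19, about 53%
```

The same Bayes factor is reused for both priors; only the starting odds change.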

Finally, the discussion contrasts the odds-and-Bayes-factor version of Bayes’ rule with the more common probability form. The odds framing cleanly separates prior information from test accuracy, making it easier to swap priors and chain multiple pieces of evidence (like symptoms or multiple tests). The standard formula remains valuable as a compact representation of the sample-population counting method, but the odds approach reduces ambiguity, especially because a Bayes factor is not a probability and therefore can’t be mistaken for “the chance your result is false.”
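For reference, the two forms can be written side by side with the example's numbers plugged in. The notation is an assumption made for this sketch: D for "has the disease", + for "tests positive", and O(·) for odds.

```latex
\begin{align*}
% Probability form: mirrors the population count (true positives over all positives)
P(D \mid +) &= \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)}
             = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.09 \times 0.99} \approx 0.092 \\[1ex]
% Odds form: prior odds times the Bayes factor
O(D \mid +) &= O(D) \times \frac{P(+ \mid D)}{P(+ \mid \neg D)}
             = \frac{1}{99} \times \frac{0.9}{0.09} = \frac{10}{99}
\end{align*}
```

In the odds form the prior (1/99) and the test's accuracy (0.9/0.09) sit in separate factors, whereas the probability form mixes them in both numerator and denominator.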

Cornell Notes

The breast-cancer screening example shows why “high accuracy” doesn’t guarantee a high probability that a positive result is correct. With 1% prevalence, sensitivity 90%, and specificity 91%, a positive test yields only about a 1 in 11 chance of actually having cancer (PPV = 9/(9+89)). Many clinicians misread sensitivity/specificity as if they directly translate into personal risk, producing answers like 9 in 10. The remedy is to treat testing as Bayesian updating: start with a prior (pre-test risk) and apply a Bayes factor. For a positive result, the Bayes factor equals sensitivity divided by the false positive rate; update prior odds by multiplying by this factor. This framing also clarifies negative results and makes multi-evidence updates more straightforward.

Why does a test with 90% sensitivity and 91% specificity still give only ~1/11 chance of cancer after a positive result?

Because prevalence is low. In a population of 1,000 women with 1% prevalence, about 10 have cancer and 990 do not. With 90% sensitivity, 9 of the 10 cancer cases test positive (true positives). With 91% specificity, 9% of the 990 without cancer test positive as false positives: 0.09×990 ≈ 89. A positive result therefore corresponds to 9 true positives out of 9+89 total positives, so PPV = 9/(9+89) ≈ 1/11.

What misconception did many doctors show in Gigerenzer’s seminars?

They treated sensitivity/specificity as if they directly determine the probability of disease given the test result. When asked “how many women who test positive actually have breast cancer?” many answered around 9 in 10, ignoring that most positive results come from false positives when prevalence is only ~1%. The correct answer is closer to 1 in 11 under the stated prevalence, sensitivity, and specificity.

How does the Bayes factor help turn test accuracy into an update of personal risk?

For a positive test, the Bayes factor (likelihood ratio) is sensitivity divided by the false positive rate. In the example, sensitivity = 0.9 and false positive rate = 1−specificity = 0.09, so Bayes factor = 0.9/0.09 = 10. This number measures how much more likely a positive result is among people with the disease versus without it, and it acts on the prior odds rather than being confused with a probability.

How do odds-based updates reproduce the breast-cancer numbers without heavy calculation?

Express the prior as odds: prior odds = (cancer cases)/(non-cancer cases). With 1% prevalence, prior odds are 1:99. Multiply by the Bayes factor 10 to get posterior odds 10:99. Converting back to probability gives 10/(10+99) ≈ 1/11. With 10% prevalence, prior odds are 1:9; multiplying by 10 gives 10:9, which corresponds to about 10/(10+9) ≈ 53%.

How does the logic change for a negative test result?

The update uses a different Bayes factor based on how negative results differ between diseased and non-diseased groups. For a negative test, the Bayes factor is the false negative rate divided by the specificity. In the example, false negative rate = 10% and specificity = 91%, giving 0.10/0.91 ≈ 1/9. Seeing a negative result therefore reduces prior odds by roughly an order of magnitude.
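The negative-result update can be sketched the same way. The function below is a hypothetical helper written for this example; it returns both the Bayes factor and the resulting probability of disease after a negative test.

```python
def posterior_after_negative(prevalence, sensitivity, specificity):
    """Update the prior odds with the Bayes factor for a negative result."""
    false_negative_rate = 1 - sensitivity
    # P(negative | disease) / P(negative | no disease)
    bayes_factor = false_negative_rate / specificity
    prior_odds = prevalence / (1 - prevalence)
    posterior_odds = prior_odds * bayes_factor
    return bayes_factor, posterior_odds / (1 + posterior_odds)


# Numbers from the breast-cancer example in the text.
factor, posterior = posterior_after_negative(0.01, 0.90, 0.91)
print(f"Bayes factor for a negative result: {factor:.3f}")
print(f"P(disease | negative): {posterior:.4f}")
```

With the example's numbers the factor is close to 1/9, and the chance of disease falls from 1 in 100 to roughly 1 in 900, which matches a direct count of 1 false negative against about 901 true negatives.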

What practical advantage does the odds-and-Bayes-factor framing offer over the usual probability form of Bayes’ rule?

It separates prior information from test accuracy. Once the Bayes factor is computed, swapping priors is easy: you just multiply prior odds by the same factor. It also supports chaining evidence: each new piece of evidence contributes its own Bayes factor, and the odds update by multiplying by each factor in sequence.
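As a rough illustration of chaining, the sketch below multiplies the prior odds by several Bayes factors in sequence. It relies on an assumption that the notes do not state: the pieces of evidence must be conditionally independent given disease status (for instance, two tests whose errors are unrelated). The function name and the two-test scenario are hypothetical.

```python
from math import prod


def chain_updates(prior_probability, bayes_factors):
    """Multiply prior odds by each Bayes factor in turn.

    ASSUMPTION: the pieces of evidence are conditionally independent
    given disease status; otherwise the factors cannot simply be multiplied.
    """
    prior_odds = prior_probability / (1 - prior_probability)
    posterior_odds = prior_odds * prod(bayes_factors)
    return posterior_odds / (1 + posterior_odds)


# Hypothetical scenario: two independent positive results, each with Bayes factor 10.
two_positives = chain_updates(0.01, [10, 10])
print(f"P(disease | two positives) = {two_positives:.3f}")  # odds 100:99, about 50%
```

Swapping the prior only changes the first argument; the list of factors stays the same.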

Review Questions

In the breast-cancer example (1% prevalence, 90% sensitivity, 91% specificity), compute the number of true positives and false positives in a group of 1,000 women and derive the PPV.
What is the Bayes factor for a positive test in the example, and how does it relate to sensitivity and the false positive rate?
Why do odds-based updates work cleanly for both low and higher prevalence, while a naive “accuracy equals probability” interpretation fails?

Key Points

1. High sensitivity and specificity do not directly translate into a high probability of disease after a positive result when prevalence is low.
2. Positive predictive value (PPV) depends on the balance between true positives and false positives, which prevalence strongly controls.
3. Treat test results as Bayesian updates: start with a prior probability (or prior odds) and then apply evidence from the test.
4. For a positive test, the Bayes factor equals sensitivity divided by the false positive rate; update prior odds by multiplying by this factor.
5. Odds framing makes Bayes’ rule easier to apply and reduces confusion because a Bayes factor is not a probability.
6. The same updating logic applies to negative results, using a Bayes factor based on false negative rate and specificity.
7. The odds-and-Bayes-factor form makes it simpler to swap priors and combine multiple evidence sources (e.g., symptoms or multiple tests).

Highlights

With 1% prevalence, 90% sensitivity, and 91% specificity, a positive result corresponds to about 9 true positives versus about 89 false positives—yielding PPV ≈ 1 in 11.

Gigerenzer’s seminars found that many clinicians answered around 9 in 10, a common error caused by treating sensitivity/specificity as if they directly determine P(disease | positive).

A positive-test Bayes factor is sensitivity / false-positive-rate; in the example it equals 10, which updates prior odds by multiplication.

Odds-based Bayes updating reproduces both the rare-disease case (~1/11) and the higher-prevalence case (~53%) using the same Bayes factor.

A Bayes factor is designed to act on prior odds; it’s not a probability, which helps prevent misinterpretation of test accuracy statistics.

Topics

Medical Test Paradox
Bayes' Rule
Positive Predictive Value
Bayes Factor
Odds vs Probability

Mentioned

Gerd Gigerenzer
PPV