Correlation CAN Imply Causation! | Statistics Misconceptions
Based on minutephysics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Correlation by itself does not determine direction of causality; hidden variables can generate the same pattern.
Briefing
Correlation can’t automatically tell you what causes what—but correlations can still pin down causality when they’re used to test causal models. The common mistake is to treat any observed relationship between two variables as evidence that one variable produces the other. The cat-and-height example makes the point: if taller people own more cats, that pattern could mean height causes cat ownership, cat ownership causes height, or a third factor drives both—like two islands where one is lush enough to support both taller growth and pet cats, while the other limits both.
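The two-island scenario can be sketched as a small simulation. Every number below (mean heights, cat-ownership rates, population sizes, the random seed) is a made-up assumption for illustration and does not come from the video. Height and cat ownership are drawn independently within each island, so any overall correlation comes only from the island itself.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_island(rng, n, mean_height_cm, cat_probability):
    """Draw n residents; height and cat ownership are sampled independently."""
    heights = [rng.gauss(mean_height_cm, 7.0) for _ in range(n)]
    cats = [1 if rng.random() < cat_probability else 0 for _ in range(n)]
    return heights, cats


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical parameters: the lush island supports taller people and more cats.
lush_heights, lush_cats = simulate_island(rng, 5000, 175.0, 0.6)
sparse_heights, sparse_cats = simulate_island(rng, 5000, 160.0, 0.1)

overall = pearson(lush_heights + sparse_heights, lush_cats + sparse_cats)
within_lush = pearson(lush_heights, lush_cats)
within_sparse = pearson(sparse_heights, sparse_cats)

print(f"overall correlation:       {overall:.3f}")
print(f"within the lush island:    {within_lush:.3f}")
print(f"within the sparse island:  {within_sparse:.3f}")
```

With these assumed parameters the pooled population shows a clear positive correlation, while each island taken alone shows a correlation close to zero, even though neither variable acts on the other anywhere in the code.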
That leads to a second misconception: that statistics can’t infer causality at all. In reality, correlation is often informative because it constrains which causal explanations are consistent with the data. The transcript illustrates this with a causal-network approach. Start with a situation where height and cat ownership are correlated, but nothing else is known. Many causal structures linking height, cat ownership, and island could generate that correlation: 19 of them, assuming the correlation is not a coincidence.
Then add targeted background information that rules out entire classes of causal models. First, assume people born on a given island stay there, so height cannot influence island choice. That eliminates causal structures where height determines which island someone lives on. Second, assume that within each island taken alone, there’s no correlation between height and cat ownership. That removes models where height and cat ownership directly influence each other on the same island.
After those constraints, the number of viable explanations collapses from 19 to just two. Either the island environment is the shared cause of both height and cat ownership (a lush island supports both taller people and more cats), or cat ownership is the upstream driver that changes the island conditions, which then affect height for future residents. The key message is methodological: correlations don’t “imply” causation by themselves, but they can imply causation once they’re used to eliminate causal models that would contradict additional correlation patterns and known causal directions.
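The elimination can be reproduced with a brute-force sketch. The modelling choices here are assumptions, not something the summary states: candidate models are taken to be directed acyclic graphs over three variables (so no feedback loops), and a graph "fits" a correlation when height and cat ownership are d-connected, the standard graphical criterion for dependence. Under those assumptions the counts come out as 19 and then 2, matching the numbers above. All function and variable names are hypothetical.

```python
from itertools import product

NODES = ("height", "cats", "island")
PAIRS = [("height", "cats"), ("height", "island"), ("cats", "island")]


def is_acyclic(edges):
    """Kahn-style check: keep peeling off nodes that have no incoming edge."""
    remaining = set(NODES)
    while remaining:
        sources = {n for n in remaining
                   if not any(dst == n and src in remaining for src, dst in edges)}
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= sources
    return True


def all_dags():
    """Every directed acyclic graph over NODES, as a frozenset of (cause, effect)."""
    graphs = []
    for choice in product((None, "forward", "reverse"), repeat=len(PAIRS)):
        edges = set()
        for (a, b), direction in zip(PAIRS, choice):
            if direction == "forward":
                edges.add((a, b))
            elif direction == "reverse":
                edges.add((b, a))
        if is_acyclic(edges):
            graphs.append(frozenset(edges))
    return graphs


def height_cats_dependent(edges, given_island):
    """True if height and cats are d-connected, optionally conditioning on island."""
    if ("height", "cats") in edges or ("cats", "height") in edges:
        return True  # a direct link is always an open path
    height_to_island = ("height", "island") in edges
    cats_to_island = ("cats", "island") in edges
    has_height_link = height_to_island or ("island", "height") in edges
    has_cats_link = cats_to_island or ("island", "cats") in edges
    if not (has_height_link and has_cats_link):
        return False  # no path through island at all
    island_is_collider = height_to_island and cats_to_island
    # A collider path opens only when conditioned on; a chain or fork closes.
    return island_is_collider == given_island


dags = all_dags()

# Step 0: keep graphs that can explain the overall height-cat correlation.
fits_correlation = [g for g in dags if height_cats_dependent(g, given_island=False)]

# Step 1: residents never move, so height cannot cause island.
no_height_to_island = [g for g in fits_correlation if ("height", "island") not in g]

# Step 2: no height-cat correlation within an island (conditioning on island).
survivors = [g for g in no_height_to_island
             if not height_cats_dependent(g, given_island=True)]

print(len(dags), len(fits_correlation), len(survivors))
for graph in survivors:
    print(sorted(f"{cause} -> {effect}" for cause, effect in graph))
```

The two surviving graphs are the ones described above: island causing both height and cat ownership, and cat ownership affecting the island, which in turn affects height.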
The transcript also flags an important exception. In most classical statistical settings, using correlations to evaluate causal networks can narrow causal possibilities substantially. But some experiments in quantum mechanics produce correlations that can rule out all possible cause-and-effect explanations. The takeaway is a revised rule of thumb: correlation doesn’t necessarily imply causation, yet it can when it’s used to test causal models—except in quantum mechanics, where the usual causal assumptions break down. The discussion ends by pointing to follow-up material on statistics and causality, including footnotes that mention feedback loops and correlations arising purely by chance.
Cornell Notes
Correlation alone doesn’t determine causation, as illustrated by the cat-and-height example where a third factor (like island environment) could drive both variables. Still, correlations can imply causation when they’re used to evaluate causal models within a causal network framework. With only one observed correlation, many causal structures can fit the data (19 possibilities). Adding background constraints, such as height having no influence on which island someone lives on and there being no within-island correlation between height and cat ownership, eliminates most models and leaves only two consistent explanations. The transcript notes a major caveat: quantum mechanics can generate correlations that rule out all classical cause-and-effect models.
Why doesn’t a correlation between height and cat ownership automatically prove that one causes the other?
What does it mean to use correlations to “evaluate causal models” instead of treating correlation as direct proof?
How do the island assumptions reduce the number of possible causal explanations from 19 to 2?
What role do “correlations that are absent” play in causal inference?
Why is quantum mechanics mentioned as an exception to the usual correlation-to-causation logic?
Review Questions
- In the cat-and-height island scenario, list the three broad causal possibilities that could produce the same overall correlation.
- What two additional assumptions are used to eliminate most causal models, and what kinds of causal relationships do they rule out?
- How does the causal-network approach differ from the simplistic claim that correlation directly implies causation?
Key Points
1. Correlation by itself does not determine direction of causality; hidden variables can generate the same pattern.
2. Causal inference improves when correlations are used to test and eliminate candidate causal models within a causal network.
3. With only one observed correlation, many different causal structures can fit the data (19 possibilities in the example).
4. Background constraints, such as knowing that height cannot influence which island someone lives on, can rule out entire classes of causal explanations.
5. Absence of correlation within subgroups (no within-island height–cat correlation) can eliminate direct causal links between the variables.
6. Using both correlations and non-correlations can shrink the set of viable causal explanations dramatically (from 19 to 2 in the example).
7. Quantum mechanics can produce correlations that rule out all classical cause-and-effect explanations, breaking the usual intuition.