Get AI summaries of any video or article — Sign up free
But what is the Central Limit Theorem? thumbnail

But what is the Central Limit Theorem?

3Blue1Brown·
5 min read

Based on 3Blue1Brown's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Bell-curve behavior emerges when many independent samples are added, after centering by the mean and scaling by the standard deviation.

Briefing

A single, chaotic process can be unpredictable ball-by-ball, yet the totals across many repetitions settle into a remarkably stable pattern: the bell curve. That stability is the core promise of the central limit theorem—no matter what distribution a single roll comes from (even a skewed one), the properly centered and rescaled sum of many independent samples approaches a universal normal distribution. This matters because it turns messy randomness into predictable probabilities, letting people compute ranges like “where a sum will land” with only a mean and a standard deviation.

The lesson builds the idea from a Galton board. In an idealized version, each bounce is a 50-50 choice that nudges a ball left or right by +1 or −1. After many such random choices, the final bucket corresponds to the sum of those ±1 steps. Repeating this for many balls yields a distribution of sums. With only a few steps (few rows), the shape is rough and can resemble the original asymmetries of the underlying randomness. As the number of steps grows, the distribution becomes smoother and more symmetric, converging toward a bell curve.

To generalize beyond pegs and bounces, the discussion shifts to random variables and sums. Take a random variable X (for example, the outcome of a die roll) and form a new variable by adding n independent copies of X. The central limit theorem claims that as n increases, the distribution of that sum approaches a normal distribution after centering by the mean and scaling by the standard deviation. The key quantitative ingredients are the mean and variance: the mean of a sum scales linearly with n, while the variance of a sum adds across independent variables. That means the standard deviation grows like the square root of n, so the distribution spreads out, but not as fast as the mean shifts.

The normal distribution’s exact formula is then constructed. Its shape comes from an exponential of a negative square, e^(−x^2/2), and its normalization constant is chosen so the total area under the curve equals 1. That requirement introduces the factor involving π, yielding the standard normal density: (1/√(2π))·e^(−x^2/2). From there, any normal distribution is obtained by shifting by μ (the mean) and stretching by σ (the standard deviation).

The practical payoff is illustrated with a concrete problem: rolling a fair die 100 times and finding a range that captures 95% of outcomes. The mean of one die is 3.5 and the standard deviation is about 1.71 (computed from variance 2.92). For 100 rolls, the mean becomes 350 and the standard deviation becomes 17.1. Using the common “68–95–99.7” rule, the 95% interval corresponds to roughly mean ± 2 standard deviations, giving about 316 to 384.

Finally, the theorem’s assumptions are laid out: the summed variables must be independent, identically distributed (IID), and have finite variance. Independence fails on a real Galton board because one bounce affects the next, and identical distribution can also fail because different pegs may bias outcomes. Still, broader generalizations can sometimes rescue the bell-curve behavior. The warning is that assuming normality without justification is common—and often wrong—especially when variance is infinite, where convergence to a normal distribution may fail.

Cornell Notes

The central limit theorem explains why bell curves appear so often: add up many independent samples from a single random variable, then center by the mean and scale by the standard deviation, and the resulting distribution approaches a universal normal (Gaussian) shape. The mean of the sum grows proportionally to n, while the variance adds, so the standard deviation grows like √n. The normal distribution’s density is (1/√(2π))·e^(−x^2/2), with π entering through the normalization that forces the total area under the curve to equal 1. This lets people compute probability ranges—like an approximate 95% interval using mean ± 2σ—using only the underlying mean and variance.

Why does adding many random outcomes produce a bell curve even when the original distribution is skewed?

Because the sum’s distribution becomes dominated by aggregate behavior after centering and scaling. The lesson demonstrates this with dice: start with a weighted die that favors low values, then repeatedly sum 2, 5, 10, and 15 rolls. Early sums still look skewed or spiky, but as the number of summed rolls grows, the shape becomes increasingly symmetric and bell-like. Even when the single-roll distribution is highly non-uniform (e.g., most probability on 1 and 6), the normalized sums still converge toward the same standard normal curve.

What exactly changes when you sum n copies of a random variable—mean or spread?

Both shift, but at different rates. If the underlying random variable has mean μ and standard deviation σ, then the sum of n independent copies has mean nμ. Variance adds across independent variables, so the sum’s variance is nσ^2, making the standard deviation σ√n. That square-root growth is why the distribution spreads out more slowly than the mean marches rightward.

How is the normal distribution formula built, and where does π come from?

The bell shape comes from an exponential of a negative square: e^(−x^2/2). To turn that shape into a probability density, the total area under the curve must equal 1. The integral of the unnormalized e^(−x^2/2) involves √(2π), so the normalization constant becomes 1/√(2π). That is the origin of π in the standard normal density: (1/√(2π))·e^(−x^2/2).

How does the central limit theorem translate into a 95% range for sums like “100 die rolls”?

Compute the mean and standard deviation of the sum, then use the normal-distribution rule of thumb. For a fair die, the mean is 3.5 and the standard deviation is about 1.71. For 100 rolls, the mean is 100·3.5 = 350 and the standard deviation is √100·1.71 = 17.1. The approximate 95% interval uses mean ± 2σ, giving 350 ± 34.2, or roughly 316 to 384.

What do the assumptions IID and finite variance mean in this context?

IID means independent and identically distributed: each added variable must not influence the others, and each must come from the same underlying probability distribution. The lesson notes that a real Galton board violates both: one bounce changes the trajectory for the next, and different pegs can bias left/right outcomes differently. Finite variance means the variance of the underlying variable must exist (not diverge). If variance is infinite, even with independence and identical distribution, the limit may not be normal.

Review Questions

  1. If X has mean μ and variance σ^2, what are the mean and variance of the sum of n independent copies of X?
  2. Why does the standard deviation of a sum scale like √n rather than n?
  3. What goes wrong with the central limit theorem if the underlying variable has infinite variance?

Key Points

  1. 1

    Bell-curve behavior emerges when many independent samples are added, after centering by the mean and scaling by the standard deviation.

  2. 2

    For sums of independent variables, variance adds; standard deviation grows like the square root of the number of terms.

  3. 3

    The standard normal density is (1/√(2π))·e^(−x^2/2), with π appearing from the normalization that forces total area to equal 1.

  4. 4

    A practical 95% interval for large sums is often approximated by mean ± 2 standard deviations using the 68–95–99.7 rule.

  5. 5

    The central limit theorem relies on independence, identical distribution (IID), and finite variance; violating these can break normal convergence.

  6. 6

    Real systems like a Galton board may violate IID assumptions, yet bell-curve behavior can still arise under broader generalizations—though normality should not be assumed blindly.

Highlights

Even a highly skewed single-roll distribution can yield a symmetric bell curve once many rolls are summed and normalized.
Variance adds across independent terms, so spread grows as √n—slower than the mean’s linear drift.
The factor 1/√(2π) is not cosmetic: it’s required so the normal curve’s total area equals 1.
For 100 fair die rolls, the approximate 95% range is about 316 to 384 using mean ± 2σ.
The theorem’s assumptions—IID and finite variance—are essential; infinite variance can prevent convergence to a normal distribution.

Topics

Mentioned

  • IID