Probability Theory 30 | Strong Law of Large Numbers
Based on a video by The Bright Side of Mathematics on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
The strong law of large numbers upgrades the usual “average settles down” message by guaranteeing point-by-point convergence of sample averages—not just that large deviations become unlikely. For independent, identically distributed (IID) random variables with a finite expected value, the running average converges to the common mean almost surely, meaning it happens for all outcomes except a probability-zero set. This matters because it matches how repeated random experiments behave when the underlying outcome ω is fixed: the long-run average should stabilize for almost every ω, not merely for most ω in aggregate.
The starting point is the weak law of large numbers, which uses convergence in probability. Under the weak law, for any ε > 0, the probability that the sample average differs from the mean by at least ε goes to zero as the number of samples n grows. Formally, that means the probability measure of the set of outcomes where |X̄_n(ω) − μ| ≥ ε shrinks to nothing as n → ∞. But this framework doesn’t guarantee what happens at a specific ω over time. A fixed ω can still exhibit “bad” behavior at infinitely many n values—even if the set of ω that are bad at a given n becomes small—so the weak law alone can’t rule out persistent outliers across different sample sizes.
The strong law addresses that gap by switching from probability-based statements about sets of outcomes to an almost sure statement about the limit itself. With an IID sequence (X_k) and an integrability condition (the expectation of |X_k| is finite, so the mean exists), the sample average X̄_n(ω) converges to μ as n → ∞ for almost every ω. “Almost surely” is made precise by the probability measure: the set of outcomes where convergence holds has probability exactly one; equivalently, its complement, the set where convergence fails, has probability zero. So while exceptions are logically possible, they are confined to a null set.
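The almost sure statement is about one fixed ω at a time. The sketch below (again an illustrative fair-coin setup, with a fixed seed standing in for a fixed outcome ω) follows a single stream of flips and records its running average X̄_n(ω) at a few checkpoints:

```python
import random

def running_averages(total, checkpoints, seed=42):
    """Follow ONE fixed stream of fair coin flips (one simulated omega)
    and record the running average X_bar_n at the given checkpoints."""
    rng = random.Random(seed)
    heads = 0
    out = {}
    for n in range(1, total + 1):
        heads += rng.random() < 0.5  # one more flip along this fixed stream
        if n in checkpoints:
            out[n] = heads / n
    return out

avgs = running_averages(100_000, {10, 1_000, 100_000})
print(avgs)  # the running average stabilizes near mu = 0.5 along this one stream
```

The strong law says this pointwise stabilization happens for almost every stream, not just this seeded one.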
A key relationship links the two laws: almost sure convergence is stronger than convergence in probability. Consequently, once the strong law holds under the stated IID and integrability assumptions, the weak law follows automatically. That’s why the strong law is often treated as the more informative result: it delivers the stabilization of averages for essentially every realized sequence of outcomes, not just in the sense of shrinking deviation probabilities.
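The implication in this paragraph has a standard short proof, sketched here in LaTeX; the bounded convergence step is a textbook argument the source does not spell out.

```latex
% If \bar{X}_n \to \mu almost surely, then for each fixed \varepsilon > 0
% the indicator \mathbf{1}\{|\bar{X}_n - \mu| \ge \varepsilon\} \to 0 almost surely.
% Since indicators are bounded by 1, bounded convergence gives
\[
  P\bigl(|\bar{X}_n - \mu| \ge \varepsilon\bigr)
  = \mathbb{E}\bigl[\mathbf{1}\{|\bar{X}_n - \mu| \ge \varepsilon\}\bigr]
  \xrightarrow{\,n \to \infty\,} 0,
\]
% which is exactly convergence in probability, i.e., the weak law.
```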
Finally, the discussion notes that the assumptions can be adjusted in other formulations of the strong law, but the common practical version remains centered on IID random variables with finite expectation. Under those conditions, both the strong and the weak law of large numbers hold, providing a robust foundation for why averages of random quantities become reliable in the long run. The next step is to move toward limit theorems beyond the law of large numbers.
Cornell Notes
For IID random variables with a finite mean μ, the sample average X̄_n converges to μ as n → ∞ in an almost sure sense. “Almost surely” means the set of outcomes ω where convergence fails has probability 0, so the stabilization happens for essentially every realized sequence. This is stronger than the weak law, which only guarantees convergence in probability: for any ε > 0, the probability that |X̄_n − μ| ≥ ε goes to 0 as n grows. Because almost sure convergence implies convergence in probability, the strong law automatically yields the weak law under the same IID and integrability assumptions. The distinction matters because the weak law alone doesn’t rule out infinitely many “bad” sample sizes along a fixed ω.
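The two modes of convergence in the summary above can be written side by side; these are the standard formal statements, not taken verbatim from the source.

```latex
% Weak law: convergence in probability.
\forall \varepsilon > 0:\quad
  \lim_{n \to \infty} P\bigl(|\bar{X}_n - \mu| \ge \varepsilon\bigr) = 0.

% Strong law: almost sure convergence.
P\Bigl(\bigl\{\omega \in \Omega : \lim_{n \to \infty} \bar{X}_n(\omega) = \mu\bigr\}\Bigr) = 1.
```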
What exactly does the weak law guarantee, and what does it fail to guarantee for a fixed outcome ω?
How does the strong law strengthen the conclusion?
What assumptions are used for the strong law in this presentation?
Why does proving almost sure convergence automatically give convergence in probability?
What does “almost surely” mean in terms of probability measure?
Review Questions
- In what sense does convergence in probability differ from almost sure convergence for the sequence of sample averages X̄_n(ω)?
- Why can the weak law still allow infinitely many deviations along a fixed ω, even though P(|X̄_n − μ| ≥ ε) → 0?
- Under what conditions on an IID sequence does the strong law guarantee X̄_n(ω) → μ almost surely?
Key Points
1. The weak law of large numbers gives convergence in probability: P(|X̄_n − μ| ≥ ε) → 0 for every ε > 0.
2. Convergence in probability does not ensure that a fixed outcome ω eventually stays within any ε-neighborhood of μ.
3. The strong law of large numbers gives pointwise stabilization: X̄_n(ω) → μ as n → ∞ for almost every ω.
4. “Almost surely” means the set of outcomes where convergence fails has probability 0 (probability one for the good set).
5. For IID random variables with finite expectation (integrability), the strong law holds and therefore the weak law follows automatically.
6. Almost sure convergence is strictly stronger than convergence in probability, which is why the strong law implies the weak law.