Probability Theory 30 | Strong Law of Large Numbers [dark version]
Based on a video by The Bright Side of Mathematics on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
For IID random variables X1, X2, … with finite expectation μ, the sample average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞.
Briefing
The strong law of large numbers upgrades the usual “averages settle down” message by guaranteeing point-by-point convergence for repeated random experiments—except on a set of outcomes with probability zero. Starting from independent, identically distributed (IID) random variables X1, X2, … with a finite expectation μ (the transcript assumes integrability, e.g., E|Xk| < ∞), the running average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞. The key difference from the weak law is the type of convergence: the strong law delivers almost sure convergence, meaning the average converges for almost every outcome ω in the sample space.
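As a quick illustration of the statement above, here is a minimal simulation sketch (not from the transcript), assuming repeated fair coin flips as the IID sequence, so μ = 0.5. The running average X̄n is tracked as n grows:

```python
import random

def running_averages(n, p=0.5, seed=0):
    """Simulate n IID Bernoulli(p) trials and return the running averages X̄_1, ..., X̄_n."""
    rng = random.Random(seed)
    total = 0.0
    avgs = []
    for k in range(1, n + 1):
        total += 1.0 if rng.random() < p else 0.0  # one coin flip X_k
        avgs.append(total / k)                      # X̄_k = (1/k) * sum of first k flips
    return avgs

avgs = running_averages(100_000)
# Running averages at n = 100, 10 000, 100 000 drift toward μ = 0.5.
print(avgs[99], avgs[9_999], avgs[99_999])
```

Any fixed seed produces one realized sequence ω; the strong law says that for almost every such sequence, the printed values settle at μ as n grows.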
Under the weak law, the statement is probabilistic and set-based: for any ε > 0, the probability that the average deviates from μ by at least ε goes to zero as n grows. In symbols, P(|X̄n − μ| ≥ ε) → 0. That ensures large deviations become unlikely, but it doesn’t rule out the possibility that—at some specific times n—an average could still wander far from μ. The transcript illustrates this with a fixed outcome ω and a “timeline” view: weak convergence controls how often deviations happen across the probability space, not whether deviations might keep occurring along a particular realized sequence.
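The weak law's set-based statement can be probed numerically. The following sketch (an illustrative addition, again assuming Bernoulli(0.5) trials) estimates P(|X̄n − μ| ≥ ε) by Monte Carlo for several sample sizes n, showing the deviation probability shrinking as n grows:

```python
import random

def deviation_prob(n, eps=0.05, trials=2000, p=0.5, seed=1):
    """Monte Carlo estimate of P(|X̄_n - μ| >= eps) for averages of n Bernoulli(p) trials."""
    rng = random.Random(seed)
    mu = p
    hits = 0
    for _ in range(trials):
        s = sum(1 for _ in range(n) if rng.random() < p)  # sum of n coin flips
        if abs(s / n - mu) >= eps:                        # did this average deviate by eps?
            hits += 1
    return hits / trials

# The estimated deviation probability drops toward 0 as n increases.
for n in (10, 100, 1000):
    print(n, deviation_prob(n))
```

Note what this does and does not check: it measures how often deviations occur across many independent repetitions at a fixed n, which is exactly the weak law's guarantee, and says nothing about the behavior of a single realized sequence over time.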
The strong law addresses exactly that gap. It asserts that X̄n(ω) → μ for almost all ω, i.e., the set of ω where convergence fails is a null set: in measure-theoretic terms, the probability of the “bad” outcomes is exactly zero, so convergence holds on a set of probability one. This is why the transcript calls the convergence “almost sure” (also described as convergence with probability one). It also notes a hierarchy of convergence: almost sure convergence is stronger than convergence in probability. As a result, proving the strong law automatically implies the weak law under the same IID and integrability assumptions.
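The pointwise claim can also be visualized along a single realized sequence. This sketch (illustrative, assuming fair coin flips) fixes one outcome ω via a seed and measures the worst deviation of the running average from μ = 0.5 beyond increasingly late times N, which shrinks along this path:

```python
import random

def tail_sup_deviation(path_avgs, start):
    """Largest |X̄_k - 0.5| over all k >= start along one realized path (μ = 0.5 here)."""
    return max(abs(a - 0.5) for a in path_avgs[start - 1:])

# One fixed outcome ω: a single realized sequence of 50 000 coin flips.
rng = random.Random(42)
total, avgs = 0.0, []
for k in range(1, 50_001):
    total += 1.0 if rng.random() < 0.5 else 0.0
    avgs.append(total / k)

# Along this fixed ω, the worst deviation beyond time N shrinks as N grows,
# which is what pointwise (almost sure) convergence looks like on one path.
for start in (100, 1000, 10_000):
    print(start, tail_sup_deviation(avgs, start))
```

Of course, a finite simulation cannot prove almost sure convergence; it only illustrates the kind of per-path statement the strong law makes, as opposed to the across-repetitions statement of the weak law.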
The discussion also clarifies why both laws are distinguished at all. Different versions of the law of large numbers can be formulated under different assumptions, and in some settings one may obtain only the weak law without the strong law. But for the standard IID case with an existing expectation, the transcript emphasizes there’s no such separation: the strong law holds, and therefore the weak law follows as well.
In practical terms, the strong law provides the most reassuring interpretation of long-run averages: once a random experiment is repeated indefinitely, the realized sample average will settle to the expected value μ for essentially every realized sequence, with exceptions so rare they carry zero probability. The transcript closes by pointing toward the next topic—limit theorems—building on this foundation of how averages behave as sample size grows.
Cornell Notes
With IID random variables X1, X2, … and finite expectation μ (assuming integrability such as E|Xk| < ∞), the running average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞. The strong law strengthens the weak law by giving pointwise convergence along realized outcomes: X̄n(ω) → μ for almost every ω in the sample space. “Almost sure” means the set of outcomes where convergence fails has probability zero. Because almost sure convergence implies convergence in probability, the strong law automatically yields the weak law under the same assumptions. This matters because weak convergence alone doesn’t prevent deviations from occurring at infinitely many sample sizes for a particular realized sequence.
What does “almost sure convergence” mean for the averages X̄n?
How does the weak law differ from the strong law in what it guarantees?
Why does the strong law imply the weak law?
What assumptions are used to get the strong law in this transcript?
Why is it still useful to distinguish strong and weak laws even though IID gives both?
Review Questions
- In what sense does the strong law guarantee convergence for a fixed outcome ω, and what is the probability of the exceptions?
- State the weak law’s deviation condition using ε and explain why it doesn’t guarantee pointwise convergence.
- Explain the relationship between almost sure convergence and convergence in probability, and how that relationship connects the strong and weak laws.
Key Points
1. For IID random variables X1, X2, … with finite expectation μ, the sample average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞.
2. The weak law gives P(|X̄n − μ| ≥ ε) → 0 for every ε > 0, which controls deviations probabilistically.
3. The strong law upgrades this to pointwise convergence: X̄n(ω) → μ for almost every outcome ω.
4. “Almost surely” means the set of outcomes where convergence fails has probability zero.
5. Almost sure convergence is stronger than convergence in probability, so the strong law implies the weak law under the same assumptions.
6. The strong and weak laws can differ under alternative assumptions, but for the standard IID + integrability setup, both hold together.