Probability Theory 30 | Strong Law of Large Numbers [dark version]

4 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

For IID random variables X1, X2, … with finite expectation μ, the sample average X̄n = (1/n)∑k=1^n Xk converges almost surely to μ as n → ∞.

Briefing

The strong law of large numbers upgrades the usual “averages settle down” message by guaranteeing point-by-point convergence for repeated random experiments—except on a set of outcomes with probability zero. Starting from independent, identically distributed (IID) random variables X1, X2, … with a finite expectation μ (the transcript assumes integrability, e.g., E|Xk| < ∞), the running average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞. The key difference from the weak law is the type of convergence: the strong law delivers almost sure convergence, meaning the average converges for almost every outcome ω in the sample space.
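
As a quick illustration (not from the transcript), the following Python sketch simulates a few independent realized sequences of IID Exponential(1) variables, for which μ = 1, and prints their running averages; along each path the average drifts toward μ as n grows, which is the pointwise behavior the strong law describes.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 1.0          # Exponential(1) has expectation 1
n_max = 100_000   # number of samples per realized sequence
n_paths = 5       # a few independent realized sequences (different outcomes omega)

for path in range(n_paths):
    x = rng.exponential(scale=1.0, size=n_max)             # X_1(omega), ..., X_n(omega)
    running_avg = np.cumsum(x) / np.arange(1, n_max + 1)   # Xbar_n = (1/n) * sum of the first n samples
    print(f"path {path}: Xbar_1000 = {running_avg[999]:.4f}, "
          f"Xbar_{n_max} = {running_avg[-1]:.4f} (mu = {mu})")
```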

Under the weak law, the statement is probabilistic and set-based: for any ε > 0, the probability that the average deviates from μ by at least ε goes to zero as n grows. In symbols, P(|X̄n − μ| ≥ ε) → 0. That ensures large deviations become unlikely, but it doesn’t rule out the possibility that, at some specific times n, an average could still wander far from μ. The transcript illustrates this with a fixed outcome ω and a “timeline” view: convergence in probability controls, at each fixed n, how likely a deviation is across the whole sample space; it says nothing about whether deviations keep recurring along a particular realized sequence.
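
A minimal Monte Carlo sketch of the weak law’s statement (illustrative only; the distribution, ε, and run counts below are assumptions, not from the transcript): for each n, repeat the whole experiment many times and estimate P(|X̄n − μ| ≥ ε) as the fraction of runs whose average deviates by at least ε. The estimate should shrink toward zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, eps = 1.0, 0.05     # Exponential(1) mean, chosen deviation threshold
n_runs = 1_000          # independent repetitions of the whole experiment

for n in [10, 100, 1_000, 10_000]:
    samples = rng.exponential(scale=1.0, size=(n_runs, n))   # n IID draws per run
    avg = samples.mean(axis=1)                                # Xbar_n for each run
    deviation_prob = np.mean(np.abs(avg - mu) >= eps)         # estimate of P(|Xbar_n - mu| >= eps)
    print(f"n = {n:>6}: estimated P(|Xbar_n - mu| >= {eps}) = {deviation_prob:.3f}")
```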

The strong law addresses exactly that gap. It asserts that X̄n(ω) → μ for almost all ω, i.e., the set of ω where convergence fails is a null set, and its complement has probability one. In measure-theoretic terms, the probability of the “bad” outcomes is exactly zero. This is why the transcript calls the convergence “almost sure” (also described as convergence with probability one). It also notes a hierarchy of convergence: almost sure convergence is stronger than convergence in probability. As a result, proving the strong law automatically gives the weak law under the same IID and integrability assumptions.
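
In symbols, using the transcript’s notation, the two statements can be put side by side (a standard formulation, written out here for reference):

```latex
\text{Strong law:}\quad \mathbb{P}\!\left( \lim_{n \to \infty} \bar{X}_n = \mu \right) = 1
\qquad\qquad
\text{Weak law:}\quad \forall \varepsilon > 0:\ \lim_{n \to \infty}
\mathbb{P}\!\left( \left|\bar{X}_n - \mu\right| \ge \varepsilon \right) = 0
```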

The discussion also clarifies why both laws are distinguished at all. Different versions of the law of large numbers can be formulated under different assumptions, and in some settings one may obtain only the weak law without the strong law. But for the standard IID case with an existing expectation, the transcript emphasizes there’s no such separation: the strong law holds, and therefore the weak law follows as well.

In practical terms, the strong law provides the most reassuring interpretation of long-run averages: once a random experiment is repeated indefinitely, the realized sample average will settle to the expected value μ for essentially every realized sequence, with exceptions so rare they carry zero probability. The transcript closes by pointing toward the next topic—limit theorems—building on this foundation of how averages behave as sample size grows.

Cornell Notes

With IID random variables X1, X2, … and finite expectation μ (assuming integrability such as E|Xk| < ∞), the running average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞. The strong law strengthens the weak law by giving pointwise convergence along realized outcomes: X̄n(ω) → μ for almost every ω in the sample space. “Almost sure” means the set of outcomes where convergence fails has probability zero. Because almost sure convergence implies convergence in probability, the strong law automatically yields the weak law under the same assumptions. This matters because weak convergence alone doesn’t prevent deviations from occurring at infinitely many sample sizes for a particular realized sequence.

What does “almost sure convergence” mean for the averages X̄n?

Almost sure convergence means that for almost every outcome ω, the sequence of averages X̄n(ω) approaches μ as n → ∞; the only possible exceptions form a set of outcomes with probability zero. Formally, the “bad” set {ω : X̄n(ω) does not converge to μ} has probability 0, so its complement has probability 1.

How does the weak law differ from the strong law in what it guarantees?

The weak law controls deviations in probability: for any ε > 0, P(|X̄n − μ| ≥ ε) → 0. That makes large deviations unlikely for large n, but it doesn’t rule out the possibility that along a particular realized sequence, the average could still be outside the ε-neighborhood at infinitely many n. The strong law removes that ambiguity by ensuring convergence for almost every ω.
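
The transcript does not give a counterexample, but a standard textbook one makes the gap concrete: independent Y_n with P(Y_n = 1) = 1/n and P(Y_n = 0) = 1 − 1/n converge to 0 in probability (the deviation probability 1/n vanishes), yet because ∑ 1/n diverges, the second Borel–Cantelli lemma says Y_n = 1 happens at infinitely many n along almost every realized sequence, so there is no almost sure convergence. A short simulation sketch of one realized sequence:

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent Y_n ~ Bernoulli(1/n): converges to 0 in probability (P(Y_n = 1) = 1/n -> 0),
# but sum(1/n) diverges, so Y_n = 1 keeps happening at arbitrarily large n (no a.s. convergence).
n_max = 100_000
n = np.arange(1, n_max + 1)
y = (rng.random(n_max) < 1.0 / n).astype(int)     # one realized sequence Y_1(omega), ..., Y_n(omega)

late_hits = np.flatnonzero(y[10_000:]) + 10_001   # values of n > 10,000 where Y_n = 1
print("deviations after n = 10,000 occur at n =", late_hits)
```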

Why does the strong law imply the weak law?

Almost sure convergence is stronger than convergence in probability. So once it’s known that X̄n → μ almost surely, it automatically follows that for every ε > 0, the probability of being at least ε away from μ goes to zero: P(|X̄n − μ| ≥ ε) → 0.
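
A one-line reasoning sketch of that implication (not spelled out in the transcript, but standard): on almost every ω the indicator of the deviation event is eventually 0, and it is bounded by 1, so dominated convergence sends its expectation, which is exactly the deviation probability, to zero.

```latex
\bar{X}_n \to \mu \ \text{a.s.}
\;\Longrightarrow\; \mathbf{1}_{\{|\bar{X}_n - \mu| \ge \varepsilon\}} \to 0 \ \text{a.s.}
\;\Longrightarrow\; \mathbb{P}\!\left(|\bar{X}_n - \mu| \ge \varepsilon\right)
= \mathbb{E}\!\left[\mathbf{1}_{\{|\bar{X}_n - \mu| \ge \varepsilon\}}\right] \to 0 .
```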

What assumptions are used to get the strong law in this transcript?

The transcript uses IID random variables: independence and identical distribution across X1, X2, …. It also assumes integrability so the expectation exists, described as finite E|Xk| (or at least that the expectation for each random variable exists). Under these conditions, the strong law holds.

Why is it still useful to distinguish strong and weak laws even though IID gives both?

The transcript notes that different formulations of the law of large numbers can use different assumptions. In some settings, only the weak law may be true while the strong law fails. The distinction matters when assumptions are weakened or changed; for the standard IID + expectation case, both hold.

Review Questions

  1. In what sense does the strong law guarantee convergence for a fixed outcome ω, and what is the probability of the exceptions?
  2. State the weak law’s deviation condition using ε and explain why it doesn’t guarantee pointwise convergence.
  3. Explain the relationship between almost sure convergence and convergence in probability, and how that relationship connects the strong and weak laws.

Key Points

  1. For IID random variables X1, X2, … with finite expectation μ, the sample average X̄n = (1/n)∑k=1^n Xk converges to μ as n → ∞.
  2. The weak law gives P(|X̄n − μ| ≥ ε) → 0 for every ε > 0, which controls deviations probabilistically.
  3. The strong law upgrades this to pointwise convergence: X̄n(ω) → μ for almost every outcome ω.
  4. “Almost surely” means the set of outcomes where convergence fails has probability zero.
  5. Almost sure convergence is stronger than convergence in probability, so the strong law implies the weak law under the same assumptions.
  6. The strong and weak laws can differ under alternative assumptions, but for the standard IID + integrability setup, both hold together.

Highlights

The strong law ensures the realized average settles to μ for almost every outcome ω, not just with high probability.
The weak law controls the chance of being far from μ at large n, but it doesn’t prevent infinitely many deviations along a single realized sequence.
Almost sure convergence (probability 1) automatically implies convergence in probability, linking the strong and weak laws.
Under IID and integrability (finite expectation), the strong law holds—and therefore the weak law follows as a consequence.
