Probability Theory 28 | Weak Law of Large Numbers [dark version]
Based on the YouTube video by The Bright Side of Mathematics. If you like this content, support the original creators by watching, liking, and subscribing to their channel.
The weak law of large numbers links theoretical probability μ to observed relative frequency via the empirical average X̄_n.
Briefing
The weak law of large numbers formalizes a simple but powerful intuition: when independent, identically distributed random outcomes are sampled many times, the observed relative frequency of an event settles near its theoretical probability. In the coin-toss example, the fraction of heads among the first n tosses becomes increasingly close to 1/2 as n grows. The key question becomes what “close” and “settles” mean mathematically, and the weak law answers it using convergence in probability.
Relative frequency is built from random variables. For each toss k, define X_k to be 1 if toss k lands heads and 0 otherwise. After n tosses, the empirical probability of heads is the average X̄_n = (1/n)∑_{k=1}^n X_k. Since each X_k has expected value μ (for a fair coin, μ = 1/2), the weak law targets the event that the empirical average deviates from μ by more than a chosen tolerance ε > 0. Convergence in probability to μ means that P(|X̄_n − μ| > ε) → 0 as n → ∞ for every ε > 0.
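The coin-toss construction above can be sketched in a few lines of Python. This is a minimal simulation (the function name and seed are illustrative, not from the video): each trial draws a Bernoulli indicator X_k and the empirical average X̄_n is the fraction of successes.

```python
import random

def empirical_average(n, p=0.5, seed=0):
    """Simulate n Bernoulli(p) trials (coin tosses) and return
    the empirical average X_bar_n = (1/n) * sum of indicators X_k."""
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n) if rng.random() < p)
    return heads / n

# The average drifts toward mu = 0.5 as n grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, empirical_average(n))
```

Running this shows the familiar picture: for small n the average can sit well away from 1/2, but for large n it hugs the theoretical probability.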
To make this guarantee, the random variables must satisfy three main conditions. First, they are independent: any finite collection has joint probabilities that factor as products. Second, they are identically distributed: each X_k has the same distribution, so expectations match across k. Together these are summarized as an IID assumption. Third, expectations must exist in a way that supports the probability bound; the transcript uses the requirement that E|X_1| is finite (integrability), ensuring μ is well-defined.
A clean proof route appears when the variables also have finite variance. Let Var(X_1) = σ^2. The expected value of the average stays fixed: E[X̄_n] = μ. More importantly, the variance shrinks with sample size: Var(X̄_n) = σ^2/n. This scaling is the engine behind the result. Chebyshev’s inequality then bounds the deviation probability: P(|X̄_n − μ| > ε) ≤ Var(X̄_n)/ε^2 = σ^2/(nε^2). Because σ^2 and ε are fixed, the right-hand side goes to zero as n increases, forcing the left-hand side to vanish as well. That is exactly convergence in probability, which is the weak law’s formal meaning.
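The Chebyshev bound can be checked numerically. Below is a sketch (function names and parameters are illustrative) that estimates P(|X̄_n − μ| > ε) by Monte Carlo for a fair coin and prints it next to the bound σ²/(nε²); the estimate should always sit at or below the bound.

```python
import random

def deviation_probability(n, eps, trials=2000, p=0.5, seed=1):
    """Monte Carlo estimate of P(|X_bar_n - mu| > eps) for
    averages of n Bernoulli(p) indicators."""
    rng = random.Random(seed)
    mu = p
    hits = 0
    for _ in range(trials):
        xbar = sum(1 for _ in range(n) if rng.random() < p) / n
        if abs(xbar - mu) > eps:
            hits += 1
    return hits / trials

sigma2 = 0.25   # Var(X_1) = p(1 - p) = 1/4 for a fair coin
eps = 0.05
for n in (100, 400, 1600):
    bound = min(sigma2 / (n * eps**2), 1.0)  # Chebyshev, capped at 1
    print(n, deviation_probability(n, eps), bound)
```

Note that Chebyshev is loose: the true deviation probability typically falls well below the bound, but the bound alone is already enough to force convergence to zero.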
The practical message is that empirical frequencies become reliable without requiring almost-sure convergence. Under IID sampling and finite variance, the chance of a noticeable mismatch between observed frequency and theoretical probability becomes arbitrarily small as the number of trials grows—setting the probabilistic foundation for statistical estimation and long-run frequency reasoning.
Cornell Notes
The weak law of large numbers connects theoretical probability to observed relative frequency. For IID random variables X_1, X_2, … with common mean μ, the empirical average X̄_n = (1/n)∑_{k=1}^n X_k approaches μ in probability. “In probability” means that for every ε > 0, the probability of a deviation larger than ε goes to zero: P(|X̄_n − μ| > ε) → 0 as n → ∞. When the variables have finite variance σ^2, the proof becomes direct: Var(X̄_n) = σ^2/n, and Chebyshev’s inequality yields P(|X̄_n − μ| > ε) ≤ σ^2/(nε^2). As n grows, this bound shrinks to zero, making the empirical frequency increasingly accurate.
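The bound can also be read in reverse for planning: requiring σ²/(nε²) ≤ δ and solving for n gives n ≥ σ²/(ε²δ). A small sketch (the helper name is hypothetical):

```python
import math

def chebyshev_sample_size(sigma2, eps, delta):
    """Smallest n for which the Chebyshev bound sigma2/(n*eps^2)
    guarantees P(|X_bar_n - mu| > eps) <= delta."""
    return math.ceil(sigma2 / (eps**2 * delta))

# Fair coin (sigma2 = 1/4): keep deviations beyond 0.01
# below 5% probability -- roughly 5 * 10^4 tosses suffice.
print(chebyshev_sample_size(0.25, 0.01, 0.05))
```

Because Chebyshev is a worst-case bound, this sample size is conservative; sharper tools (e.g., the central limit theorem) give smaller n, but Chebyshev needs only finite variance.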
How is empirical probability (relative frequency) represented using random variables?
What does “convergence in probability” mean in the weak law of large numbers?
Why are the IID assumptions central to the weak law?
What role does finite variance play in the proof?
How does Chebyshev’s inequality produce the weak law’s probability bound?
Review Questions
- In the coin-toss setup, what are X_k and X̄_n, and what value does X̄_n aim to approach?
- State the formal definition of convergence in probability used in the weak law.
- Under what additional condition does the transcript give a Chebyshev-based proof, and what bound does it yield?
Key Points
1. The weak law of large numbers links theoretical probability μ to observed relative frequency via the empirical average X̄_n.
2. Empirical probability is modeled as X̄_n = (1/n)∑_{k=1}^n X_k, where X_k indicates whether the event occurred on trial k.
3. Convergence in probability means P(|X̄_n − μ| > ε) → 0 for every ε > 0.
4. The IID assumptions (independence and identical distribution) ensure consistent mean behavior and enable variance calculations.
5. Integrability (e.g., E|X_1| < ∞) guarantees that the mean μ is well-defined.
6. With finite variance σ^2, Var(X̄_n) = σ^2/n, so large deviations become less likely as n grows.
7. Chebyshev’s inequality turns the variance scaling into the weak law’s probability bound: P(|X̄_n − μ| > ε) ≤ σ^2/(nε^2).