Probability Theory 16 | Variance [dark version]
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Variance measures spread by averaging squared deviations from the expectation: Var(X) = E[(X − E[X])²].
Briefing
Variance turns “how much a random variable fluctuates around its mean” into a precise number. After defining expectation as the average value a random variable centers around, variance measures the spread by looking at deviations from that expectation, squaring them to remove sign, and then averaging the squared deviations. The key requirement is that the expectation involved in the variance calculation must exist; otherwise the spread measure isn’t well-defined.
Formally, variance is built from the random variable’s deviation from its mean: start with X − E[X], note that this new quantity has expectation 0, then square it and take its expectation. That yields Var(X) = E[(X − E[X])²]. Expanding the square gives a more computationally convenient form: Var(X) = E[X²] − (E[X])². This identity follows from linearity of expectation and the fact that E[c] = c for constants (including E[1] = 1 under any probability measure). In practice, this means variance can be computed by finding the expected value of X² and subtracting the square of the expected value of X.
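The two forms of the definition can be checked against each other numerically. The sketch below uses a small made-up discrete distribution (the values and probabilities are an assumption for illustration) and confirms that averaging squared deviations agrees with E[X²] − (E[X])²:

```python
# Hypothetical discrete distribution for illustration (not from the source).
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

mean = sum(p * x for x, p in zip(values, probs))                   # E[X]
var_def = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))  # E[(X - E[X])^2]
mean_sq = sum(p * x * x for x, p in zip(values, probs))            # E[X^2]
var_shortcut = mean_sq - mean ** 2                                 # E[X^2] - (E[X])^2

print(var_def, var_shortcut)  # both equal 0.61 for this distribution
```

Either form works; the shortcut is usually less arithmetic because the mean only enters once, at the end.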
To compute variance without abstract measure-theory language, the transcript connects the definition to the two standard cases. For discrete random variables, expectations become sums over outcomes weighted by a probability mass function. For continuous random variables, expectations become integrals weighted by a probability density function. In both settings, the only change from the mean calculation is that X is replaced by X² inside the expectation.
A first worked example uses a discrete uniform distribution over n outcomes {X1, …, Xn}, where each outcome has probability 1/n. The expectation becomes the arithmetic mean X̄ = (1/n)∑_{j=1}^n Xj. Using the variance definition, the spread becomes Var(X) = (1/n)∑_{j=1}^n (Xj − X̄)², the average of the squared deviations from the mean. This generalizes the familiar “fair die” intuition to any finite set of equally likely outcomes.
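This formula is a direct translation into code. The fair die is used here as a concrete instance of the n-outcome uniform case (the specific values 1–6 are an example, not part of the general formula):

```python
# Discrete uniform distribution: each of the n outcomes has probability 1/n.
xs = [1, 2, 3, 4, 5, 6]  # e.g. a fair die
n = len(xs)

mean = sum(xs) / n                          # X̄ = (1/n) Σ X_j
var = sum((x - mean) ** 2 for x in xs) / n  # Var(X) = (1/n) Σ (X_j − X̄)²

print(mean, var)  # 3.5 and 35/12 ≈ 2.9167 for the fair die
```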
The second example switches to a continuous exponential distribution with rate parameter λ. From the earlier expectation result, E[X] = 1/λ. To get variance, the calculation needs E[X²], which becomes the integral ∫_0^∞ x² · (λ e^{-λx}) dx. Solving via integration by parts twice gives E[X²] = 2/λ². Plugging into Var(X) = E[X²] − (E[X])² yields Var(X) = 1/λ², the standard variance formula for the exponential distribution.
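The integration-by-parts result can be sanity-checked numerically. The sketch below approximates E[X] and E[X²] for the exponential density λe^{−λx} with a midpoint Riemann sum (the choice λ = 2 and the truncation point are assumptions made for the check, not values from the source):

```python
import math

lam = 2.0           # rate parameter λ, chosen arbitrarily for this check
dx = 1e-4           # step size of the midpoint Riemann sum
upper = 50.0 / lam  # truncate the integral where e^{-λx} is negligible

# E[X] = ∫₀^∞ x λe^{-λx} dx  and  E[X²] = ∫₀^∞ x² λe^{-λx} dx
xs = [(k + 0.5) * dx for k in range(int(upper / dx))]
ex = sum(x * lam * math.exp(-lam * x) for x in xs) * dx
ex2 = sum(x * x * lam * math.exp(-lam * x) for x in xs) * dx

var = ex2 - ex ** 2
print(ex, ex2, var)  # ≈ 1/λ = 0.5, 2/λ² = 0.5, 1/λ² = 0.25
```

The approximations match the closed forms E[X] = 1/λ, E[X²] = 2/λ², and Var(X) = 1/λ² to well within the discretization error.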
Overall, variance is presented as a mean-squared deviation: compute E[X²], subtract (E[X])², and interpret the result as the width of the distribution around its expectation. The next step promised is to study additional properties that make variance especially useful in later probability work.
Cornell Notes
Variance quantifies how widely a random variable spreads around its expectation. Starting from the deviation X − E[X], squaring removes negative deviations and then taking expectation averages the squared distance. The transcript derives the practical identity Var(X) = E[X²] − (E[X])² using linearity of expectation and the rule that the expectation of a constant equals the constant. Computing variance then reduces to calculating E[X²] in the appropriate setting: sums for discrete distributions and integrals for continuous ones. Two examples illustrate the method: a discrete uniform distribution gives Var(X) = (1/n)∑(Xj − X̄)², and an exponential distribution with rate λ yields Var(X) = 1/λ².
Why does variance start with X − E[X] and then square it?
How does the identity Var(X) = E[X²] − (E[X])² help computation?
What does variance look like for a discrete uniform distribution over n outcomes?
How is variance computed for an exponential distribution with rate λ?
What condition must hold for variance to be well-defined?
Review Questions
- What is the difference between E[X] and Var(X), and how does squaring change what gets averaged?
- Use Var(X) = E[X²] − (E[X])² to outline the steps needed to compute variance for any random variable.
- For a discrete uniform distribution on n values, how do you express Var(X) in terms of the mean X̄ and the outcomes Xj?
Key Points
- 1
Variance measures spread by averaging squared deviations from the expectation: Var(X) = E[(X − E[X])²].
- 2
A key computational shortcut is Var(X) = E[X²] − (E[X])², derived by expanding the square and applying linearity of expectation.
- 3
The expectation of a constant equals the constant itself, which simplifies terms involving E[X] during the derivation.
- 4
For discrete random variables, expectations become sums weighted by the probability mass function; for continuous variables, they become integrals weighted by the probability density function.
- 5
For a discrete uniform distribution over n outcomes, Var(X) equals (1/n) times the sum of squared deviations from the arithmetic mean.
- 6
For an exponential distribution with rate λ, the variance is 1/λ², obtained from E[X] = 1/λ and E[X²] = 2/λ².
- 7
Variance is only defined when the relevant expectation (notably E[X²]) exists and is finite.