Probability Theory 16 | Variance [dark version]
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Variance measures spread by averaging squared deviations from the expectation: Var(X) = E[(X − E[X])²].
Briefing
Variance turns “how much a random variable fluctuates around its mean” into a precise number. After defining expectation as the average value a random variable centers around, variance measures the spread by looking at deviations from that expectation, squaring them to remove sign, and then averaging the squared deviations. The key requirement is that the expectation involved in the variance calculation must exist; otherwise the spread measure isn’t well-defined.
Formally, variance is built from the random variable’s deviation from its mean: start with X − E[X], note that this new quantity has expectation 0, then square it and take its expectation. That yields Var(X) = E[(X − E[X])²]. Expanding the square gives a more computationally convenient form: Var(X) = E[X²] − (E[X])². This identity follows from linearity of expectation and the fact that E[c] = c for constants (including E[1] = 1 under any probability measure). In practice, this means variance can be computed by finding the expected value of X² and subtracting the square of the expected value of X.
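The two forms of the definition can be checked against each other numerically. The sketch below uses a small made-up discrete distribution (the values and probabilities are an assumption for illustration) and confirms that averaging squared deviations agrees with E[X²] − (E[X])²:

```python
# Hypothetical discrete distribution for illustration (not from the source).
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

mean = sum(p * x for x, p in zip(values, probs))                   # E[X]
var_def = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))  # E[(X - E[X])^2]
mean_sq = sum(p * x * x for x, p in zip(values, probs))            # E[X^2]
var_shortcut = mean_sq - mean ** 2                                 # E[X^2] - (E[X])^2

print(var_def, var_shortcut)  # both equal 0.61 for this distribution
```

Either form works; the shortcut is usually less arithmetic because the mean only enters once, at the end.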
To compute variance without abstract measure-theory language, the transcript connects the definition to the two standard cases. For discrete random variables, expectations become sums over outcomes weighted by a probability mass function. For continuous random variables, expectations become integrals weighted by a probability density function. In both settings, the only change from the mean calculation is that X is replaced by X² inside the expectation.
A first worked example uses a discrete uniform distribution over n outcomes {X1, …, Xn}, where each outcome has probability 1/n. The expectation becomes the arithmetic mean X̄ = (1/n)∑_{j=1}^n Xj. Using the variance definition, the spread becomes Var(X) = (1/n)∑_{j=1}^n (Xj − X̄)², the average of the squared deviations from the mean. This generalizes the familiar “fair die” intuition to any finite set of equally likely outcomes.
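This formula is a direct translation into code. The fair die is used here as a concrete instance of the n-outcome uniform case (the specific values 1–6 are an example, not part of the general formula):

```python
# Discrete uniform distribution: each of the n outcomes has probability 1/n.
xs = [1, 2, 3, 4, 5, 6]  # e.g. a fair die
n = len(xs)

mean = sum(xs) / n                          # X̄ = (1/n) Σ X_j
var = sum((x - mean) ** 2 for x in xs) / n  # Var(X) = (1/n) Σ (X_j − X̄)²

print(mean, var)  # 3.5 and 35/12 ≈ 2.9167 for the fair die
```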
The second example switches to a continuous exponential distribution with rate parameter λ. From the earlier expectation result, E[X] = 1/λ. To get variance, the calculation needs E[X²], which becomes the integral ∫_0^∞ x² · (λ e^{-λx}) dx. Solving via integration by parts twice gives E[X²] = 2/λ². Plugging into Var(X) = E[X²] − (E[X])² yields Var(X) = 1/λ², the standard variance formula for the exponential distribution.
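The integration-by-parts result can be sanity-checked numerically. The sketch below approximates E[X] and E[X²] for the exponential density λe^{−λx} with a midpoint Riemann sum (the choice λ = 2 and the truncation point are assumptions made for the check, not values from the source):

```python
import math

lam = 2.0           # rate parameter λ, chosen arbitrarily for this check
dx = 1e-4           # step size of the midpoint Riemann sum
upper = 50.0 / lam  # truncate the integral where e^{-λx} is negligible

# E[X] = ∫₀^∞ x λe^{-λx} dx  and  E[X²] = ∫₀^∞ x² λe^{-λx} dx
xs = [(k + 0.5) * dx for k in range(int(upper / dx))]
ex = sum(x * lam * math.exp(-lam * x) for x in xs) * dx
ex2 = sum(x * x * lam * math.exp(-lam * x) for x in xs) * dx

var = ex2 - ex ** 2
print(ex, ex2, var)  # ≈ 1/λ = 0.5, 2/λ² = 0.5, 1/λ² = 0.25
```

The approximations match the closed forms E[X] = 1/λ, E[X²] = 2/λ², and Var(X) = 1/λ² to well within the discretization error.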
Overall, variance is presented as a mean-squared deviation: compute E[X²], subtract (E[X])², and interpret the result as the width of the distribution around its expectation. The next step promised is to study additional properties that make variance especially useful in later probability work.
Cornell Notes
Variance quantifies how widely a random variable spreads around its expectation. Starting from the deviation X − E[X], squaring removes negative deviations and then taking expectation averages the squared distance. The transcript derives the practical identity Var(X) = E[X²] − (E[X])² using linearity of expectation and the rule that the expectation of a constant equals the constant. Computing variance then reduces to calculating E[X²] in the appropriate setting: sums for discrete distributions and integrals for continuous ones. Two examples illustrate the method: a discrete uniform distribution gives Var(X) = (1/n)∑(Xj − X̄)², and an exponential distribution with rate λ yields Var(X) = 1/λ².
Why does variance start with X − E[X] and then square it?
How does the identity Var(X) = E[X²] − (E[X])² help computation?
What does variance look like for a discrete uniform distribution over n outcomes?
How is variance computed for an exponential distribution with rate λ?
What condition must hold for variance to be well-defined?
Review Questions
- What is the difference between E[X] and Var(X), and how does squaring change what gets averaged?
- Use Var(X) = E[X²] − (E[X])² to outline the steps needed to compute variance for any random variable.
- For a discrete uniform distribution on n values, how do you express Var(X) in terms of the mean X̄ and the outcomes Xj?
Key Points
- 1
Variance measures spread by averaging squared deviations from the expectation: Var(X) = E[(X − E[X])²].
- 2
A key computational shortcut is Var(X) = E[X²] − (E[X])², derived by expanding the square and applying linearity of expectation.
- 3
The expectation of a constant equals the constant itself, which simplifies terms involving E[X] during the derivation.
- 4
For discrete random variables, expectations become sums weighted by the probability mass function; for continuous variables, they become integrals weighted by the probability density function.
- 5
For a discrete uniform distribution over n outcomes, Var(X) equals (1/n) times the sum of squared deviations from the arithmetic mean.
- 6
For an exponential distribution with rate λ, the variance is 1/λ², obtained from E[X] = 1/λ and E[X²] = 2/λ².
- 7
Variance is only defined when the relevant expectation (notably E[X²]) exists and is finite.