Probability Theory 19 | Covariance and Correlation [OLD dark version]
Based on the YouTube video by The Bright Side of Mathematics. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Covariance and correlation provide a quantitative way to measure whether two random variables move together—and how strongly that co-movement departs from independence. Covariance is defined using deviations from each variable’s mean: for random variables X and Y, it is the expected value of (X − E[X]) times (Y − E[Y]). Under standard integrability assumptions (finite expectations of X, Y, and also of X² and Y²), this becomes a well-defined real number that can be positive, negative, or zero. Expanding the product yields a compact identity: Cov(X, Y) = E[XY] − E[X]E[Y]. That form makes the relationship to independence immediate: if X and Y are independent, then E[XY] = E[X]E[Y], so covariance equals 0.
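A short derivation of that identity, using only linearity of expectation and the fact that E[X] and E[Y] are constants:

```latex
\begin{align*}
\operatorname{Cov}(X,Y)
  &= \mathbb{E}\!\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right] \\
  &= \mathbb{E}\!\left[XY - X\,\mathbb{E}[Y] - \mathbb{E}[X]\,Y + \mathbb{E}[X]\,\mathbb{E}[Y]\right] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y] - \mathbb{E}[X]\,\mathbb{E}[Y] + \mathbb{E}[X]\,\mathbb{E}[Y] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y].
\end{align*}
```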
The key limitation is that zero covariance does not guarantee independence. The transcript emphasizes that “uncorrelated” (Cov(X, Y) = 0) is a weaker property than “independent,” and the converse implication fails in general. Independence is a stronger condition on the joint behavior of all events, not just on the single mixed moment E[XY]. The course also notes a special case where the converse does hold: when X and Y are jointly normally distributed (bivariate normal), zero covariance is enough to infer independence.
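The forward implication takes one line: if X and Y are independent, so that E[XY] = E[X]E[Y], then

```latex
\operatorname{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]
                        = \mathbb{E}[X]\,\mathbb{E}[Y] - \mathbb{E}[X]\,\mathbb{E}[Y] = 0.
```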
Because the raw magnitude of covariance depends on the scales of the variables (a large covariance may reflect large variances rather than a strong relationship), the discussion introduces normalization via the correlation coefficient. Using the Cauchy–Schwarz inequality, it is shown that the normalized quantity
ρ(X, Y) = Cov(X, Y) / (StdDev(X)·StdDev(Y))
always lies between −1 and +1. Values near 0 indicate behavior close to independence, while values near −1 or +1 indicate strong negative or positive linear association, respectively. This normalization is what turns covariance into a scale-free measure of co-movement.
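A sketch of why the bound holds, applying Cauchy–Schwarz to the centered variables X − E[X] and Y − E[Y] (assuming both variances are finite):

```latex
\left|\operatorname{Cov}(X,Y)\right|
  = \left|\mathbb{E}\!\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]\right|
  \le \sqrt{\mathbb{E}\!\left[(X - \mathbb{E}[X])^{2}\right]}\,
      \sqrt{\mathbb{E}\!\left[(Y - \mathbb{E}[Y])^{2}\right]}
  = \mathrm{StdDev}(X)\,\mathrm{StdDev}(Y).
```

Dividing both sides by StdDev(X)·StdDev(Y) (assuming both are nonzero) gives |ρ(X, Y)| ≤ 1.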
An explicit example demonstrates the gap between uncorrelated and independent. On a three-element sample space Ω = {A, B, C} with the uniform probability measure (each outcome has probability 1/3), the random variable X is defined by X(A) = 1, X(B) = 0, X(C) = −1. The random variable Y is chosen so that it is nonzero only when X is zero: specifically, Y(B) = 1 and Y(A) = Y(C) = 0. With this construction the product X·Y is identically zero, so E[XY] = 0; also E[X] = 0, which forces Cov(X, Y) = 0. Yet independence fails: the event {X ≤ −1} = {C} has probability 1/3 and {Y ≤ 0} = {A, C} has probability 2/3, so their intersection {C} has probability 1/3, while the product of the individual probabilities is (1/3)·(2/3) = 2/9. The required factorization P(X ≤ −1, Y ≤ 0) = P(X ≤ −1)·P(Y ≤ 0) therefore does not hold. The example confirms that zero covariance can coexist with dependence, reinforcing why correlation and independence are related but not identical concepts.
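A minimal script checking this construction (the dictionaries and the expectation helper are illustrative scaffolding, not notation from the video):

```python
from fractions import Fraction

# Uniform measure on the three outcomes A, B, C (each has probability 1/3).
P = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
X = {"A": 1, "B": 0, "C": -1}
Y = {"A": 0, "B": 1, "C": 0}   # nonzero only where X is zero

def expectation(f):
    """Expectation of a map outcome -> value under the measure P."""
    return sum(P[w] * f[w] for w in P)

cov = expectation({w: X[w] * Y[w] for w in P}) - expectation(X) * expectation(Y)
print("Cov(X, Y) =", cov)  # 0: X and Y are uncorrelated

# Independence would require P(X <= -1, Y <= 0) = P(X <= -1) * P(Y <= 0).
p_x  = sum(P[w] for w in P if X[w] <= -1)                # 1/3, the outcome C
p_y  = sum(P[w] for w in P if Y[w] <= 0)                 # 2/3, outcomes A and C
p_xy = sum(P[w] for w in P if X[w] <= -1 and Y[w] <= 0)  # 1/3, still just C
print(p_xy, "!=", p_x * p_y)  # 1/3 != 2/9, so X and Y are dependent
```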
Cornell Notes
Covariance measures how two random variables co-vary around their means. For square-integrable random variables X and Y, Cov(X, Y) is defined as E[(X − E[X])(Y − E[Y])] and simplifies to E[XY] − E[X]E[Y]. Independence implies zero covariance because independence gives E[XY] = E[X]E[Y], but zero covariance does not imply independence in general (it does only under special conditions such as joint normality). To compare co-movement across different scales, covariance is normalized into the correlation coefficient ρ(X, Y) = Cov(X, Y)/(StdDev(X)·StdDev(Y)), which is guaranteed by Cauchy–Schwarz to lie in [−1, 1].
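For intuition on the normalization, here is a small sketch using NumPy (an assumption of convenience; the video works purely on paper). It shows that sample covariance changes with scale while sample correlation does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)  # y co-moves positively with x

# np.cov / np.corrcoef return 2x2 matrices; the off-diagonal
# entry is the pairwise statistic.
print(np.cov(x, y)[0, 1])          # sample covariance: scale-dependent
print(np.corrcoef(x, y)[0, 1])     # sample correlation: always in [-1, 1]

# Rescaling y multiplies the covariance by the same factor
# but leaves the correlation unchanged (scale-free).
print(np.cov(x, 100 * y)[0, 1])
print(np.corrcoef(x, 100 * y)[0, 1])
```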
Why does Cov(X, Y) equal E[XY] − E[X]E[Y], and what does that reveal about independence?
What’s the difference between “uncorrelated” and “independent”?
How does Cauchy–Schwarz lead to the bounds on correlation?
Why can covariance be misleading without normalization?
In the discrete example with Ω={A,B,C}, how are X and Y constructed to make Cov(X,Y)=0?
How does the same example show X and Y are not independent?
Review Questions
- What is the algebraic relationship between Cov(X,Y), E[XY], and E[X]E[Y]?
- Under what condition does independence guarantee zero covariance, and why does the converse fail in general?
- How is the correlation coefficient defined from covariance, and why must it lie between −1 and +1?
Key Points
1. Covariance is defined as E[(X − E[X])(Y − E[Y])] and simplifies to E[XY] − E[X]E[Y].
2. Independence implies zero covariance because independence forces E[XY] = E[X]E[Y].
3. Zero covariance (uncorrelated variables) does not generally imply independence.
4. The correlation coefficient normalizes covariance by StdDev(X)·StdDev(Y) to produce a scale-free measure.
5. Cauchy–Schwarz guarantees the correlation coefficient lies in the interval [−1, +1].
6. A discrete counterexample can have Cov(X, Y) = 0 while still violating the probability factorization required for independence.