Probability Theory 19 | Covariance and Correlation [OLD dark version]
Based on the YouTube video by The Bright Side of Mathematics. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Covariance and correlation provide a quantitative way to measure whether two random variables move together—and how strongly that co-movement departs from independence. Covariance is defined using deviations from each variable’s mean: for random variables X and Y, it is the expected value of (X − E[X]) times (Y − E[Y]). Under standard integrability assumptions (finite expectations of X, Y, and also of X² and Y²), this becomes a well-defined real number that can be positive, negative, or zero. Expanding the product yields a compact identity: Cov(X, Y) = E[XY] − E[X]E[Y]. That form makes the relationship to independence immediate: if X and Y are independent, then E[XY] = E[X]E[Y], so covariance equals 0.
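A short derivation of that identity, using only linearity of expectation and the fact that E[X] and E[Y] are constants:

```latex
\begin{align*}
\operatorname{Cov}(X,Y)
  &= \mathbb{E}\!\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right] \\
  &= \mathbb{E}\!\left[XY - X\,\mathbb{E}[Y] - \mathbb{E}[X]\,Y + \mathbb{E}[X]\,\mathbb{E}[Y]\right] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y] - \mathbb{E}[X]\,\mathbb{E}[Y] + \mathbb{E}[X]\,\mathbb{E}[Y] \\
  &= \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y].
\end{align*}
```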
The key limitation is that zero covariance does not guarantee independence. The transcript emphasizes that “uncorrelated” (Cov(X, Y) = 0) is a weaker property than “independent,” and the converse implication fails in general. Independence is a stronger condition on the joint behavior of all events, not just on the single mixed moment E[XY]. The course also notes a special case where the converse does hold: when X and Y are jointly normally distributed (bivariate normal), zero covariance is enough to infer independence.
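The forward implication takes one line: if X and Y are independent, so that E[XY] = E[X]E[Y], then

```latex
\operatorname{Cov}(X,Y) = \mathbb{E}[XY] - \mathbb{E}[X]\,\mathbb{E}[Y]
                        = \mathbb{E}[X]\,\mathbb{E}[Y] - \mathbb{E}[X]\,\mathbb{E}[Y] = 0.
```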
Because the raw magnitude of covariance depends on the scales of the variables (a large covariance may reflect large variances rather than a strong relationship), the discussion introduces normalization via the correlation coefficient. Using the Cauchy–Schwarz inequality, it is shown that the normalized quantity
ρ(X, Y) = Cov(X, Y) / (StdDev(X)·StdDev(Y))
always lies between −1 and +1. Values near 0 indicate behavior close to independence, while values near −1 or +1 indicate strong negative or positive linear association, respectively. This normalization is what turns covariance into a scale-free measure of co-movement.
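A sketch of why the bound holds, applying Cauchy–Schwarz to the centered variables X − E[X] and Y − E[Y] (assuming both variances are finite):

```latex
\left|\operatorname{Cov}(X,Y)\right|
  = \left|\mathbb{E}\!\left[(X - \mathbb{E}[X])(Y - \mathbb{E}[Y])\right]\right|
  \le \sqrt{\mathbb{E}\!\left[(X - \mathbb{E}[X])^{2}\right]}\,
      \sqrt{\mathbb{E}\!\left[(Y - \mathbb{E}[Y])^{2}\right]}
  = \mathrm{StdDev}(X)\,\mathrm{StdDev}(Y).
```

Dividing both sides by StdDev(X)·StdDev(Y) (assuming both are nonzero) gives |ρ(X, Y)| ≤ 1.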
An explicit example demonstrates the gap between uncorrelated and independent. On a three-element sample space Ω = {A, B, C} with the uniform probability measure (each outcome has probability 1/3), the random variable X is defined by X(A) = 1, X(B) = 0, X(C) = −1. The random variable Y is chosen so that it is nonzero only when X is zero: specifically, Y(B) = 1 and Y(A) = Y(C) = 0. With this construction the product X·Y is identically zero, so E[XY] = 0; also E[X] = 0, which forces Cov(X, Y) = 0. Yet independence fails: the event {X ≤ −1} = {C} has probability 1/3 and {Y ≤ 0} = {A, C} has probability 2/3, so their intersection {C} has probability 1/3, while the product of the individual probabilities is (1/3)·(2/3) = 2/9. The required factorization P(X ≤ −1, Y ≤ 0) = P(X ≤ −1)·P(Y ≤ 0) therefore does not hold. The example confirms that zero covariance can coexist with dependence, reinforcing why correlation and independence are related but not identical concepts.
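A minimal script checking this construction (the dictionaries and the expectation helper are illustrative scaffolding, not notation from the video):

```python
from fractions import Fraction

# Uniform measure on the three outcomes A, B, C (each has probability 1/3).
P = {"A": Fraction(1, 3), "B": Fraction(1, 3), "C": Fraction(1, 3)}
X = {"A": 1, "B": 0, "C": -1}
Y = {"A": 0, "B": 1, "C": 0}   # nonzero only where X is zero

def expectation(f):
    """Expectation of a map outcome -> value under the measure P."""
    return sum(P[w] * f[w] for w in P)

cov = expectation({w: X[w] * Y[w] for w in P}) - expectation(X) * expectation(Y)
print("Cov(X, Y) =", cov)  # 0: X and Y are uncorrelated

# Independence would require P(X <= -1, Y <= 0) = P(X <= -1) * P(Y <= 0).
p_x  = sum(P[w] for w in P if X[w] <= -1)                # 1/3, the outcome C
p_y  = sum(P[w] for w in P if Y[w] <= 0)                 # 2/3, outcomes A and C
p_xy = sum(P[w] for w in P if X[w] <= -1 and Y[w] <= 0)  # 1/3, still just C
print(p_xy, "!=", p_x * p_y)  # 1/3 != 2/9, so X and Y are dependent
```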
Cornell Notes
Covariance measures how two random variables co-vary around their means. For square-integrable random variables X and Y, Cov(X, Y) is defined as E[(X − E[X])(Y − E[Y])] and simplifies to E[XY] − E[X]E[Y]. Independence implies zero covariance because independence gives E[XY] = E[X]E[Y], but zero covariance does not imply independence in general (it does only under special conditions such as joint normality). To compare co-movement across different scales, covariance is normalized into the correlation coefficient ρ(X, Y) = Cov(X, Y)/(StdDev(X)·StdDev(Y)), which is guaranteed by Cauchy–Schwarz to lie in [−1, 1].
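For intuition on the normalization, here is a small sketch using NumPy (an assumption of convenience; the video works purely on paper). It shows that sample covariance changes with scale while sample correlation does not:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2.0 * x + rng.normal(size=10_000)  # y co-moves positively with x

# np.cov / np.corrcoef return 2x2 matrices; the off-diagonal
# entry is the pairwise statistic.
print(np.cov(x, y)[0, 1])          # sample covariance: scale-dependent
print(np.corrcoef(x, y)[0, 1])     # sample correlation: always in [-1, 1]

# Rescaling y multiplies the covariance by the same factor
# but leaves the correlation unchanged (scale-free).
print(np.cov(x, 100 * y)[0, 1])
print(np.corrcoef(x, 100 * y)[0, 1])
```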
Why does Cov(X, Y) equal E[XY] − E[X]E[Y], and what does that reveal about independence?
What’s the difference between “uncorrelated” and “independent”?
How does Cauchy–Schwarz lead to the bounds on correlation?
Why can covariance be misleading without normalization?
In the discrete example with Ω={A,B,C}, how are X and Y constructed to make Cov(X,Y)=0?
How does the same example show X and Y are not independent?
Review Questions
- What is the algebraic relationship between Cov(X,Y), E[XY], and E[X]E[Y]?
- Under what condition does independence guarantee zero covariance, and why does the converse fail in general?
- How is the correlation coefficient defined from covariance, and why must it lie between −1 and +1?
Key Points
1. Covariance is defined as E[(X − E[X])(Y − E[Y])] and simplifies to E[XY] − E[X]E[Y].
2. Independence implies zero covariance because independence forces E[XY] = E[X]E[Y].
3. Zero covariance (uncorrelated variables) does not generally imply independence.
4. The correlation coefficient normalizes covariance by StdDev(X)·StdDev(Y) to produce a scale-free measure.
5. Cauchy–Schwarz guarantees the correlation coefficient lies in the interval [−1, +1].
6. A discrete counterexample can have Cov(X, Y) = 0 while still violating the probability factorization required for independence.