Probability Theory 19 | Covariance and Correlation
Based on The Bright Side of Mathematics' video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.
Briefing
Covariance and correlation provide a way to quantify how two random variables move together—whether they tend to increase and decrease in tandem, or behave independently. Covariance is built from the joint “deviations from the mean”: take (X − E[X]) and (Y − E[Y]), multiply them, and then average the product. The result is a real number that can be positive, negative, or zero—unlike variance, which is always nonnegative. A key identity simplifies computation: Cov(X, Y) = E[XY] − E[X]E[Y]. This makes covariance a direct test of whether the average of the product matches what independence would predict.
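As a quick check of the identity, here is a minimal Python sketch (the toy distribution and helper names are illustrative, not from the source) that computes covariance both directly from the definition and via E[XY] − E[X]E[Y]:

```python
# Exact covariance for a finite discrete distribution, computed two ways:
# from the definition and via the identity Cov(X,Y) = E[XY] - E[X]E[Y].
# The distribution below is a hypothetical example.

def expectation(values, probs):
    """E[V] for a finite discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))

def covariance(x, y, probs):
    """Cov(X, Y) = E[(X - E[X])(Y - E[Y])] computed from the definition."""
    ex = expectation(x, probs)
    ey = expectation(y, probs)
    return sum((xi - ex) * (yi - ey) * p for xi, yi, p in zip(x, y, probs))

# Joint outcomes (x_i, y_i) occurring with probabilities p_i.
x = [1.0, 2.0, 3.0]
y = [2.0, 1.0, 4.0]
p = [0.2, 0.5, 0.3]

direct = covariance(x, y, p)
exy = expectation([xi * yi for xi, yi in zip(x, y)], p)
identity = exy - expectation(x, p) * expectation(y, p)
print(direct, identity)  # the two computations agree up to rounding
```

Both routes give the same number up to floating-point rounding, which is exactly the content of the identity.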
When X and Y are independent, E[XY] equals E[X]E[Y], so covariance becomes zero. That gives a reliable one-way implication: zero covariance is consistent with independence, but nonzero covariance guarantees the variables are not independent. The reverse direction is trickier. Covariance being zero does not, in general, force independence; it only means the variables are uncorrelated, a weaker condition than independence. Independence requires a stronger relationship: for every choice of thresholds, the events {X ≤ x} and {Y ≤ y} must have probabilities that factor as P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y). Covariance alone cannot capture that full event-level structure.
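The forward direction can be illustrated with a small simulation; the two-dice setup below is an assumed example, not from the source:

```python
# Sketch: for two dice rolled independently, E[XY] should match E[X]E[Y],
# so the sample covariance should be close to zero for large samples.
import random

random.seed(0)
n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]
ys = [random.randint(1, 6) for _ in range(n)]  # drawn independently of xs

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sample_cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
print(sample_cov)  # near 0, consistent with independence
```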
Because covariance can be misleading when X and Y have large individual variances, the course introduces normalization via the correlation coefficient. Using the Cauchy–Schwarz inequality, it is shown that Cov(X, Y)^2 ≤ Var(X)Var(Y). Dividing by the product of standard deviations yields a dimensionless quantity:
ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y).
This correlation coefficient always lies between −1 and +1. Values near 0 indicate weak linear association (though, as above, not necessarily independence), while values near −1 or +1 indicate strong negative or positive linear dependence.
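A minimal sketch of the normalized quantity on sample data (the `pearson` helper and the sample values are assumptions for illustration); a perfectly linear relation should push the coefficient to the endpoints of [−1, +1]:

```python
# Correlation coefficient rho = Cov(X, Y) / (sigma_X * sigma_Y),
# computed for samples in plain Python.
import math

def pearson(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / n)  # sigma_X
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / n)  # sigma_Y
    return cov / (sx * sy)

xs = [1.0, 2.0, 3.0, 4.0]
# A positive linear relation gives rho at (or within rounding of) +1,
# and a negative linear relation gives rho at -1.
print(pearson(xs, [2 * a + 1 for a in xs]))
print(pearson(xs, [-3 * a + 7 for a in xs]))
```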
An explicit example demonstrates why uncorrelated does not imply independent. Consider a discrete probability space with three equally likely outcomes {a, b, c}. Define X by X(a)=1, X(b)=0, X(c)=−1, and define Y so that Y is nonzero only when X is zero—specifically, Y(b)=1 while Y(a)=Y(c)=0. Under this construction, X·Y is always 0, so E[XY]=0. Since E[X]=0 as well, the covariance E[XY] − E[X]E[Y] becomes 0, meaning X and Y are uncorrelated.
Yet independence fails. Checking the factorization condition for events shows a mismatch: for instance, with thresholds x=−1 and y=0, the event {X ≤ −1} occurs only at outcome c, while {Y ≤ 0} includes both a and c. The joint probability is 1/3, but the product of the marginals is (1/3)(2/3) = 2/9, so the required probability factorization cannot hold. The example makes the central takeaway concrete: covariance measures linear co-movement, not full independence.
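This standard construction (three equally likely outcomes, X taking values 1, 0, −1, and Y the indicator that X equals 0) can be checked numerically; the sketch below uses exact rational arithmetic so no rounding is involved:

```python
# Verify an uncorrelated-but-dependent pair with exact arithmetic.
from fractions import Fraction

outcomes = ["a", "b", "c"]
p = {w: Fraction(1, 3) for w in outcomes}  # equally likely outcomes
X = {"a": 1, "b": 0, "c": -1}
Y = {"a": 0, "b": 1, "c": 0}  # Y is nonzero exactly where X is zero

def E(f):
    return sum(f[w] * p[w] for w in outcomes)

EX, EY = E(X), E(Y)
EXY = sum(X[w] * Y[w] * p[w] for w in outcomes)  # X*Y vanishes everywhere
cov = EXY - EX * EY
print(cov)  # 0: the pair is uncorrelated

# Independence would require P(X <= x, Y <= y) = P(X <= x) P(Y <= y)
# for every pair of thresholds; test x = -1, y = 0.
joint = sum(p[w] for w in outcomes if X[w] <= -1 and Y[w] <= 0)
prod = (sum(p[w] for w in outcomes if X[w] <= -1)
        * sum(p[w] for w in outcomes if Y[w] <= 0))
print(joint, prod)  # 1/3 versus 2/9: factorization fails, so not independent
```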
Cornell Notes
Covariance quantifies how two random variables co-vary by averaging the product of their deviations from their means: Cov(X,Y)=E[(X−E[X])(Y−E[Y])]. A useful simplification gives Cov(X,Y)=E[XY]−E[X]E[Y]. Independence implies zero covariance because independence forces E[XY]=E[X]E[Y], but zero covariance does not generally imply independence—only uncorrelatedness. To measure dependence on a comparable scale, covariance is normalized using standard deviations, producing the correlation coefficient ρ(X,Y)=Cov(X,Y)/(σ_X σ_Y), which always lies between −1 and +1 thanks to Cauchy–Schwarz. A discrete example shows X and Y can have covariance 0 while still failing the event-level factorization required for independence.
How is covariance defined, and what does its sign mean?
Why does independence guarantee zero covariance?
Why doesn’t zero covariance imply independence?
How does the correlation coefficient fix the “scale” problem with covariance?
What does the correlation coefficient tell you about dependence?
In the discrete example, how can covariance be zero while independence fails?
Review Questions
- State the formula for covariance in terms of expectations, and explain what independence implies about it.
- What inequality bounds Cov(X,Y)^2, and how does that lead to the range of the correlation coefficient?
- Give an example (or describe one) where Cov(X,Y)=0 but X and Y are not independent, and explain which independence condition fails.
Key Points
1. Covariance is defined as Cov(X,Y)=E[(X−E[X])(Y−E[Y])], and it can be negative or positive.
2. Covariance simplifies to Cov(X,Y)=E[XY]−E[X]E[Y], making it easier to compute.
3. Independence implies zero covariance because independence forces E[XY]=E[X]E[Y].
4. Zero covariance means uncorrelatedness, which does not generally guarantee independence.
5. The correlation coefficient normalizes covariance: ρ(X,Y)=Cov(X,Y)/(σ_X σ_Y).
6. Cauchy–Schwarz yields Cov(X,Y)^2 ≤ Var(X)Var(Y), ensuring −1 ≤ ρ(X,Y) ≤ 1.
7. Independence requires probability factorization for all threshold events {X≤x} and {Y≤y}, not just equality of average products.