
Probability Theory 19 | Covariance and Correlation

4 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Covariance is defined as Cov(X,Y)=E[(X−E[X])(Y−E[Y])], and it can be negative or positive.

Briefing

Covariance and correlation provide a way to quantify how two random variables move together—whether they tend to move in the same direction, in opposite directions, or show no systematic linear relationship. Covariance is built from the joint “deviations from the mean”: take (X − E[X]) and (Y − E[Y]), multiply them, and then average the product. The result is a real number that can be positive, negative, or zero—unlike variance, which is always nonnegative. A key identity simplifies computation: Cov(X, Y) = E[XY] − E[X]E[Y]. This makes covariance a direct test of whether the expectation of the product matches what independence would predict, namely the product of the expectations.
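
To make the identity concrete, here is a minimal Python sketch (not from the video) that evaluates both forms of the covariance on a small, made-up joint distribution; the specific values are assumptions chosen purely for illustration.

```python
# Minimal sketch: check that the definitional and computational forms of
# covariance agree on a small, made-up discrete joint distribution.
# Each entry is (x, y, p); the probabilities sum to 1.
joint = [(0, 0, 0.2), (0, 1, 0.3), (1, 0, 0.1), (1, 1, 0.4)]

E_X = sum(p * x for x, y, p in joint)
E_Y = sum(p * y for x, y, p in joint)
E_XY = sum(p * x * y for x, y, p in joint)

# Definition: average the product of deviations from the means.
cov_def = sum(p * (x - E_X) * (y - E_Y) for x, y, p in joint)

# Identity: E[XY] - E[X]E[Y].
cov_id = E_XY - E_X * E_Y

print(cov_def, cov_id)  # both are 0.05 (up to floating-point rounding)
```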

When X and Y are independent, E[XY] equals E[X]E[Y], so the covariance is zero. That gives a reliable one-way implication: independence forces zero covariance, and equivalently, nonzero covariance guarantees the variables are not independent. The reverse direction is trickier. Covariance being zero does not, in general, force independence; it only means the variables are uncorrelated, a weaker condition than independence. Independence requires a stronger relationship: for every choice of thresholds, the events {X ≤ x} and {Y ≤ y} must have probabilities that factor as P(X ≤ x, Y ≤ y) = P(X ≤ x)P(Y ≤ y). Covariance alone cannot capture that full event-level structure.
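
As a small illustration (again an assumed toy distribution, not from the video), the sketch below builds a joint distribution as the product of two marginals, so independence holds by construction, and then confirms that E[XY] = E[X]E[Y].

```python
# Minimal sketch: when the joint distribution is the product of the marginals
# (independence by construction), E[XY] equals E[X]E[Y], so Cov(X, Y) = 0.
px = {0: 0.3, 1: 0.7}    # made-up marginal distribution of X
py = {-1: 0.5, 2: 0.5}   # made-up marginal distribution of Y

joint = [(x, y, px[x] * py[y]) for x in px for y in py]

E_X = sum(p * x for x, y, p in joint)
E_Y = sum(p * y for x, y, p in joint)
E_XY = sum(p * x * y for x, y, p in joint)

print(E_XY, E_X * E_Y)   # both 0.35, so Cov(X, Y) = E[XY] - E[X]E[Y] = 0
```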

Because covariance can be misleading when X and Y have large individual variances, the course introduces normalization via the correlation coefficient. Using the Cauchy–Schwarz inequality, it is shown that Cov(X, Y)^2 ≤ Var(X)Var(Y). Dividing by the product of standard deviations yields a dimensionless quantity:

ρ(X, Y) = Cov(X, Y) / (σ_X σ_Y).

This correlation coefficient always lies between −1 and +1. Values near 0 indicate weak linear association and are consistent with near-independence, while values near −1 or +1 indicate strong negative or positive linear dependence.
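
A minimal sketch of this normalization (values assumed for illustration): when Y is an exact decreasing linear function of X, the Cauchy–Schwarz bound is attained and ρ comes out as −1.

```python
import math

# Minimal sketch: correlation coefficient for a small discrete distribution
# where Y is a decreasing linear function of X, so rho should be -1.
xs = [1, 2, 3]                      # equally likely values of X
ys = [3 - 2 * x for x in xs]        # Y = 3 - 2X, a perfect negative linear relation
p = 1 / len(xs)

E_X = sum(p * x for x in xs)
E_Y = sum(p * y for y in ys)
cov = sum(p * (x - E_X) * (y - E_Y) for x, y in zip(xs, ys))
var_X = sum(p * (x - E_X) ** 2 for x in xs)
var_Y = sum(p * (y - E_Y) ** 2 for y in ys)

rho = cov / math.sqrt(var_X * var_Y)
print(rho)                          # approximately -1.0: the Cauchy-Schwarz bound is attained
```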

An explicit example demonstrates why uncorrelated does not imply independent. Consider a discrete probability space with three equally likely outcomes {a, b, c}. Define X by X(a)=1, X(b)=0, X(c)=−1, and define Y so that Y is nonzero only where X is zero—specifically, Y(a)=0, Y(b)=1, Y(c)=0. Under this construction, X·Y is always 0, so E[XY]=0. Since E[X]=0 as well, the covariance E[XY] − E[X]E[Y] becomes 0, meaning X and Y are uncorrelated.

Yet independence fails. Checking the factorization condition for events shows a mismatch: with thresholds x=−1 and y=0, the event {X ≤ −1} occurs only at outcome c, while {Y ≤ 0} includes both a and c. The joint probability is P(X ≤ −1, Y ≤ 0) = 1/3, but the product of the marginals is P(X ≤ −1)P(Y ≤ 0) = (1/3)(2/3) = 2/9, so the required probability factorization cannot hold. The example makes the central takeaway concrete: covariance measures linear co-movement, not full independence.
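
The following sketch simply re-checks the example above numerically, using the same outcomes, values, and thresholds.

```python
# Check of the three-outcome example: X and Y are uncorrelated (Cov = 0),
# yet the event probabilities do not factor, so they are not independent.
outcomes = ["a", "b", "c"]
prob = {w: 1 / 3 for w in outcomes}                 # equally likely outcomes
X = {"a": 1, "b": 0, "c": -1}
Y = {"a": 0, "b": 1, "c": 0}                        # Y is nonzero only where X is zero

E_X = sum(prob[w] * X[w] for w in outcomes)
E_Y = sum(prob[w] * Y[w] for w in outcomes)
E_XY = sum(prob[w] * X[w] * Y[w] for w in outcomes)
print("Cov(X,Y) =", E_XY - E_X * E_Y)               # 0.0 -> uncorrelated

# Independence would require P(X <= -1, Y <= 0) = P(X <= -1) * P(Y <= 0).
p_joint = sum(prob[w] for w in outcomes if X[w] <= -1 and Y[w] <= 0)   # 1/3
p_x = sum(prob[w] for w in outcomes if X[w] <= -1)                     # 1/3
p_y = sum(prob[w] for w in outcomes if Y[w] <= 0)                      # 2/3
print(p_joint, p_x * p_y)                           # 1/3 vs 2/9 -> not independent
```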

Cornell Notes

Covariance quantifies how two random variables co-vary by averaging the product of their deviations from their means: Cov(X,Y)=E[(X−E[X])(Y−E[Y])]. A useful simplification gives Cov(X,Y)=E[XY]−E[X]E[Y]. Independence implies zero covariance because independence forces E[XY]=E[X]E[Y], but zero covariance does not generally imply independence—only uncorrelatedness. To measure dependence on a comparable scale, covariance is normalized using standard deviations, producing the correlation coefficient ρ(X,Y)=Cov(X,Y)/(σ_X σ_Y), which always lies between −1 and +1 thanks to Cauchy–Schwarz. A discrete example shows X and Y can have covariance 0 while still failing the event-level factorization required for independence.

How is covariance defined, and what does its sign mean?

Covariance is defined as Cov(X,Y)=E[(X−E[X])(Y−E[Y])]. Expanding the product yields the computational form Cov(X,Y)=E[XY]−E[X]E[Y]. The value is a real number: positive covariance indicates that when X is above its mean, Y tends to be above its mean too (and similarly below), while negative covariance indicates opposite movement. Zero covariance indicates no linear co-movement, not necessarily independence.

Why does independence guarantee zero covariance?

If X and Y are independent, then the expectation of the product factors: E[XY]=E[X]E[Y]. Plugging into Cov(X,Y)=E[XY]−E[X]E[Y] gives Cov(X,Y)=0. This is a one-way implication: independence ⇒ zero covariance.

Why doesn’t zero covariance imply independence?

Zero covariance only enforces E[XY]=E[X]E[Y], which is about averages of products. Independence requires a stronger condition for all thresholds: P(X≤x, Y≤y)=P(X≤x)P(Y≤y). The transcript’s discrete example constructs variables with Cov(X,Y)=0 but where this probability factorization fails for specific choices of x and y.

How does the correlation coefficient fix the “scale” problem with covariance?

Covariance depends on the magnitudes of Var(X) and Var(Y). Two variables can have large variances yet small covariance, making raw covariance hard to interpret. By the Cauchy–Schwarz inequality, Cov(X,Y)^2 ≤ Var(X)Var(Y). Dividing the covariance by σ_X σ_Y therefore produces ρ(X,Y)=Cov(X,Y)/(σ_X σ_Y), a dimensionless number constrained to −1 ≤ ρ ≤ 1.
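
A minimal sketch of the scale issue (made-up data, not from the video): rescaling X and Y inflates the covariance by the product of the scale factors, while ρ stays exactly the same.

```python
import math

def cov_and_rho(xs, ys):
    """Covariance and correlation for equally likely (x, y) pairs."""
    n = len(xs)
    ex, ey = sum(xs) / n, sum(ys) / n
    cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - ex) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - ey) ** 2 for y in ys) / n)
    return cov, cov / (sx * sy)

xs = [1, 2, 3, 4]
ys = [1, 3, 2, 5]                          # made-up values with a positive trend

print(cov_and_rho(xs, ys))                 # modest covariance, rho between 0 and 1
print(cov_and_rho([100 * x for x in xs],
                  [100 * y for y in ys]))  # covariance grows by 10**4, rho is unchanged
```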

What does the correlation coefficient tell you about dependence?

Because ρ is normalized, values near 0 indicate weak linear association and are consistent with near-independence. Values near +1 indicate strong positive linear dependence, and values near −1 indicate strong negative linear dependence. The transcript emphasizes that correlation measures linear co-movement, not full independence.

In the discrete example, how can covariance be zero while independence fails?

With equally likely outcomes {a,b,c}, define X(a)=1, X(b)=0, X(c)=−1. Define Y so that Y is nonzero only where X is zero—Y(a)=0, Y(b)=1, Y(c)=0—so that X·Y is identically 0 on every outcome. Then E[XY]=0, and since E[X]=0, Cov(X,Y)=E[XY]−E[X]E[Y]=0. However, independence fails the event factorization test: for thresholds x=−1 and y=0, {X≤−1} happens only at c, while {Y≤0} includes both a and c, so P(X≤−1, Y≤0) = 1/3 ≠ (1/3)(2/3) = P(X≤−1)P(Y≤0).

Review Questions

  1. State the formula for covariance in terms of expectations, and explain what independence implies about it.
  2. What inequality bounds Cov(X,Y)^2, and how does that lead to the range of the correlation coefficient?
  3. Give an example (or describe one) where Cov(X,Y)=0 but X and Y are not independent, and explain which independence condition fails.

Key Points

  1. Covariance is defined as Cov(X,Y)=E[(X−E[X])(Y−E[Y])], and it can be negative or positive.
  2. Covariance simplifies to Cov(X,Y)=E[XY]−E[X]E[Y], making it easier to compute.
  3. Independence implies zero covariance because independence forces E[XY]=E[X]E[Y].
  4. Zero covariance means uncorrelatedness, which does not generally guarantee independence.
  5. The correlation coefficient normalizes covariance: ρ(X,Y)=Cov(X,Y)/(σ_X σ_Y).
  6. Cauchy–Schwarz yields Cov(X,Y)^2 ≤ Var(X)Var(Y), ensuring −1 ≤ ρ(X,Y) ≤ 1.
  7. Independence requires probability factorization for all threshold events {X≤x} and {Y≤y}, not just equality of average products.

Highlights

Cov(X,Y)=E[XY]−E[X]E[Y] turns covariance into a clean “product of expectations vs. expectation of product” test.
Independence guarantees covariance zero, but the reverse implication fails in general.
Normalization via Cauchy–Schwarz produces a correlation coefficient confined to the interval [−1,1].
A discrete three-outcome construction shows uncorrelated variables can still violate the event-level definition of independence.