Probability Theory 19 | Covariance and Correlation [old version]

4 min read

Based on The Bright Side of Mathematics' video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing.

TL;DR

Covariance is defined as E[(X−E[X])(Y−E[Y])] and can be negative, positive, or zero.

Briefing

Covariance and correlation are introduced as the core tools for measuring how two random variables move together, especially when they are not independent. Covariance is defined for two random variables X and Y (on a shared probability space) once the expectations needed for the calculation exist: E[X], E[Y], and the second moments E[X^2] and E[Y^2] must all be finite. The definition mirrors the variance formula: take the deviations from the means, multiply them as (X − E[X])(Y − E[Y]), and take the expectation. The result is a real number that can be positive, negative, or zero, unlike variance, which is always nonnegative.
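
To make the definition concrete, here is a minimal Python sketch that computes covariance directly from the mean-centered deviations; the small joint pmf is invented for illustration and is not from the video.

```python
# Minimal sketch: covariance from the definition, for a small finite
# joint distribution. The pmf below is illustrative only.
points = {  # (x, y) -> P(X = x, Y = y)
    (0, 0): 0.2, (0, 1): 0.1,
    (1, 0): 0.1, (1, 1): 0.6,
}

EX = sum(p * x for (x, y), p in points.items())
EY = sum(p * y for (x, y), p in points.items())

# E[(X - E[X]) * (Y - E[Y])]: center both variables, multiply, average.
cov = sum(p * (x - EX) * (y - EY) for (x, y), p in points.items())
print(EX, EY, cov)  # 0.7 0.7 0.11 -> positive co-movement
```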

A key simplification shows covariance can be written as Cov(X, Y) = E[XY] − E[X]E[Y]. This form makes the relationship to independence immediate: if X and Y are independent, then E[XY] factors into E[X]E[Y], forcing the covariance to be zero. The reverse direction is more subtle: zero covariance means only that the variables are uncorrelated, a strictly weaker condition than independence, since covariance can miss nonlinear or otherwise structured dependence.
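
A quick numeric check of the shortcut formula under independence: the joint pmf below is built as a product of made-up marginals, so E[XY] must factor and the covariance must come out zero.

```python
# Sketch: Cov(X, Y) = E[XY] - E[X]E[Y], checked on an independent pair
# (joint pmf constructed as the product of illustrative marginals).
px = {0: 0.4, 1: 0.6}           # marginal of X (made up)
py = {-1: 0.5, 2: 0.5}          # marginal of Y (made up)
joint = {(x, y): px[x] * py[y] for x in px for y in py}  # independence

EX  = sum(p * x for (x, y), p in joint.items())
EY  = sum(p * y for (x, y), p in joint.items())
EXY = sum(p * x * y for (x, y), p in joint.items())

print(EXY - EX * EY)  # 0.0 (up to rounding): E[XY] factors as E[X]E[Y]
```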

To quantify “how much” dependence exists, covariance is normalized into the correlation coefficient: covariance divided by the product of the standard deviations. By the Cauchy–Schwarz inequality, this normalized quantity always lies between −1 and +1. Because the normalization cancels units, the coefficient is scale-free: it measures linear association rather than raw co-movement. Values near 0 indicate a weak linear relationship (though not independence), while values near ±1 indicate strong linear coupling.
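
A sample-based sketch of the normalization, on synthetic Gaussian data (seed and sizes arbitrary): rescaling and shifting X leaves the correlation unchanged, which is exactly the scale-free property.

```python
import random

# Sketch: correlation = Cov / (sd_X * sd_Y) is scale-free. The data is
# synthetic; values are sample estimates, so only approximately exact.
random.seed(0)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [x + random.gauss(0, 1) for x in xs]  # linearly coupled with noise

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n
    va = sum((u - ma) ** 2 for u in a) / n
    vb = sum((v - mb) ** 2 for v in b) / n
    return cov / (va ** 0.5 * vb ** 0.5)

print(corr(xs, ys))                        # approx 0.707 = 1/sqrt(2)
print(corr([5 * x + 3 for x in xs], ys))   # same value: scale-free
```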

An explicit example demonstrates why uncorrelated variables need not be independent. The sample space has three equally likely outcomes {A, B, C}. Define X by mapping A → 1, B → 0, C → −1, so E[X] = 0. Define Y so that Y = 0 whenever X ≠ 0; under this construction, Y can be nonzero only when X = 0 (which occurs at B). Because the product X·Y is then always 0, E[XY] = 0. With E[X] = 0, the covariance formula gives Cov(X, Y) = E[XY] − E[X]E[Y] = 0, so X and Y are uncorrelated.
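
The same numbers can be verified mechanically. The video leaves Y's nonzero value unspecified; the sketch below assumes Y = 1 at B, which is enough to reproduce the zero covariance.

```python
# Sketch of the three-outcome example. Y's nonzero value is an assumption
# (Y = 1 at B); any nonzero choice gives the same covariance result.
from fractions import Fraction

P = Fraction(1, 3)                     # uniform on {A, B, C}
X = {"A": 1, "B": 0, "C": -1}
Y = {"A": 0, "B": 1, "C": 0}           # Y != 0 only where X = 0

EX  = sum(P * X[w] for w in X)         # 0
EY  = sum(P * Y[w] for w in Y)         # 1/3
EXY = sum(P * X[w] * Y[w] for w in X)  # 0: the product X*Y vanishes everywhere

print(EXY - EX * EY)                   # Cov(X, Y) = 0 -> uncorrelated
```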

Yet independence fails. Independence requires that for every pair of thresholds (x, y), the probability that X ≤ x and Y ≤ y equals the product of the separate probabilities. Testing a specific choice, X ≤ −1 and Y ≤ 0 (with Y taken to be 1 at B, say), reveals the mismatch: the joint event corresponds only to outcome C and has probability 1/3, while the product of the marginal probabilities is (1/3)·(2/3) = 2/9, so the factorization property does not hold. The example underscores the central takeaway: covariance detects linear dependence, not all forms of dependence, so uncorrelated does not automatically mean independent.
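
And the independence check itself, under the same assumed Y: the joint threshold probability and the product of marginals come out different.

```python
# Sketch: checking P(X <= -1, Y <= 0) = P(X <= -1) * P(Y <= 0)
# for the same construction (Y = 1 at B, an assumption as above).
from fractions import Fraction

P = Fraction(1, 3)
X = {"A": 1, "B": 0, "C": -1}
Y = {"A": 0, "B": 1, "C": 0}

joint = sum(P for w in X if X[w] <= -1 and Y[w] <= 0)  # only C: 1/3
left  = sum(P for w in X if X[w] <= -1)                # only C: 1/3
right = sum(P for w in Y if Y[w] <= 0)                 # A and C: 2/3

print(joint, left * right)  # 1/3 vs 2/9 -> not equal, so not independent
```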

Cornell Notes

Covariance measures how two random variables co-vary by averaging the product of their mean-centered deviations: Cov(X,Y)=E[(X−E[X])(Y−E[Y])]. It simplifies to Cov(X,Y)=E[XY]−E[X]E[Y], which guarantees that independent variables have zero covariance. Zero covariance, however, only implies uncorrelatedness, not independence, because dependence can exist without affecting the mean-centered product. To make the measure comparable across different scales, covariance is normalized into the correlation coefficient by dividing by the product of standard deviations. Cauchy–Schwarz ensures the correlation coefficient always lies between −1 and +1. A discrete three-outcome example shows uncorrelated variables can still be dependent.

Why does independence force covariance to be zero?

If X and Y are independent, then E[XY] factors into E[X]E[Y]. Plugging into Cov(X,Y)=E[XY]−E[X]E[Y] gives Cov(X,Y)=0. The key step is the factorization of the joint expectation under independence.

What does it mean when covariance equals zero, and why is it weaker than independence?

Cov(X,Y)=0 means X and Y are uncorrelated: E[XY]=E[X]E[Y]. Independence is stronger because it requires the probability of joint threshold events to factor for all choices of bounds. Zero covariance only matches one moment relationship, so structured or nonlinear dependence can still exist.

How is the correlation coefficient constructed from covariance, and what range does it take?

The correlation coefficient is the normalized covariance: it divides Cov(X,Y) by the product of standard deviations, i.e., by sqrt(Var(X))·sqrt(Var(Y)). Using the Cauchy–Schwarz inequality, this normalized value is guaranteed to lie between −1 and +1, making it a scale-free measure of linear association.
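
For reference, a sketch of the Cauchy–Schwarz step the answer relies on, applied to the mean-centered variables:

```latex
% With U = X - E[X] and V = Y - E[Y], Cauchy-Schwarz gives
% |E[UV]| <= sqrt(E[U^2]) * sqrt(E[V^2]):
\[
|\operatorname{Cov}(X,Y)|
  = \left| \mathbb{E}\!\left[(X-\mathbb{E}[X])(Y-\mathbb{E}[Y])\right] \right|
  \;\le\; \sqrt{\mathbb{E}\!\left[(X-\mathbb{E}[X])^{2}\right]}\,
          \sqrt{\mathbb{E}\!\left[(Y-\mathbb{E}[Y])^{2}\right]}
  = \sigma_X\,\sigma_Y,
\]
\[
\text{hence}\quad
\rho(X,Y) \;=\; \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y} \;\in\; [-1,\,1].
\]
```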

What conditions must hold for covariance to be well-defined in this setup?

The expectations used in the covariance formula must exist and be finite. In particular, E[X] and E[Y] must exist, and the second moments E[X^2] and E[Y^2] must be finite so that the variances, and hence the standard deviations used in normalization, are well-defined.
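
A small illustration of why the finiteness matters, using standard Cauchy samples, which have no finite mean or variance (the tan construction is the usual inverse-CDF trick; seed and sample sizes are arbitrary): the sample covariance never stabilizes as the sample grows.

```python
import math
import random

# Sketch: with heavy tails (Cauchy), the moments in the covariance formula
# do not exist, and the sample covariance fails to settle as n grows.
random.seed(1)

def cauchy():
    # Standard Cauchy via inverse CDF: tan(pi * (U - 1/2)) for U ~ Uniform(0,1)
    return math.tan(math.pi * (random.random() - 0.5))

def sample_cov(n):
    xs = [cauchy() for _ in range(n)]
    ys = [x + cauchy() for x in xs]        # dependent, still heavy-tailed
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

for n in (1_000, 10_000, 100_000):
    print(n, sample_cov(n))                # magnitudes jump around erratically
```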

How does the three-outcome example show uncorrelated variables can be dependent?

With Ω={A,B,C} equally likely, X is defined by A→1, B→0, C→−1, so E[X]=0. Y is set to 0 whenever X≠0 (nonzero only at B, say Y=1 there), so the product X·Y is identically 0 and E[XY]=0. Then Cov(X,Y)=E[XY]−E[X]E[Y]=0, so X and Y are uncorrelated. But independence fails for the threshold event X≤−1 and Y≤0: the joint event is just {C} with probability 1/3, while the product of marginals is (1/3)·(2/3)=2/9, so the factorization condition does not hold.

Review Questions

  1. What is the algebraic relationship between covariance and expectations (the form involving E[XY] and E[X]E[Y])?
  2. Why does zero covariance not guarantee independence, and what property would independence require instead?
  3. How does normalization using standard deviations change covariance into a correlation coefficient, and what does the −1 to +1 range imply?

Key Points

  1. Covariance is defined as E[(X−E[X])(Y−E[Y])] and can be negative, positive, or zero.

  2. Cov(X,Y) simplifies to E[XY]−E[X]E[Y], making the independence link immediate.

  3. Independence implies zero covariance, but zero covariance only implies uncorrelatedness, not independence.

  4. The correlation coefficient normalizes covariance by standard deviations and is bounded between −1 and +1 via Cauchy–Schwarz.

  5. Correlation measures linear association; dependence can exist even when covariance is zero.

  6. A discrete example with Ω={A,B,C} constructs X and Y so that X·Y is always 0, yielding zero covariance while independence still fails.

Highlights

Covariance can be rewritten as Cov(X,Y)=E[XY]−E[X]E[Y], so independence automatically forces covariance to vanish.
Zero covariance means uncorrelatedness, not independence—dependence can hide from the covariance calculation.
Normalizing covariance by standard deviations produces the correlation coefficient, which must fall in the interval [−1,+1].
A three-outcome construction shows uncorrelated variables can still violate the probability factorization required for independence.