Probability Theory 22 | Conditional Expectation (given random variables)

Based on the YouTube video by The Bright Side of Mathematics. If you like this content, support the original creators by watching, liking, and subscribing to their content.

TL;DR

Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y=y} for each possible y and then substituting the actual random value Y(ω).

Briefing

Conditional expectation given a random variable turns “conditioning on an event” into a new random variable that updates uncertainty based on what another variable equals. For discrete random variables, the conditional expectation of X given Y=y is computed by weighting each possible value of X with the conditional probability P(X=x | Y=y), producing a function of y. That function then becomes a random variable by plugging in the actual random value Y(ω), written as E[X | Y]. This matters because it’s the bridge from basic conditional probability to the machinery used in stochastic processes, where conditioning often depends on evolving random states rather than a fixed event.

In the discrete setting, the course first recalls the earlier definition of E[X | B] for an event B with positive probability, where conditioning reduces to a sum over X’s possible outcomes. It then specializes B to the event “Y equals y,” yielding E[X | Y=y]. By the standard conditional probability formula, P(X=x | Y=y) = P(X=x, Y=y)/P(Y=y), a ratio built from the joint probability mass function of (X,Y). As y changes, the conditional expectation changes too, so E[X | Y=y] is naturally viewed as a real-valued function of y, which is then composed with the random variable Y to form a new random variable.
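
To make the construction concrete, here is a minimal Python sketch (the joint pmf values are invented for illustration) that builds the function y ↦ E[X | Y=y] from a joint pmf and treats E[X | Y] as that function composed with Y:

```python
# Minimal sketch: build E[X | Y] from a joint pmf (values invented for illustration).
# joint[(x, y)] = P(X=x, Y=y)
joint = {(0, 0): 0.2, (1, 0): 0.3, (0, 1): 0.1, (1, 1): 0.4}

def marginal_Y(y):
    """P(Y=y), obtained by summing the joint pmf over x."""
    return sum(p for (x, y2), p in joint.items() if y2 == y)

def cond_exp_X_given_Y(y):
    """E[X | Y=y] = sum over x of x * P(X=x, Y=y) / P(Y=y)."""
    return sum(x * p for (x, y2), p in joint.items() if y2 == y) / marginal_Y(y)

g = cond_exp_X_given_Y   # the real-valued function y -> E[X | Y=y]
# The random variable E[X | Y] is g(Y): on an outcome where Y=1, it takes the value g(1).
print(g(0), g(1))        # 0.3/0.5 = 0.6 and 0.4/0.5 = 0.8
```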

A die example makes the construction concrete. Let X indicate whether the die outcome is even (X=1 for {2,4,6}, otherwise 0). Let Y indicate whether the outcome is 6 (Y=1 only when the die shows 6, otherwise 0). For Y=0, the conditional expectation E[X | Y=0] averages X over the remaining outcomes {1,2,3,4,5}, where X=1 occurs in {2,4}, giving 2/5. For Y=1, conditioning forces the outcome to be 6, so X must be 1 and E[X | Y=1]=1. Together, these values define the random variable E[X | Y], which encodes how knowledge about Y shifts the expected value of X.
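
A few lines of Python verify this die example directly, averaging X over the equally likely outcomes consistent with each value of Y:

```python
from fractions import Fraction

outcomes = range(1, 7)                 # fair die: each outcome has probability 1/6
X = lambda w: 1 if w % 2 == 0 else 0   # indicator that the outcome is even
Y = lambda w: 1 if w == 6 else 0       # indicator that the outcome is 6

def cond_exp_X_given_Y(y):
    """E[X | Y=y]: average X over the equally likely outcomes with Y(w) = y."""
    ws = [w for w in outcomes if Y(w) == y]
    return Fraction(sum(X(w) for w in ws), len(ws))

print(cond_exp_X_given_Y(0))   # 2/5
print(cond_exp_X_given_Y(1))   # 1
```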

The discussion then extends the same idea to absolutely continuous random variables. When X and Y have a joint density f(x,y), the conditional expectation is computed with an integral in place of the sum: the numerator integrates x times the joint density over x, and the denominator is the marginal density of Y, giving E[X | Y=y] = ∫ x·f(x,y) dx / f_Y(y). The resulting expression is a function g(y) that again becomes a random variable via composition, E[X | Y] = g(Y).
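
The same recipe can be checked numerically. The sketch below uses the illustrative joint density f(x,y) = x + y on the unit square (not from the video; it integrates to 1 there) and SciPy’s quad for the one-dimensional integrals:

```python
# Numerical sketch for the continuous case, with the illustrative
# joint density f(x, y) = x + y on [0,1] x [0,1].
from scipy.integrate import quad

f = lambda x, y: x + y

def cond_exp_X_given_Y(y):
    """E[X | Y=y] = (integral of x*f(x,y) dx) / f_Y(y)."""
    numerator, _ = quad(lambda x: x * f(x, y), 0, 1)
    f_Y, _ = quad(lambda x: f(x, y), 0, 1)   # marginal density of Y at y
    return numerator / f_Y

print(cond_exp_X_given_Y(0.0))   # 2/3, matching the closed form (1/3 + y/2)/(1/2 + y)
print(cond_exp_X_given_Y(1.0))   # 5/9
```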

Finally, several core properties are highlighted. If X and Y are independent, conditioning on Y carries no information about X, so E[X | Y] collapses to the ordinary expectation E[X]. A related simplification appears for products: once we condition on Y, its value is known and can be pulled out front, so E[X·Y | Y] = Y·E[X | Y], which for independent X and Y becomes Y·E[X]. Also, E[X | X] returns X itself. The course closes with the “law of total expectation,” stating that averaging the conditional expectations reproduces the unconditional expectation: E[E[X | Y]] = E[X]. This ties conditional probability weighting back to the familiar overall mean and sets up the later use of conditional expectations in stochastic processes.
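
Continuing the fair-die example, a quick Python check confirms the law of total expectation, E[E[X | Y]] = E[X]:

```python
from fractions import Fraction

outcomes = range(1, 7)
X = lambda w: 1 if w % 2 == 0 else 0   # even indicator
Y = lambda w: 1 if w == 6 else 0       # six indicator

def cond_exp(y):
    """E[X | Y=y] over the equally likely outcomes with Y(w) = y."""
    ws = [w for w in outcomes if Y(w) == y]
    return Fraction(sum(X(w) for w in ws), len(ws))

def prob_Y(y):
    """P(Y=y) for the fair die."""
    return Fraction(sum(1 for w in outcomes if Y(w) == y), 6)

lhs = sum(cond_exp(y) * prob_Y(y) for y in (0, 1))   # E[E[X | Y]] = (2/5)(5/6) + 1*(1/6)
rhs = Fraction(sum(X(w) for w in outcomes), 6)       # E[X] = 1/2
print(lhs, rhs, lhs == rhs)                          # 1/2 1/2 True
```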

Cornell Notes

Conditional expectation given a random variable, E[X | Y], produces a new random variable that reflects how the expected value of X changes once Y is known. In the discrete case, E[X | Y=y] is computed by summing x·P(X=x | Y=y) over all possible x; the conditional probabilities come from the joint pmf of (X,Y). In the die example, conditioning on whether the outcome is 6 changes the expected “evenness” from 2/5 (when Y=0) to 1 (when Y=1). For absolutely continuous variables, the discrete sum becomes an integral: E[X | Y=y] = ∫ x·f(x,y) dx / f_Y(y). Key properties include independence giving E[X | Y]=E[X], E[X | X]=X, and the law of total expectation E[E[X | Y]]=E[X].

How does E[X | Y] differ from E[X | B] and how is it constructed for discrete random variables?

E[X | B] conditions on a fixed event B. When the condition is “Y equals y,” the event becomes {Y=y}, so E[X | Y=y] is computed like E[X | B] but with B replaced by {Y=y}. For discrete variables, E[X | Y=y] = Σ_x x·P(X=x | Y=y), where P(X=x | Y=y) = P(X=x, Y=y)/P(Y=y), a ratio built from the joint pmf of (X,Y). Because the result depends on y, it defines a function of y; composing that function with the random variable Y gives the new random variable E[X | Y] = g(Y).

In the die example, why is E[X | Y=0] equal to 2/5?

Here X=1 when the die is even ({2,4,6}) and 0 otherwise, while Y=1 only when the die is 6. If Y=0, the outcome cannot be 6, so the remaining possibilities are {1,2,3,4,5} (5 equally likely outcomes). Among these, X=1 occurs for {2,4} (2 outcomes). Thus E[X | Y=0] = (1·2 + 0·3)/5 = 2/5.

Why does E[X | Y=1] equal 1 in the die example?

If Y=1, the die outcome is forced to be 6. Since 6 is even, X must equal 1 with probability 1 under this condition. Therefore the conditional expectation is E[X | Y=1] = 1.

What changes when moving from discrete to absolutely continuous random variables?

The structure stays the same—conditional expectation becomes a function of the conditioning value—but the computation changes from a sum to an integral. With a joint density f(x,y), E[X | Y=y] = ∫ x·f(x,y) dx / f_Y(y), where f_Y(y) is the marginal density of Y. The resulting function g(y) is then turned into a random variable by composing with Y: E[X | Y] = g(Y).

What does independence imply for conditional expectation?

If X and Y are independent, the joint density/pmf factorizes, so the conditional distribution of X given Y=y matches the unconditional distribution of X. In that case, the conditioning terms cancel, leaving E[X | Y] equal to the ordinary expectation E[X].

How does the law of total expectation connect conditional and unconditional means?

Averaging the conditional expectation over the randomness in Y returns the unconditional expectation. Formally, E[E[X | Y]] = E[X]. Intuitively, conditional expectations weight X by conditional probabilities for each Y value, and then the outer expectation averages those conditional results according to how likely each Y value is.
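
For discrete variables the identity follows in a few lines by swapping the order of summation:

E[E[X | Y]] = Σ_y E[X | Y=y]·P(Y=y) = Σ_y Σ_x x·P(X=x | Y=y)·P(Y=y) = Σ_x x·Σ_y P(X=x, Y=y) = Σ_x x·P(X=x) = E[X].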

Review Questions

  1. For discrete X and Y, write the formula for E[X | Y=y] in terms of P(X=x | Y=y) and explain how it becomes a random variable E[X | Y].
  2. In the die example, list the sample outcomes contributing to X=1 under the conditions Y=0 and Y=1, and compute the conditional expectations.
  3. For absolutely continuous X and Y with joint density f(x,y), what is the role of the marginal density f_Y(y) in the formula for E[X | Y=y]?

Key Points

  1. Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y=y} for each possible y and then substituting the actual random value Y(ω).

  2. In the discrete case, E[X | Y=y] is computed as a weighted sum Σ_x x·P(X=x | Y=y), where the conditional probabilities come from the joint pmf of (X,Y).

  3. The die example shows how conditioning changes expectations: E[X | Y=0]=2/5 when the outcome is not 6, while E[X | Y=1]=1 when the outcome is 6.

  4. For absolutely continuous variables, the discrete sum becomes an integral: E[X | Y=y] = ∫ x·f(x,y) dx / f_Y(y).

  5. Independence simplifies conditioning: if X and Y are independent, then E[X | Y] equals the ordinary expectation E[X].

  6. The identity E[X | X]=X holds because knowing X determines the value of X exactly.

  7. The law of total expectation links conditional and unconditional means: E[E[X | Y]] = E[X].

Highlights

E[X | Y] is built by turning “conditioning on {Y=y}” into a function of y and then composing it with the random variable Y.
In the die example, conditioning on Y=0 leaves five outcomes and yields E[X | Y=0]=2/5, while conditioning on Y=1 forces X=1 and yields E[X | Y=1]=1.
For continuous variables, conditional expectation uses the ratio of an integral involving the joint density to the marginal density of Y.
Independence makes conditional expectation collapse: E[X | Y]=E[X].
Averaging conditional expectations recovers the unconditional mean: E[E[X | Y]] = E[X].
