Probability Theory 22 | Conditional Expectation (given random variables)
Based on The Bright Side of Mathematics' video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing to their content.
Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y=y} for each possible y and then substituting the actual random value Y(ω).
Briefing
Conditional expectation given a random variable turns “conditioning on an event” into a new random variable that updates uncertainty based on what another variable equals. For discrete random variables, the conditional expectation of X given Y=y is computed by weighting each possible value of X with the conditional probability P(X=x | Y=y), producing a function of y. That function then becomes a random variable by plugging in the actual random value Y(ω), written as E[X | Y]. This matters because it’s the bridge from basic conditional probability to the machinery used in stochastic processes, where conditioning often depends on evolving random states rather than a fixed event.
In the discrete setting, the course first recalls the earlier definition of E[X | B] for an event B with positive probability, where conditioning reduces to a sum over X’s possible outcomes. It then specializes B to the event “Y equals y,” yielding E[X | Y=y]. Using the standard conditional probability formula, the conditional probability P(X=x | Y=y) becomes a ratio involving the joint probability mass function of (X,Y). As y changes, the conditional expectation changes too, so E[X | Y] is naturally viewed as a real-valued function of y, which is then composed with the random variable Y to form a new random variable.
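The discrete definition can be sketched directly in code. The joint pmf below is a made-up two-by-two example (not from the lecture), used only to show how E[X | Y=y] falls out of the ratio of joint to marginal probabilities:

```python
# Sketch of the discrete definition: E[X | Y=y] = Σ_x x · P(X=x | Y=y),
# with P(X=x | Y=y) = p_{X,Y}(x, y) / p_Y(y).

def cond_exp(joint_pmf, y):
    """E[X | Y=y] from a joint pmf given as {(x, y): probability}."""
    p_y = sum(p for (_, yv), p in joint_pmf.items() if yv == y)    # marginal p_Y(y)
    num = sum(x * p for (x, yv), p in joint_pmf.items() if yv == y)  # Σ_x x · p_{X,Y}(x, y)
    return num / p_y

# Hypothetical joint pmf of (X, Y), both taking values in {0, 1}.
joint = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}

# E[X | Y] is the function y ↦ cond_exp(joint, y), composed with Y(ω).
print(cond_exp(joint, 0))  # 0.75
print(cond_exp(joint, 1))  # 0.666...
```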
A die example makes the construction concrete. Let X indicate whether the die outcome is even (X=1 for {2,4,6}, otherwise 0). Let Y indicate whether the outcome is 6 (Y=1 only when the die shows 6, otherwise 0). For Y=0, the conditional expectation E[X | Y=0] averages X over the remaining outcomes {1,2,3,4,5}, where X=1 occurs in {2,4}, giving 2/5. For Y=1, conditioning forces the outcome to be 6, so X must be 1 and E[X | Y=1]=1. Together, these values define the random variable E[X | Y], which encodes how knowledge about Y shifts the expected value of X.
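The die example can be re-derived by enumerating the sample space, which makes the two conditional averages transparent:

```python
# Die example: Ω = {1,...,6} with uniform probability 1/6 per outcome,
# X(ω) = 1 if ω is even, Y(ω) = 1 if ω = 6.

omega = range(1, 7)
X = {w: 1 if w % 2 == 0 else 0 for w in omega}
Y = {w: 1 if w == 6 else 0 for w in omega}

def cond_exp_given_event(y):
    """E[X | Y=y]: average X over the outcomes where Y(ω) = y (uniform outcomes)."""
    event = [w for w in omega if Y[w] == y]
    return sum(X[w] for w in event) / len(event)

print(cond_exp_given_event(0))  # 0.4 = 2/5: X=1 on {2, 4} out of {1, 2, 3, 4, 5}
print(cond_exp_given_event(1))  # 1.0: the outcome must be 6, which is even

# E[X | Y] as a random variable: substitute Y(ω) outcome by outcome.
E_X_given_Y = {w: cond_exp_given_event(Y[w]) for w in omega}
```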
The discussion then extends the same idea to absolutely continuous random variables. When X and Y have a joint density f(x,y), the conditional expectation E[X | Y] is computed using an integral: the numerator integrates x times the joint density over x, while the denominator divides by the marginal density of Y. The resulting expression is a function g(y) that again becomes a random variable via composition g(Y).
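The continuous formula can be checked numerically. The joint density f(x, y) = x + y on the unit square is an assumed example (not from the lecture) chosen because the conditional expectation has the closed form E[X | Y=y] = (1/3 + y/2) / (1/2 + y) to compare against:

```python
# Numerical sketch of E[X | Y=y] = ∫ x f(x,y) dx / f_Y(y), using the
# assumed joint density f(x, y) = x + y on [0, 1]².

def f(x, y):
    return x + y  # joint density on the unit square

def cond_exp_cont(y, n=10_000):
    dx = 1.0 / n
    xs = [(i + 0.5) * dx for i in range(n)]   # midpoint rule on [0, 1]
    num = sum(x * f(x, y) for x in xs) * dx   # ∫ x f(x, y) dx
    den = sum(f(x, y) for x in xs) * dx       # f_Y(y) = ∫ f(x, y) dx
    return num / den

for y in (0.0, 0.5, 1.0):
    exact = (1 / 3 + y / 2) / (1 / 2 + y)
    print(y, cond_exp_cont(y), exact)  # numeric and exact values agree
```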
Finally, several core properties are highlighted. If X and Y are independent, conditioning on Y removes any dependence, so E[X | Y] collapses to the ordinary expectation E[X]. A related simplification appears for products: E[X·Y | Y] = Y·E[X | Y], because once we condition on Y its value is known and can be pulled out of the expectation; if X and Y are also independent, this reduces further to Y·E[X]. Also, E[X | X] returns X itself, since knowing X determines X exactly. The course closes with the “law of total expectation,” stating that averaging the conditional expectations reproduces the unconditional expectation: E[E[X | Y]] = E[X]. This ties conditional probability weighting back to the familiar overall mean and sets up the later use of conditional expectations in stochastic processes.
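The law of total expectation can be verified on the die example: average the two conditional expectations with the weights P(Y=0) = 5/6 and P(Y=1) = 1/6 and compare against E[X] = P(die is even) = 1/2.

```python
# Law of total expectation E[E[X | Y]] = E[X] on the die example.
p_y0, p_y1 = 5 / 6, 1 / 6          # P(Y=0), P(Y=1)
e_given_y0, e_given_y1 = 2 / 5, 1.0  # E[X | Y=0], E[X | Y=1] from the example

lhs = p_y0 * e_given_y0 + p_y1 * e_given_y1  # E[E[X | Y]] = 1/3 + 1/6
rhs = 3 / 6                                   # E[X] = P(even outcome)
print(lhs, rhs)  # both 0.5
```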
Cornell Notes
Conditional expectation given a random variable, E[X | Y], produces a new random variable that reflects how the expected value of X changes once Y is known. In the discrete case, E[X | Y=y] is computed by summing x·P(X=x | Y=y) over all possible x; the conditional probabilities come from the joint pmf of (X,Y). In the die example, conditioning on whether the outcome is 6 changes the expected “evenness” from 2/5 (when Y=0) to 1 (when Y=1). For absolutely continuous variables, the discrete sum becomes an integral: E[X | Y=y] = ∫ x f(x,y) dx / f_Y(y). Key properties include independence giving E[X | Y]=E[X], E[X | X]=X, and the law of total expectation E(E[X | Y])=E[X].
- How does E[X | Y] differ from E[X | B] and how is it constructed for discrete random variables?
- In the die example, why is E[X | Y=0] equal to 2/5?
- Why does E[X | Y=1] equal 1 in the die example?
- What changes when moving from discrete to absolutely continuous random variables?
- What does independence imply for conditional expectation?
- How does the law of total expectation connect conditional and unconditional means?
Review Questions
- For discrete X and Y, write the formula for E[X | Y=y] in terms of P(X=x | Y=y) and explain how it becomes a random variable E[X | Y].
- In the die example, list the sample outcomes contributing to X=1 under the conditions Y=0 and Y=1, and compute the conditional expectations.
- For absolutely continuous X and Y with joint density f(x,y), what is the role of the marginal density f_Y(y) in the formula for E[X | Y=y]?
Key Points
1. Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y=y} for each possible y and then substituting the actual random value Y(ω).
2. In the discrete case, E[X | Y=y] is computed as a weighted sum Σ_x x·P(X=x | Y=y), where the conditional probabilities come from the joint pmf of (X,Y).
3. The die example shows how conditioning changes expectations: E[X | Y=0]=2/5 when the outcome is not 6, while E[X | Y=1]=1 when the outcome is 6.
4. For absolutely continuous variables, the discrete sum becomes an integral: E[X | Y=y] = ∫ x f(x,y) dx / f_Y(y).
5. Independence simplifies conditioning: if X and Y are independent, then E[X | Y] equals the ordinary expectation E[X].
6. The identity E[X | X]=X holds because knowing X determines the value of X exactly.
7. The law of total expectation links conditional and unconditional means: E[E[X | Y]] = E[X].