Probability Theory 22 | Conditional Expectation (given random variables) [dark version]
Based on the video by The Bright Side of Mathematics on YouTube. If you like this content, support the original creator by watching, liking, and subscribing to their channel.
Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y = y} for each possible y and then substituting y = Y(ω).
Briefing
Conditional expectation given a random variable extends the familiar idea of “averaging with information” from conditioning on an event to conditioning on another random variable. For discrete random variables, it becomes a weighted average where the weights are conditional probabilities of X’s values given that Y takes a specific value; the result is not just a number but a new random variable, written as E[X | Y]. This matters because it’s the bridge from basic conditional probability to the machinery used in stochastic processes, where the “known information” often arrives through random variables rather than fixed events.
The setup starts with the earlier definition of E[X | B] for a discrete random variable X and an event B with positive probability: E[X | B] = Σ_x x · P(X = x | B), summing over all possible values x of X. The new step replaces the event B with the event {Y = y}. That yields E[X | Y = y], and by varying y, the conditional expectation becomes a function of y. Since Y itself is random (it maps outcomes ω to real numbers), composing that function with Y produces a new random variable E[X | Y] defined on the same probability space.
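As a minimal sketch of this discrete definition (the joint pmf below is an illustrative assumption, not from the source), E[X | Y = y] is just a weighted average of X's values with conditional probabilities as weights:

```python
from fractions import Fraction

def cond_expectation(joint_pmf, y):
    """E[X | Y = y] = sum_x x * P(X = x, Y = y) / P(Y = y)."""
    p_y = sum(p for (x, yy), p in joint_pmf.items() if yy == y)
    if p_y == 0:
        raise ValueError("conditioning event {Y = y} needs positive probability")
    return sum(x * p for (x, yy), p in joint_pmf.items() if yy == y) / p_y

# Illustrative joint pmf on {0,1} x {0,1} (hypothetical numbers):
joint = {(0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
         (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8)}
print(cond_expectation(joint, 1))  # 3/4
```

Varying y turns this into a function of y; composing with Y then gives the random variable E[X | Y].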
A concrete die example makes the construction tangible. Let X indicate whether the die result is even (X = 1 for {2,4,6}, otherwise 0). Let Y indicate whether the die result is 6 (Y = 1 only when the outcome is 6, otherwise 0). When Y = 0, the conditional expectation E[X | Y = 0] averages X over the remaining outcomes {1,2,3,4,5}. Only {2,4} make X = 1, so the conditional expectation is 2/5. When Y = 1, the only possible outcome is 6, which forces X = 1, so E[X | Y = 1] = 1. Putting both cases together shows how E[X | Y] “tracks” the uncertainty about X while accounting for what Y reveals.
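The die example can be reproduced exactly in a few lines (a sketch; the variable names are mine):

```python
from fractions import Fraction

# Fair die: X indicates an even result, Y indicates a six.
outcomes = [1, 2, 3, 4, 5, 6]
X = {w: 1 if w % 2 == 0 else 0 for w in outcomes}
Y = {w: 1 if w == 6 else 0 for w in outcomes}

def cond_exp(y):
    """E[X | Y = y], averaging X over the equally likely outcomes with Y = y."""
    ws = [w for w in outcomes if Y[w] == y]
    return Fraction(sum(X[w] for w in ws), len(ws))

print(cond_exp(0))  # 2/5 -- only {2, 4} among {1, 2, 3, 4, 5} are even
print(cond_exp(1))  # 1  -- the outcome 6 is certainly even

# E[X | Y] as a random variable: compose the function above with Y(omega).
E_X_given_Y = {w: cond_exp(Y[w]) for w in outcomes}
```

The last line makes the "new random variable" concrete: each outcome ω is mapped first to Y(ω) and then to the corresponding conditional expectation.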
For continuous random variables, the same idea generalizes by replacing sums with integrals and probability mass functions with joint densities. When X and Y are absolutely continuous with joint density f(x,y), the conditional expectation given Y = y is E[X | Y = y] = ∫ x · f(x,y)/f_Y(y) dx, where f_Y(y) = ∫ f(x,y) dx is the marginal density of Y; the ratio f(x,y)/f_Y(y) is the conditional density of X given Y = y. The result is again a function of y, hence a new random variable obtained by composing that function with Y.
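A numeric sketch of the continuous formula, using the hypothetical density f(x, y) = x + y on the unit square (chosen only for illustration; any joint density with a computable marginal works the same way):

```python
from scipy.integrate import quad

def f(x, y):
    """Illustrative joint density on [0, 1]^2: f(x, y) = x + y."""
    return x + y

def cond_expectation(y):
    """E[X | Y = y] = integral over x of x * f(x, y) / f_Y(y)."""
    f_Y, _ = quad(lambda x: f(x, y), 0.0, 1.0)      # marginal density of Y
    num, _ = quad(lambda x: x * f(x, y), 0.0, 1.0)  # integral of x * f(x, y)
    return num / f_Y

# For this density the closed form is (2 + 3y) / (3 * (1 + 2y)),
# e.g. 2/3 at y = 0.
print(cond_expectation(0.0))
```

Here the quadrature replaces the sum of the discrete case, and the ratio of integrals is exactly "x times the conditional density."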
Several key properties are highlighted. If X and Y are independent, conditioning on Y reveals nothing about X: conditional probabilities or densities factorize, and E[X | Y] reduces to the ordinary expectation E[X] (a constant random variable). A related simplification holds for products: in E[X·Y | Y], the factor Y is known once Y is given and can be pulled out, so E[X·Y | Y] = Y·E[X | Y] (which equals Y·E[X] under independence). Conditioning on X itself gives E[X | X] = X. Finally, averaging the conditional expectations recovers the unconditional expectation: E(E[X | Y]) = E[X], the law of total expectation, which is tied to the law of total probability. These properties set up the later use of conditional expectation in stochastic processes, where conditioning on random information is routine.
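The law of total expectation E(E[X | Y]) = E[X] can be checked exactly on the die example (a sketch reusing the indicators from the text):

```python
from fractions import Fraction

# Fair die, X = even indicator, Y = six indicator (as in the example above).
outcomes = range(1, 7)
X = lambda w: 1 if w % 2 == 0 else 0
Y = lambda w: 1 if w == 6 else 0

def cond_exp(y):
    """E[X | Y = y] over the equally likely outcomes with Y = y."""
    ws = [w for w in outcomes if Y(w) == y]
    return Fraction(sum(X(w) for w in ws), len(ws))

# Ordinary expectation vs. the average of the conditional expectations:
# (5/6) * 2/5 + (1/6) * 1 = 1/3 + 1/6 = 1/2 = E[X].
E_X = sum(Fraction(1, 6) * X(w) for w in outcomes)
E_condE = sum(Fraction(1, 6) * cond_exp(Y(w)) for w in outcomes)
print(E_X, E_condE)  # both 1/2
```

Averaging E[X | Y] over the distribution of Y washes out the extra information and returns the unconditional mean.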
Cornell Notes
Conditional expectation given a random variable, E[X | Y], generalizes conditioning on events by letting the “known information” come from another random variable. In the discrete case, E[X | Y] is built by first computing E[X | Y = y] as a weighted average of X’s values using conditional probabilities, then turning it into a random variable by plugging in the random value Y(ω). For a die example, E[X | Y] takes the value 2/5 when Y = 0 (evenness among outcomes excluding 6) and 1 when Y = 1 (6 forces evenness). In the continuous case, the same structure uses joint densities: E[X | Y] is an integral of x times the conditional density f(x,y)/f_Y(y). Key properties include independence (E[X | Y] = E[X]), E[X | X] = X, and the law of total expectation E(E[X | Y]) = E[X].
Why does E[X | Y] become a random variable rather than a single number?
How is E[X | Y = y] computed in the discrete case?
In the die example, why is E[X | Y = 0] equal to 2/5?
How does the continuous-case formula mirror the discrete one?
What does independence imply for conditional expectation?
Review Questions
- For discrete X and Y, write the expression for E[X | Y = y] in terms of P(X = x, Y = y) and P(Y = y).
- In the die example, what value does E[X | Y = 1] take and why must it be that value?
- State and interpret the identity E(E[X | Y]) = E[X]. What does it mean in terms of averaging conditional expectations?
Key Points
- 1
Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y = y} for each possible y and then substituting y = Y(ω).
- 2
For discrete random variables, E[X | Y = y] is computed as a weighted sum: Σ_x x · P(X = x | Y = y).
- 3
For continuous random variables with joint density f(x,y), E[X | Y] uses an integral with conditional density f(x,y)/f_Y(y).
- 4
Independence implies E[X | Y] = E[X], because conditioning on Y does not change the distributional average of X.
- 5
Conditioning on X itself gives E[X | X] = X.
- 6
The law of total expectation holds: E(E[X | Y]) = E[X], linking conditional and unconditional averages.