
Probability Theory 22 | Conditional Expectation (given random variables) [dark version]

5 min read

Based on The Bright Side of Mathematics' video on YouTube. If you like this content, support the original creator by watching, liking, and subscribing to their channel.

TL;DR

Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y = y} for each possible y and then substituting y = Y(ω).

Briefing

Conditional expectation given a random variable extends the familiar idea of “averaging with information” from conditioning on an event to conditioning on another random variable. For discrete random variables, it becomes a weighted average where the weights are conditional probabilities of X’s values given that Y takes a specific value; the result is not just a number but a new random variable, written as E[X | Y]. This matters because it’s the bridge from basic conditional probability to the machinery used in stochastic processes, where the “known information” often arrives through random variables rather than fixed events.

The setup starts with the earlier definition of E[X | B] for a discrete random variable X and an event B with positive probability: it is the sum of x · P(X = x | B) over all possible values x of X. The new step replaces the event B with the event “Y = y.” That yields E[X | Y = y], and by varying y, the conditional expectation becomes a function of y. Since Y itself is random (it maps outcomes ω to real numbers), composing that function with Y produces a new random variable E[X | Y] defined on the same probability space.
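In symbols, the chain of definitions described above reads:

    E[X | B]     = Σ_x x · P(X = x | B)
    E[X | Y = y] = Σ_x x · P(X = x | Y = y)
    E[X | Y](ω)  = E[X | Y = Y(ω)]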

A concrete die example makes the construction tangible. Let X indicate whether the die result is even (X = 1 for {2,4,6}, otherwise 0). Let Y indicate whether the die result is 6 (Y = 1 only when the outcome is 6, otherwise 0). When Y = 0, the conditional expectation E[X | Y = 0] averages X over the remaining outcomes {1,2,3,4,5}. Only {2,4} make X = 1, so the conditional expectation is 2/5. When Y = 1, the only possible outcome is 6, which forces X = 1, so E[X | Y = 1] = 1. Putting both cases together shows how E[X | Y] “tracks” the uncertainty about X while accounting for what Y reveals.
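A minimal Python sketch of this example, enumerating the sample space directly (the names omega, X, Y, and cond_exp are illustrative, not from the video; averaging by counting works here because the die is fair, so all outcomes are equally likely):

    from fractions import Fraction

    omega = range(1, 7)                    # sample space of a fair die
    X = lambda w: 1 if w % 2 == 0 else 0   # X = 1 iff the result is even
    Y = lambda w: 1 if w == 6 else 0       # Y = 1 iff the result is 6

    def cond_exp(X, Y, y):
        """E[X | Y = y]: average of X over the outcomes where Y = y
        (a plain average, since the die is uniform)."""
        hits = [w for w in omega if Y(w) == y]
        return Fraction(sum(X(w) for w in hits), len(hits))

    print(cond_exp(X, Y, 0))   # 2/5
    print(cond_exp(X, Y, 1))   # 1

    # E[X | Y] as a random variable: its value at each outcome ω
    print({w: cond_exp(X, Y, Y(w)) for w in omega})
    # {1: Fraction(2, 5), ..., 5: Fraction(2, 5), 6: Fraction(1, 1)}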

For continuous random variables, the same idea generalizes by replacing sums with integrals and probability mass functions with densities. When X and Y are absolutely continuous with joint density f(x,y), the conditional expectation is computed by integrating x against the conditional density (the joint density divided by the marginal density of Y). The result is again a function of y, and composing that function with Y yields the new random variable E[X | Y].
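Written out, with f_Y(y) = ∫ f(x, y) dx denoting the marginal density of Y:

    E[X | Y = y] = ∫ x · f(x, y) / f_Y(y) dx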

Several key properties are highlighted. If X and Y are independent, conditioning on Y removes any dependence: conditional probabilities or densities factorize, and E[X | Y] reduces to the ordinary expectation E[X] (a constant random variable). A related simplification holds for products: in E[X·Y | Y], the value of Y is known once we condition on it, so it can be pulled out of the expectation, giving Y·E[X | Y] (and hence Y·E[X] when X and Y are independent). Conditioning on X itself gives E[X | X] = X. Finally, averaging the conditional expectations recovers the unconditional expectation: E(E[X | Y]) = E[X], the law of total expectation. These properties set up the later use of conditional expectation in stochastic processes, where conditioning on random information is routine.
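The law of total expectation can be checked directly on the die example:

    E(E[X | Y]) = P(Y = 0) · E[X | Y = 0] + P(Y = 1) · E[X | Y = 1]
                = (5/6) · (2/5) + (1/6) · 1
                = 1/3 + 1/6 = 1/2 = P(die is even) = E[X]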

Cornell Notes

Conditional expectation given a random variable, E[X | Y], generalizes conditioning on events by letting the “known information” come from another random variable. In the discrete case, E[X | Y] is built by first computing E[X | Y = y] as a weighted average of X’s values using conditional probabilities, then turning it into a random variable by plugging in the random value Y(ω). For a die example, E[X | Y] takes the value 2/5 when Y = 0 (evenness among outcomes excluding 6) and 1 when Y = 1 (6 forces evenness). In the continuous case, the same structure uses joint densities: E[X | Y] is an integral of x times the conditional density f(x,y)/f_Y(y). Key properties include independence (E[X | Y] = E[X]), E[X | X] = X, and the law of total expectation E(E[X | Y]) = E[X].

Why does E[X | Y] become a random variable rather than a single number?

Because conditioning on Y = y produces a function of y: E[X | Y = y]. Since Y itself is random (Y: Ω → ℝ), composing that function with Y yields E[X | Y](ω) = E[X | Y = Y(ω)]. Different outcomes ω can correspond to different y values, so the conditional expectation varies with ω.

How is E[X | Y = y] computed in the discrete case?

For discrete X and Y, E[X | Y = y] is computed like the earlier E[X | B] formula, but with B replaced by the event {Y = y}. Concretely, E[X | Y = y] = Σ_x x · P(X = x | Y = y), summing over all values x that X can take. The conditional probability P(X = x | Y = y) can be written using joint probabilities: P(X = x, Y = y) / P(Y = y).
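A small sketch of this computation starting from a joint pmf table (filled in here with the die example's joint probabilities; the dict layout and function name are illustrative):

    from fractions import Fraction

    # Joint pmf as {(x, y): P(X = x, Y = y)}, taken from the die example
    joint = {(0, 0): Fraction(3, 6), (1, 0): Fraction(2, 6),
             (0, 1): Fraction(0, 6), (1, 1): Fraction(1, 6)}

    def cond_exp_from_joint(joint, y):
        """E[X | Y = y] = Σ_x x · P(X = x, Y = y) / P(Y = y)."""
        p_y = sum(p for (x, yy), p in joint.items() if yy == y)        # P(Y = y)
        return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

    print(cond_exp_from_joint(joint, 0))   # 2/5
    print(cond_exp_from_joint(joint, 1))   # 1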

In the die example, why is E[X | Y = 0] equal to 2/5?

Here X = 1 if the die is even and 0 otherwise; Y = 0 means the die is not 6. When Y = 0, the possible outcomes are {1,2,3,4,5} (5 outcomes). Among them, X = 1 occurs for {2,4} (2 outcomes). So E[X | Y = 0] = (1·2 + 0·3)/5 = 2/5.

How does the continuous-case formula mirror the discrete one?

The structure stays the same: replace sums with integrals and conditional probabilities with conditional densities. With joint density f(x,y), the conditional density of X given Y = y is f(x,y)/f_Y(y). Then E[X | Y] is computed as an integral over x: E[X | Y = y] = ∫ x · [f(x,y)/f_Y(y)] dx. Composing with Y gives the random variable E[X | Y].

What does independence imply for conditional expectation?

If X and Y are independent, the joint density/mass factorizes, so the Y-dependent terms cancel in the conditional density/probability. That leaves E[X | Y] = E[X], a constant random variable. In other words, knowing Y gives no information about X’s average.
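In the discrete case the cancellation is one line:

    P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
                     = P(X = x) · P(Y = y) / P(Y = y)   (independence)
                     = P(X = x)

so E[X | Y = y] = Σ_x x · P(X = x) = E[X] for every y.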

Review Questions

  1. For discrete X and Y, write the expression for E[X | Y = y] in terms of P(X = x, Y = y) and P(Y = y).
  2. In the die example, what value does E[X | Y = 1] take and why must it be that value?
  3. State and interpret the identity E(E[X | Y]) = E[X]. What does it mean in terms of averaging conditional expectations?

Key Points

  1. Conditional expectation given a random variable, E[X | Y], is a new random variable obtained by conditioning on the event {Y = y} for each possible y and then substituting y = Y(ω).

  2. For discrete random variables, E[X | Y = y] is computed as a weighted sum: Σ_x x · P(X = x | Y = y).

  3. For continuous random variables with joint density f(x,y), E[X | Y] uses an integral with conditional density f(x,y)/f_Y(y).

  4. Independence implies E[X | Y] = E[X], because conditioning on Y does not change the distributional average of X.

  5. Conditioning on X itself gives E[X | X] = X.

  6. The law of total expectation holds: E(E[X | Y]) = E[X], linking conditional and unconditional averages.

Highlights

In the die example, E[X | Y] equals 2/5 when the die is not 6, and equals 1 when the die is 6—showing how conditioning on a random variable changes the average of another variable.
The discrete-to-continuous transition is systematic: sums become integrals and conditional probabilities become conditional densities (joint density divided by the marginal of Y).
Independence collapses conditional expectation to a constant: E[X | Y] becomes E[X].
Averaging conditional expectations returns the unconditional expectation: E(E[X | Y]) = E[X].
