Probability Theory 21 | Conditional Expectation (given events)

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Conditional probability renormalizes probabilities inside an event B via P(A | B) = P(A ∩ B)/P(B), valid when P(B) > 0; the conditional expectation E[X | B] is then the ordinary expectation of X under this renormalized measure, computable as (1/P(B)) · E[X · 1_B].

Briefing

Conditional expectation given an event is built by reweighting probabilities so that only outcomes inside the conditioning event matter. If an event B has nonzero probability, the conditional probability measure rescales probabilities by the ratio P(A ∩ B)/P(B), turning “probability inside B” into a proper probability law. Expectations then follow the same pattern: the conditional expectation of a random variable X given B is just the ordinary expectation computed under this new, renormalized probability measure.

A key practical takeaway is that conditional expectation can be computed without changing the measure explicitly. Using indicator functions, the conditional expectation E[X | B] can be rewritten as a scaling factor 1/P(B) times an ordinary expectation of X multiplied by the indicator 1_B (which equals 1 on B and 0 outside). Concretely, this turns the problem into integrating (or summing) X over the region where B holds, with the original probability measure providing the weighting. This “indicator = restrict the domain” viewpoint is the bridge between abstract definitions and hands-on calculations.
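To make the indicator mechanism concrete, here is a minimal NumPy sketch (not from the video; the event B = {X > 1} is an arbitrary choice for illustration). It estimates a conditional expectation both ways: by averaging only the samples inside B, and via (1/P(B)) · E[X · 1_B] under the original measure.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # samples of X under the original measure
b = x > 1.0                         # boolean array playing the role of 1_B

# Route 1: discard everything outside B and average what remains.
direct = x[b].mean()

# Route 2: keep the original measure, multiply the integrand by 1_B, rescale.
indicator = (x * b).mean() / b.mean()   # E[X · 1_B] / P(B)

print(direct, indicator)  # both ≈ 1.53 (the exact value is φ(1)/(1 − Φ(1)))
```

The two routes agree because restricting the sample and reweighting by 1_B/P(B) are the same operation written in different notation.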

In the continuous example, X is standard normal with density f(x) = (1/√(2π)) e^{-x^2/2}. The conditioning event is B = {X > 0}, which selects only the right half of the distribution. Because the standard normal is symmetric about zero, P(B) = 1/2, so the scaling factor becomes 2. The conditional expectation reduces to an integral of x times the normal density over (0, ∞). Evaluating it yields E[X | X > 0] = 2/√(2π) = √(2/π) ≈ 0.798. The result is positive, exactly what intuition predicts, since only positive outcomes are allowed.
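A quick numerical check of this example, sketched under the assumption that SciPy is available:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# E[X | X > 0] = (1/P(X > 0)) · ∫_0^∞ x f(x) dx, with f the standard normal density
p_b, _ = quad(norm.pdf, 0, np.inf)                   # P(X > 0) = 1/2
num, _ = quad(lambda t: t * norm.pdf(t), 0, np.inf)  # E[X · 1_B]

print(num / p_b)               # ≈ 0.7979
print(2 / np.sqrt(2 * np.pi))  # 2/√(2π) = √(2/π), the closed-form answer
```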

The transcript also highlights a general identity: when the random variable is an indicator 1_A, the conditional expectation E[1_A | B] matches the conditional probability P(A | B). In other words, conditional probability can be expressed as a conditional expectation, reinforcing that both concepts are the same renormalization idea viewed through different lenses.

Finally, the discrete example uses a fair die. Let X be the face value, and condition on B = {5, 6}. Under this restriction, the conditional distribution assigns probability 1/2 to each of 5 and 6. The conditional expectation becomes (5·(1/2) + 6·(1/2)) = 11/2 = 5.5. The calculation makes the meaning concrete: conditional expectation is simply the average of X after discarding all outcomes outside B and renormalizing the remaining probabilities.
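The same computation in a few lines of Python (a sketch, not from the transcript), written in the indicator form so the renormalization is explicit:

```python
faces = [1, 2, 3, 4, 5, 6]   # fair die: each face carries probability 1/6
B = {5, 6}                   # the conditioning event

p_b = sum(1 / 6 for x in faces if x in B)      # P(B) = 2/6 = 1/3
e_x_1b = sum(x / 6 for x in faces if x in B)   # E[X · 1_B] = (5 + 6)/6

print(e_x_1b / p_b)  # 5.5 = 11/2, matching the hand calculation
```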

Overall, conditional expectation given an event is a systematic way to compute averages under “knowledge that B occurred,” implemented by renormalizing probabilities and, in practice, by integrating or summing X over the event region using indicator functions.

Cornell Notes

Conditional expectation given an event B is the ordinary expectation computed under the renormalized probability measure that only counts outcomes in B. When P(B) > 0, the conditional probability is P(A | B) = P(A ∩ B)/P(B), and the conditional expectation E[X | B] is defined as the expectation of X with respect to that conditional probability measure. A practical formula rewrites it using an indicator: E[X | B] = (1/P(B)) · E[X · 1_B], so the indicator effectively restricts the domain of integration/summation to where B holds. Examples show the method works for both continuous and discrete cases: for a standard normal conditioned on X > 0, the result is 2/√(2π); for a die conditioned on {5,6}, the result is 5.5.

How does conditional probability lead to conditional expectation?

Conditional probability renormalizes probabilities inside an event B: P(A | B) = P(A ∩ B)/P(B) (valid only when P(B) ≠ 0). Since expectation is an integral (or sum) with respect to a probability measure, the conditional expectation E[X | B] is defined as the ordinary expectation of X but computed under this new conditional probability measure. The random variable X itself stays the same; only the probability weights change.

Why does the indicator function 1_B matter for calculations?

The indicator 1_B equals 1 on outcomes in B and 0 outside. Multiplying X by 1_B forces the contribution to come only from outcomes where B holds. The conditional expectation can be written as E[X | B] = (1/P(B)) · E[X · 1_B], which means one can keep the original probability measure and simply change the integrand (or summand) to include 1_B. This is the “restrict the domain” mechanism.

What is the continuous example result, and how is it obtained?

For X ~ Normal(0,1), condition on B = {X > 0}. Symmetry gives P(B) = 1/2, so the scaling factor is 2. Then E[X | X > 0] becomes an integral over (0, ∞) of x times the normal density. Evaluating the integral yields E[X | X > 0] = 2/√(2π), a positive value because only positive outcomes are allowed.
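Spelling out the integral step that this answer compresses (the antiderivative of x · e^{−x²/2} is −e^{−x²/2}):

$$
E[X \mid X > 0] \;=\; 2 \int_0^{\infty} x \,\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx \;=\; \frac{2}{\sqrt{2\pi}} \Big[-e^{-x^2/2}\Big]_0^{\infty} \;=\; \frac{2}{\sqrt{2\pi}} \;\approx\; 0.798.
$$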

How does the discrete die example work under conditioning?

Let X be the die face value and condition on B = {5,6}. Under this condition, only two outcomes remain, each with conditional probability 1/2. The conditional expectation is E[X | B] = 5·(1/2) + 6·(1/2) = 11/2 = 5.5. It’s the average of the remaining values after renormalization.

What special relationship connects conditional probability and conditional expectation?

When the random variable is an indicator 1_A, the conditional expectation matches conditional probability: E[1_A | B] = P(A | B). This follows because integrating (or summing) 1_A over the conditional measure counts exactly the probability mass of A within B.
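Written out, the derivation is one line, using 1_A · 1_B = 1_{A ∩ B} and E[1_C] = P(C):

$$
E[1_A \mid B] \;=\; \frac{E[1_A \cdot 1_B]}{P(B)} \;=\; \frac{E[1_{A \cap B}]}{P(B)} \;=\; \frac{P(A \cap B)}{P(B)} \;=\; P(A \mid B).
$$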

Review Questions

  1. Given an event B with P(B) > 0, write the formula for E[X | B] using an indicator function.
  2. For a symmetric continuous distribution, what does conditioning on {X > 0} do to P(B) and the scaling factor in E[X | B]?
  3. In the die example conditioned on {5,6}, what conditional probabilities are assigned to 5 and 6, and how do they determine E[X | B]?

Key Points

  1. Conditional probability renormalizes probabilities inside an event B using P(A | B) = P(A ∩ B)/P(B), requiring P(B) > 0.

  2. Conditional expectation E[X | B] is the ordinary expectation of X computed under the conditional probability measure.

  3. A computation shortcut is E[X | B] = (1/P(B)) · E[X · 1_B], where the indicator 1_B restricts contributions to outcomes in B.

  4. For continuous variables, conditioning turns the expectation integral into one over the region defined by B (with the same original density).

  5. For discrete variables, conditioning turns the expectation into a weighted average over only the outcomes in B, with probabilities renormalized.

  6. Conditional probability is a special case of conditional expectation: E[1_A | B] = P(A | B).

Highlights

Conditional expectation given B is just expectation under a probability law that has been renormalized to live entirely on B.
Using 1_B turns “condition on B” into “integrate/sum only where B holds,” multiplied by 1/P(B).
For X ~ Normal(0,1), conditioning on X > 0 yields E[X | X > 0] = 2/√(2π).
For a fair die conditioned on {5,6}, the conditional expectation is 5.5 because only two outcomes remain with equal probability.
Conditional probability and conditional expectation are linked directly through indicators: E[1_A | B] = P(A | B).
