Probability Theory 21 | Conditional Expectation (given events) [dark version]
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Conditional expectation E[X|B] is computed using the probability measure conditioned on event B, requiring P(B)>0.
Briefing
Conditional expectation given an event is built by reweighting probabilities to focus only on outcomes inside that event. Start with conditional probability: for events A and B with P(B) > 0, the conditional probability P(A|B) is the probability mass of A∩B scaled by 1/P(B). That scaling turns “looking at the whole sample space” into “looking only inside B,” producing a new probability measure. Once that conditional measure exists, the expectation of a random variable under the condition is just an ordinary expectation computed with respect to the updated measure—denoted E[X|B].
A key practical identity makes calculations easier: conditional expectation can be expressed using the original probability measure by inserting an indicator function. Since the conditional probability measure effectively multiplies probabilities by 1/P(B) and restricts to B, the conditional expectation becomes E[X|B] = (1/P(B)) · E[X · 1_B], where 1_B(ω) equals 1 if ω lies in B and 0 otherwise. This means there’s no need to rebuild the measure every time; one can keep the original measure P and instead change the integrand by multiplying by 1_B.
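As a minimal sketch of this identity on a finite sample space (the two-dice setup below is an illustrative choice, not from the video): compute E[X|B] once via the restricted average under the conditional measure and once via E[X·1_B]/P(B) under the original uniform measure, and check that they agree exactly.

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: roll two fair dice, X = sum, B = "first die shows 6".
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)  # uniform measure on the 36 outcomes

def X(w):
    return w[0] + w[1]

def ind_B(w):
    # indicator function 1_B
    return 1 if w[0] == 6 else 0

# Indicator route: stay with the original measure, change the integrand.
P_B = sum(ind_B(w) * p for w in outcomes)            # 6/36 = 1/6
E_X_1B = sum(X(w) * ind_B(w) * p for w in outcomes)  # E[X·1_B] = 19/12
cond_exp = E_X_1B / P_B                              # E[X|B] = 19/2

# Conditional-measure route: ordinary average of X over outcomes in B.
direct = Fraction(sum(X(w) for w in outcomes if ind_B(w)), 6)

print(cond_exp, direct)  # both Fraction(19, 2), i.e. 9.5
```

Using exact rationals makes the equality of the two routes an identity rather than a numerical coincidence.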
The transcript then works through a continuous example. Let X be normally distributed with mean 0 and variance 1 (standard normal). Choose the condition B = {ω : X(ω) > 0}, which keeps only the right half of the distribution. Using the conditional-expectation formula, the computation reduces to a one-dimensional integral over x > 0 of x times the standard normal density, scaled by 1/P(B). Symmetry of the normal distribution gives P(B) = 1/2, so the scaling factor is 2. The remaining one-sided integral evaluates to 1/√(2π), yielding E[X|X>0] = 2/√(2π) = √(2/π) ≈ 0.80. The result is positive even though the unconditional mean of X is 0, because conditioning removes all negative outcomes.
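The one-sided integral can be checked numerically. The sketch below (an assumption-free stdlib computation, not from the video) approximates E[X·1_{X>0}] = ∫₀^∞ x φ(x) dx with a midpoint rule, then divides by P(X>0) = 1/2:

```python
import math

def phi(x):
    # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Midpoint rule on [0, 10] approximates the one-sided integral;
# the tail of x·phi(x) beyond 10 is negligible (below 1e-21).
n, a, b = 100_000, 0.0, 10.0
h = (b - a) / n
e_x_pos = h * sum((a + (i + 0.5) * h) * phi(a + (i + 0.5) * h) for i in range(n))

cond_exp = e_x_pos / 0.5          # divide by P(X > 0) = 1/2
exact = math.sqrt(2 / math.pi)    # ≈ 0.7979

print(e_x_pos, cond_exp, exact)
```

The antiderivative of x·φ(x) is −φ(x), so the one-sided integral equals φ(0) = 1/√(2π) in closed form, which the quadrature reproduces to high accuracy.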
Next comes a general observation about indicator functions. When the random variable is itself an indicator 1_A, the conditional expectation E[1_A | B] equals the conditional probability P(A|B). Intuitively, multiplying by 1_A and then conditioning on B measures how likely A is among the outcomes that land in B.
Finally, a discrete example with a fair die makes the method concrete. Let X be the die outcome and condition on B = {5,6}. Then E[X|B] is the weighted average of 5 and 6 using their conditional probabilities. Since P(B)=2/6, the conditional expectation becomes (1/(2/6))·(5·(1/6)+6·(1/6)) = 11/2 = 5.5. The takeaway is straightforward: conditional expectation is the mean of X after restricting attention to the event B, implemented mathematically by conditioning the probability measure or equivalently by inserting an indicator function and scaling by 1/P(B).
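The die example, together with the indicator observation E[1_A|B] = P(A|B), can be reproduced exactly in a few lines (the choice A = {6} below is an illustrative assumption added here):

```python
from fractions import Fraction

# Fair die: X is the outcome, condition on B = {5, 6}.
outcomes = range(1, 7)
p = Fraction(1, 6)
B = {5, 6}
A = {6}  # hypothetical event to illustrate E[1_A | B] = P(A | B)

P_B = sum(p for x in outcomes if x in B)                    # 2/6 = 1/3
E_X_given_B = sum(x * p for x in outcomes if x in B) / P_B  # (11/6)/(1/3) = 11/2
E_1A_given_B = sum(p for x in outcomes if x in A and x in B) / P_B  # 1/2

print(E_X_given_B, E_1A_given_B)  # Fraction(11, 2) and Fraction(1, 2)
```

The conditional weights on 5 and 6 renormalize from 1/6 each to 1/2 each, so E[X|B] is the plain average of 5 and 6.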
Cornell Notes
Conditional expectation E[X|B] is the ordinary expectation of X computed under the probability measure that only counts outcomes in event B (assuming P(B)>0). Conditional probability P(A|B) = P(A∩B)/P(B) leads directly to a conditional measure, and the same idea applies to expectations. A practical formula avoids changing measures: E[X|B] = (1/P(B))·E[X·1_B], where 1_B is 1 on B and 0 outside. For a standard normal X, conditioning on X>0 yields E[X|X>0] = √(2/π). For a fair die conditioned on the outcome lying in {5,6}, E[X|B] is the renormalized average of 5 and 6, giving 5.5.
How does conditional probability turn “focus on B” into a new probability measure?
Why is E[X|B] often computed as (1/P(B))·E[X·1_B] instead of rebuilding the conditional measure?
In the standard normal example, why does the scaling factor become 2?
How does the conditional expectation E[X|X>0] evaluate to √(2/π)?
How is conditional expectation computed for a fair die when conditioning on {5,6}?
Review Questions
- Given P(B)>0, write the relationship between E[X|B] and E[X·1_B].
- For a symmetric distribution, what does conditioning on X>0 do to the sign of the mean compared with the unconditional mean?
- For a discrete random variable, how do conditional expectations change the weights used in the average?
Key Points
1. Conditional expectation E[X|B] is computed using the probability measure conditioned on event B, requiring P(B)>0.
2. Conditional probability P(A|B)=P(A∩B)/P(B) motivates the conditional measure used for expectations.
3. A calculation shortcut is E[X|B]=(1/P(B))·E[X·1_B], where 1_B is the indicator of B.
4. Indicator functions effectively restrict integrals or sums to the conditioned region/event.
5. For a standard normal X, conditioning on X>0 yields E[X|X>0]=√(2/π), via symmetry (P(X>0)=1/2) and a one-sided integral equal to 1/√(2π).
6. For a fair die, conditioning on outcomes {5,6} produces E[X|B]=5.5 as the renormalized average of 5 and 6.