Probability Theory 21 | Conditional Expectation (given events) [dark version]

4 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Conditional expectation E[X|B] is computed using the probability measure conditioned on event B, requiring P(B)>0.

Briefing

Conditional expectation given an event is built by reweighting probabilities to focus only on outcomes inside that event. Start with conditional probability: for events A and B with P(B) > 0, the conditional probability P(A|B) is the probability mass of A∩B scaled by 1/P(B). That scaling turns “looking at the whole sample space” into “looking only inside B,” producing a new probability measure. Once that conditional measure exists, the expectation of a random variable under the condition is just an ordinary expectation computed with respect to the updated measure—denoted E[X|B].
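
To make the reweighting concrete, here is a minimal Python sketch (the finite sample space, the `condition` helper, and the die example are illustrative choices, not from the video): it restricts a finite measure to B and rescales by 1/P(B).

```python
# A minimal sketch: conditioning a finite probability measure on an event B.
# The sample space and helper name are illustrative, not from the video.

def condition(P, B):
    """Return the conditional measure P(. | B) on a finite sample space.

    P: dict mapping outcome -> probability
    B: set of outcomes with positive total probability
    """
    pB = sum(p for w, p in P.items() if w in B)
    if pB <= 0:
        raise ValueError("P(B) must be positive")
    # Restrict to B and rescale by 1/P(B); outcomes outside B get mass 0.
    return {w: (p / pB if w in B else 0.0) for w, p in P.items()}

# Fair six-sided die, conditioned on rolling at least 5.
P = {w: 1 / 6 for w in range(1, 7)}
print(condition(P, {5, 6}))  # outcomes 5 and 6 each get probability 1/2
```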

A key practical identity makes calculations easier: conditional expectation can be expressed using the original probability measure by inserting an indicator function. Since the conditional probability measure effectively multiplies probabilities by 1/P(B) and restricts to B, the conditional expectation becomes E[X|B] = (1/P(B)) · E[X · 1_B], where 1_B(ω) equals 1 if ω lies in B and 0 otherwise. This means there is no need to rebuild the measure every time; one can keep the original measure P and simply multiply the integrand by 1_B.
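
A quick numerical illustration of the identity (a sketch with an arbitrary choice of X, not from the video): with X uniform on [0,1] and B = {X > 1/2}, both sides should approximate the exact answer 3/4.

```python
# Monte Carlo check of E[X|B] = (1/P(B)) * E[X * 1_B].
# X ~ Uniform(0,1) and B = {X > 1/2} are illustrative choices; exact answer 3/4.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)
in_B = x > 0.5                                    # the indicator 1_B as a boolean array

direct = x[in_B].mean()                           # mean of X over outcomes in B
via_indicator = (x * in_B).mean() / in_B.mean()   # (1/P(B)) * E[X * 1_B]
print(direct, via_indicator)                      # both close to 0.75
```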

The transcript then works through a continuous example. Let X be normally distributed with mean 0 and variance 1 (standard normal). Choose the condition B = {ω : X(ω) > 0}, which keeps only the right half of the distribution. Using the conditional-expectation formula, the computation reduces to a one-dimensional integral over x > 0 of x times the standard normal density, scaled by 1/P(B). Symmetry of the normal distribution gives P(B) = 1/2, so the scaling factor is 2. The remaining integral evaluates to 1/√(2π), yielding E[X|X>0] = 2/√(2π) = √(2/π) ≈ 0.80. The result is positive even though the unconditional mean of X is 0, because conditioning removes all negative outcomes.
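
This value is easy to verify numerically; the following sketch (assuming NumPy and SciPy are available) checks the one-sided integral and a Monte Carlo estimate against √(2/π).

```python
# Sanity check of E[X | X > 0] = sqrt(2/pi) for a standard normal X.
import numpy as np
from scipy import integrate, stats

# (1/P(B)) * integral over x > 0 of x * phi(x) dx, with P(B) = 1/2.
integral, _ = integrate.quad(lambda t: t * stats.norm.pdf(t), 0, np.inf)
print(2 * integral, np.sqrt(2 / np.pi))  # both ~ 0.7979

# Monte Carlo: average the positive samples only.
samples = np.random.default_rng(1).standard_normal(1_000_000)
print(samples[samples > 0].mean())       # also ~ 0.7979
```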

Next comes a general observation about indicator functions. When the random variable is itself an indicator 1_A, the conditional expectation E[1_A | B] equals the conditional probability P(A|B). Intuitively, multiplying by 1_A and restricting to B measures the probability of landing in A∩B, and dividing by P(B) renormalizes that into P(A|B).
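
The correspondence can be checked directly on a finite space; in the sketch below, A = "even outcome" and B = {5,6} are illustrative choices, and E[1_A|B] and P(A|B) coincide.

```python
# Checking E[1_A | B] = P(A | B) on a fair die, in exact arithmetic.
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}  # fair die
A = {2, 4, 6}                                 # "even outcome" (illustrative)
B = {5, 6}

pB = sum(P[w] for w in B)
e_indicator = sum(P[w] for w in P if w in A and w in B) / pB  # (1/P(B)) * E[1_A * 1_B]
p_A_given_B = sum(P[w] for w in A & B) / pB                   # P(A ∩ B) / P(B)
print(e_indicator, p_A_given_B)                               # both 1/2
```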

Finally, a discrete example with a fair die makes the method concrete. Let X be the die outcome and condition on B = {5,6}. Then E[X|B] is the weighted average of 5 and 6 using their conditional probabilities. Since P(B)=2/6, the conditional expectation becomes (1/(2/6))·(5·(1/6)+6·(1/6)) = 11/2 = 5.5. The takeaway is straightforward: conditional expectation is the mean of X after restricting attention to the event B, implemented mathematically by conditioning the probability measure or equivalently by inserting an indicator function and scaling by 1/P(B).
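
The same computation in exact arithmetic, as a small sketch mirroring the example above:

```python
# E[X | B] for a fair die with B = {5, 6}, computed via the indicator formula.
from fractions import Fraction

P = {w: Fraction(1, 6) for w in range(1, 7)}  # fair die
B = {5, 6}
pB = sum(P[w] for w in B)                     # P(B) = 2/6 = 1/3
e_cond = sum(w * P[w] for w in B) / pB        # (1/P(B)) * E[X * 1_B]
print(e_cond)                                 # 11/2, i.e. 5.5
```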

Cornell Notes

Conditional expectation E[X|B] is the ordinary expectation of X computed under the probability measure that only counts outcomes in event B (assuming P(B)>0). Conditional probability P(A|B) = P(A∩B)/P(B) leads directly to a conditional measure, and the same idea applies to expectations. A practical formula avoids changing measures: E[X|B] = (1/P(B))·E[X·1_B], where 1_B is 1 on B and 0 outside. For a standard normal X, conditioning on X>0 yields E[X|X>0] = √(2/π) ≈ 0.80. For a fair die conditioned on B = {5,6}, E[X|B] is the renormalized average of 5 and 6, giving 5.5.

How does conditional probability turn “focus on B” into a new probability measure?

For events A and B with P(B)>0, conditional probability is P(A|B)=P(A∩B)/P(B). The division by P(B) rescales probabilities so that outcomes are effectively normalized within the subset B of the sample space. That rescaling defines a conditional probability measure, and any expectation computed under it becomes a conditional expectation.

Why is E[X|B] often computed as (1/P(B))·E[X·1_B] instead of rebuilding the conditional measure?

The indicator function 1_B(ω) equals 1 when ω∈B and 0 otherwise. Multiplying X by 1_B forces the integrand (or sum) to contribute only on B. The remaining factor 1/P(B) provides the normalization that conditional probability requires. So E[X|B] can be written using the original expectation E with the modified integrand X·1_B.

In the standard normal example, why does the scaling factor become 2?

The condition is B={X>0}. For a symmetric standard normal distribution, P(X>0)=1/2. Since E[X|B] includes the factor 1/P(B), the scaling becomes 1/(1/2)=2. This reflects that conditioning keeps only half the distribution and renormalizes it to total probability 1.

How does the conditional expectation E[X|X>0] evaluate to √(2/π)?

After applying the conditional-expectation formula, the computation reduces to an integral over x>0 of x times the standard normal density, multiplied by 2. The antiderivative of x·e^(−x²/2) is −e^(−x²/2), so the one-sided integral equals 1/√(2π), and the factor of 2 gives the final value √(2/π). The result is positive because conditioning removes all negative x values.
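
The antiderivative step can also be confirmed symbolically; this sketch (assuming SymPy is available) reproduces both the one-sided integral and the final value.

```python
# The antiderivative of x*exp(-x^2/2) is -exp(-x^2/2), so the integral of
# x * phi(x) over x > 0 is 1/sqrt(2*pi); the factor 2 then gives sqrt(2/pi).
import sympy as sp

x = sp.symbols('x', positive=True)
phi = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)                # standard normal density
half_integral = sp.integrate(x * phi, (x, 0, sp.oo))
print(sp.simplify(half_integral - 1 / sp.sqrt(2 * sp.pi)))  # 0
print(sp.simplify(2 * half_integral - sp.sqrt(2 / sp.pi)))  # 0
```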

How is conditional expectation computed for a fair die when conditioning on {5,6}?

Let X be the die outcome and B={5,6}. Then P(B)=2/6. The conditional expectation is the conditional weighted average: E[X|B]=(1/P(B))·(5·P(X=5)+6·P(X=6))=(1/(2/6))·(5·1/6+6·1/6)=11/2=5.5.

Review Questions

  1. Given P(B)>0, write the relationship between E[X|B] and E[X·1_B].
  2. For a symmetric distribution, what does conditioning on X>0 do to the sign of the mean compared with the unconditional mean?
  3. For a discrete random variable, how do conditional expectations change the weights used in the average?

Key Points

  1. Conditional expectation E[X|B] is computed using the probability measure conditioned on event B, requiring P(B)>0.

  2. Conditional probability P(A|B)=P(A∩B)/P(B) motivates the conditional measure used for expectations.

  3. A calculation shortcut is E[X|B]=(1/P(B))·E[X·1_B], where 1_B is the indicator of B.

  4. Indicator functions effectively restrict integrals or sums to the conditioning event.

  5. For a standard normal X, conditioning on X>0 yields E[X|X>0]=√(2/π), by symmetry and a one-sided integral.

  6. For a fair die, conditioning on outcomes {5,6} produces E[X|B]=5.5 as the renormalized average of 5 and 6.

Highlights

Conditioning turns probability into a renormalized measure on B, and expectations follow the same rule.
The identity E[X|B]=(1/P(B))·E[X·1_B] lets calculations stay on the original probability space.
For a standard normal, conditioning on X>0 shifts the mean from 0 to √(2/π).
On a fair die, conditioning on {5,6} gives an average of 5.5 after renormalizing the probabilities.