Probability Theory 8 | Bayes's Theorem and Total Probability [dark version]
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Bayes’s theorem is obtained by combining two conditional-probability formulas that share the same intersection term P(A∩B).
Briefing
Bayes’s theorem and the law of total probability are presented as two linked tools for turning conditional information into an overall probability—then the Monty Hall problem is used to show how the machinery delivers the classic “switching wins with probability 2/3” result.
Bayes’s theorem is derived directly from conditional probability. The conditional probability of event A given B is defined as P(A|B)=P(A∩B)/P(B), assuming P(B)>0. Flipping the conditioning gives P(B|A)=P(A∩B)/P(A), assuming P(A)>0. Because the intersection A∩B is the same in both expressions, solving each for P(A∩B) and equating the results yields P(A|B)·P(B)=P(B|A)·P(A). Dividing by P(B) isolates the target conditional probability and gives the standard Bayes form: P(A|B)=P(B|A)·P(A)/P(B). The key takeaway is that the conditioning order can be swapped by multiplying by the probability of the old conditioning event and dividing by the probability of the new one.
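The identity can be checked numerically on a small finite sample space. The sketch below is only an illustration: the fair six-sided die, the events A and B, and the helper names `prob` and `cond` are assumptions chosen for the example, not part of the video.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """Conditional probability P(a | b) = P(a ∩ b) / P(b); requires P(b) > 0."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is at least 4"

# Both products reduce to the shared intersection term P(A ∩ B).
lhs = cond(A, B) * prob(B)
rhs = cond(B, A) * prob(A)

# Bayes's theorem: isolate P(A | B) by dividing by P(B).
bayes = cond(B, A) * prob(A) / prob(B)

print(lhs, rhs, prob(A & B))  # all three values agree
print(bayes, cond(A, B))      # both are P(A | B)
```

Exact fractions are used instead of floats so that the two sides can be compared with `==` without rounding concerns.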
The law of total probability then explains how to “split” a probability by partitioning the sample space. If the sample space Ω is decomposed into two disjoint parts, B and Bᶜ, then for any event A, P(A)=P(A∩B)+P(A∩Bᶜ). Each intersection term is rewritten using conditional probability: P(A∩B)=P(A|B)·P(B) and P(A∩Bᶜ)=P(A|Bᶜ)·P(Bᶜ). The same idea generalizes to countably many disjoint sets {B_i} whose union is Ω. With σ-additivity, the result becomes a series: P(A)=∑_i P(A|B_i)·P(B_i). This framework is what later supplies the denominator in Bayes’s theorem.
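A small numerical check can make the splitting step concrete. The following sketch assumes a fair six-sided die as the sample space; the event A, the partitions, and the helper names are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond(a, b):
    """Conditional probability P(a | b); requires P(b) > 0."""
    return prob(a & b) / prob(b)

def total_probability(a, partition):
    """Sum of P(a | B_i) * P(B_i) over a disjoint partition of omega."""
    return sum(cond(a, b_i) * prob(b_i) for b_i in partition)

A = {2, 3, 5}  # "the roll is prime"

# Two-set case: B and its complement.
B = {4, 5, 6}
two_way = total_probability(A, [B, omega - B])

# Three-set case: a finer disjoint partition whose union is omega.
three_way = total_probability(A, [{1, 2}, {3, 4}, {5, 6}])

print(prob(A), two_way, three_way)  # the three values agree
```

Any disjoint partition gives the same total, which is why the partition can be chosen to match whatever conditional information is available.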
The Monty Hall problem is then framed in probability terms. There are three doors: one hides a car and two hide goats. A player initially picks a door (labeled door 1). The host then opens a different door showing a goat (the transcript focuses on the case where the host opens door 3). Events are defined as C_j = “car is behind door j” and S_j = “the host opens door j in the second step.” Under the condition that the car is behind door 3 (C_3), the host cannot open door 3 to show a goat, so P(S_3|C_3)=0. If the car is behind door 2 (C_2), the host has no choice but to open door 3, so P(S_3|C_2)=1. If the car is behind door 1 (C_1), the host has a choice between the two goat doors, making P(S_3|C_1)=1/2.
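The three conditional probabilities can be reproduced from the host's rule. The sketch below assumes the host never opens the player's door or the car's door and chooses uniformly at random when two goat doors are available; the function name `host_open_probs` is made up for this example.

```python
from fractions import Fraction

DOORS = (1, 2, 3)

def host_open_probs(car, pick=1):
    """Return {door: probability the host opens it}, given the car's location.

    Assumption: the host never opens the player's door or the car's door,
    and picks uniformly at random when two goat doors are available.
    """
    allowed = [d for d in DOORS if d != pick and d != car]
    return {
        d: (Fraction(1, len(allowed)) if d in allowed else Fraction(0))
        for d in DOORS
    }

# P(S_3 | C_j) for j = 1, 2, 3, with the player's initial pick fixed at door 1.
likelihood_s3 = {car: host_open_probs(car)[3] for car in DOORS}
print(likelihood_s3)
```

The uniform tie-break when the car is behind door 1 is what produces the value 1/2; a host with a different preference would change that entry.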
To find the probability of winning by switching, the analysis targets P(C_2|S_3): after the player picks door 1 and the host opens door 3, switching means taking door 2, so switching wins exactly when the car is behind door 2. Bayes’s theorem gives P(C_2|S_3)=P(S_3|C_2)P(C_2)/P(S_3), which requires P(S_3), and that is not directly given. The law of total probability supplies it: P(S_3)=P(S_3|C_1)P(C_1)+P(S_3|C_2)P(C_2)+P(S_3|C_3)P(C_3). With a fair game assumption, each door has probability 1/3 for the car, so P(C_1)=P(C_2)=P(C_3)=1/3, and therefore P(S_3)=(1/2)(1/3)+(1)(1/3)+(0)(1/3)=1/2. Substituting yields P(C_2|S_3)=(1·1/3)/(1/2)=2/3, meaning switching gives a 2/3 chance to get the car, while staying with door 1 wins with probability P(C_1|S_3)=1/3. The point isn’t the car; it’s the demonstrated workflow: Bayes’s theorem for reversing conditioning, backed by the total probability formula for computing the missing normalization term.
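The arithmetic can be written out as a short computation. This is a minimal sketch assuming the uniform prior and the likelihoods P(S_3|C_j) = 1/2, 1, 0 stated in the briefing; the variable names are illustrative. Since the player holds door 1 and the host opens door 3, switching wins exactly when the car is behind door 2, so the quantity of interest is the posterior for door 2.

```python
from fractions import Fraction

# Prior P(C_j): fair game, each door equally likely to hide the car.
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Likelihood P(S_3 | C_j): probability the host opens door 3 given the car's door.
likelihood_s3 = {1: Fraction(1, 2), 2: Fraction(1), 3: Fraction(0)}

# Law of total probability: the normalization term P(S_3).
p_s3 = sum(likelihood_s3[j] * prior[j] for j in prior)

# Bayes's theorem: posterior P(C_j | S_3) for each door j.
posterior = {j: likelihood_s3[j] * prior[j] / p_s3 for j in prior}

p_win_stay = posterior[1]    # car is behind the initially chosen door
p_win_switch = posterior[2]  # car is behind the remaining unopened door

print(p_s3, p_win_stay, p_win_switch)
```

The posteriors sum to 1, which is a quick way to confirm that the total-probability denominator was computed consistently.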
Cornell Notes
Bayes’s theorem is derived from the definition of conditional probability: P(A|B)=P(A∩B)/P(B), and swapping conditioning gives P(B|A)=P(A∩B)/P(A). Multiplying these by P(B) and P(A) respectively shows that both products equal the shared intersection term P(A∩B), which lets the conditioning order be reversed. The law of total probability then computes an event’s probability by partitioning the sample space into disjoint cases {B_i}: P(A)=∑_i P(A|B_i)P(B_i). In the Monty Hall setup, the host’s action S_3 is used to update beliefs about where the car is (events C_1, C_2, C_3). Bayes’s theorem needs P(S_3), and total probability provides it, leading to a 2/3 win probability when switching.
How does Bayes’s theorem follow from the definition of conditional probability?
Why does the law of total probability require a disjoint partition of the sample space?
In Monty Hall, what are the conditional probabilities for the host opening door 3?
Where does the “missing” probability P(S_3) come from in Bayes’s theorem?
How does the fair-game assumption affect the final 2/3 result?
Review Questions
- Given P(A|B) and P(B|A), what additional probability term is needed to compute one from the other using Bayes’s theorem?
- How does the law of total probability change when the partition of Ω uses countably many sets {B_i} instead of just B and Bᶜ?
- In the Monty Hall setup, why is P(S_3|C_3)=0 and P(S_3|C_2)=1?
Key Points
1. Bayes’s theorem is obtained by combining two conditional-probability formulas that share the same intersection term P(A∩B).
2. Reversing the conditioning from P(B|A) to P(A|B) requires multiplying by P(A), the probability of the old conditioning event, and dividing by P(B), the probability of the new one.
3. The law of total probability computes P(A) by summing P(A|B_i)P(B_i) across a disjoint partition {B_i} of the sample space.
4. For two cases, the partition is B and Bᶜ, giving P(A)=P(A|B)P(B)+P(A|Bᶜ)P(Bᶜ).
5. For countably many cases, σ-additivity turns the same idea into a series: P(A)=∑_i P(A|B_i)P(B_i).
6. In Monty Hall, the probability that the host opens door 3 depends on the car’s location: P(S_3|C_1)=1/2, P(S_3|C_2)=1, and P(S_3|C_3)=0.
7. Using Bayes’s theorem, with total probability supplying P(S_3)=1/2, yields a 2/3 chance of winning by switching.
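As a sanity check on the 2/3 figure, the game can also be simulated. The sketch below is not from the video; it assumes the player always picks door 1, the host chooses uniformly between goat doors when there is a choice, and only rounds in which the host opens door 3 are counted. The function name, trial count, and seed are arbitrary.

```python
import random

def simulate(trials=100_000, seed=0):
    """Estimate P(switch wins | host opens door 3) and P(host opens door 3)."""
    rng = random.Random(seed)
    s3_count = 0
    switch_wins = 0
    for _ in range(trials):
        car = rng.randint(1, 3)  # car placed uniformly at random
        pick = 1                 # player's initial choice
        # Host may open any door that is neither the pick nor the car.
        options = [d for d in (1, 2, 3) if d != pick and d != car]
        opened = rng.choice(options)
        if opened != 3:
            continue             # condition on the event S_3
        s3_count += 1
        if car == 2:             # door 2 is the only door left to switch to
            switch_wins += 1
    return switch_wins / s3_count, s3_count / trials

win_rate, s3_rate = simulate()
print(win_rate, s3_rate)
```

With enough trials the first value should settle near 2/3 and the second near 1/2, matching the exact Bayes computation up to sampling noise.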