Probability Theory 4 | Binomial Distribution [dark version]
Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
The binomial distribution gives the probability of exactly k successes in n independent trials with constant success probability p.
Briefing
The binomial distribution is the go-to probability model for counting how many “successes” (like heads) occur in a fixed number of independent trials, when each trial has the same success probability. The key takeaway is the probability of getting exactly k heads in n tosses of a biased coin: it follows the mass function P(k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) is the binomial coefficient. This matters because it turns a messy tree of outcomes into a compact formula that works for any discrete “counting” scenario: coin flips, urn draws, and many other experiments where order doesn’t matter.
The transcript builds the model from first principles. A single toss has two outcomes—heads with probability p and tails with probability 1−p. To connect this to a more physical setup, it introduces an urn model: an urn contains A balls representing heads and B balls representing tails, so p = A/(A+B). Drawing one ball at random gives the same two-outcome probability structure as the biased coin. For the binomial setting, the experiment repeats n times, and the focus shifts from which sequence occurs to only how many heads appear. That “unordered count” is what makes the binomial distribution different from tracking the full sequence.
To justify the formula, the transcript uses a coin-toss tree. Any specific path that yields exactly k heads has probability p^k · (1 − p)^(n − k), because each heads contributes a factor of p and each tails contributes a factor of 1 − p. But there are many distinct paths that produce exactly k heads. The number of such paths is C(n, k) = n! / (k! (n − k)!), the count of ways to choose which k of the n trials are heads. Multiplying the probability of one path by the number of equivalent paths produces P(k) = C(n, k) · p^k · (1 − p)^(n − k), the binomial probability mass function. The distribution is therefore determined entirely by two parameters: n (number of trials) and p (success probability per trial).
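The path-counting argument above is easy to check numerically. The transcript works in R; here is a minimal sketch of the same mass function in Python, with the example values n = 4 and p = 0.5 chosen for illustration:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials: C(n, k) * p^k * (1-p)^(n-k)."""
    # One specific path with k heads has probability p^k * (1-p)^(n-k);
    # comb(n, k) counts how many paths share that probability.
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 4, 0.5
print([binom_pmf(k, n, p) for k in range(n + 1)])  # → [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(binom_pmf(k, n, p) for k in range(n + 1)))  # probabilities over all k sum to 1.0
```

The symmetric shape around k = 2 reflects the fair-coin case p = 0.5; a biased p would skew the list toward one end.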
The latter half translates the theory into computation using R. The built-in binomial generator rbinom simulates the number of heads from n trials with success probability p. Examples show how changing n changes the histogram: with small n, outcomes are sparse and extreme values (0 or n heads) can appear, while larger n concentrates probability near the middle. The transcript then simulates the urn model directly: it creates an urn with A ones (heads) and B zeros (tails), samples with replacement n times, counts the number of ones, and repeats the experiment M times. Finally, it compares the histogram from the urn simulation to rbinom’s output using the same effective p = A/(A+B), finding the distributions match closely.
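The urn workflow described above can be sketched as follows. The transcript does this in R (sampling with replacement, then counting ones); this Python version mirrors it, with A = 3, B = 7, n = 10, and M = 100000 as hypothetical parameters for illustration:

```python
import random

# Hypothetical urn composition: A head-balls (ones), B tail-balls (zeros).
A, B = 3, 7            # effective p = A / (A + B) = 0.3
n, M = 10, 100_000     # n draws per experiment, M repeated experiments

random.seed(42)
urn = [1] * A + [0] * B          # 1 = heads, 0 = tails

def one_experiment():
    # Draw with replacement n times and count the ones (heads).
    return sum(random.choice(urn) for _ in range(n))

counts = [one_experiment() for _ in range(M)]
p = A / (A + B)
print(sum(counts) / M, n * p)    # empirical mean of the head count ≈ n * p
```

With replacement, each draw sees the same urn, so every trial has identical p and the draws are independent, which is exactly what the binomial model assumes.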
Overall, the binomial distribution emerges as a precise bridge between a simple two-outcome probability (p and 1−p), a counting objective (exactly k successes), and practical simulation (R and urn sampling with replacement).
Cornell Notes
The binomial distribution models the number of successes (e.g., heads) in n independent trials with a constant success probability p. For exactly k successes, the probability is C(n, k) · p^k · (1 − p)^(n − k), where the binomial coefficient C(n, k) counts how many different trial orders produce the same k successes. The transcript connects p to an urn model: if an urn has A head-balls and B tail-balls, then p = A/(A+B). It then demonstrates simulation in R using rbinom and verifies the result by sampling from an urn with replacement and counting ones. As n increases, simulated histograms concentrate around the middle values, reflecting the combinatorics in the formula.
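The concentration claim can be quantified with a fact not stated in the transcript but standard for the binomial distribution: the head count has standard deviation sqrt(n · p · (1 − p)), so the spread relative to n shrinks like 1/sqrt(n). A quick check:

```python
from math import sqrt

p = 0.5
for n in (10, 100, 1000):
    sd = sqrt(n * p * (1 - p))                 # standard deviation of the head count
    print(n, round(sd, 2), round(sd / n, 4))   # absolute spread grows, relative spread shrinks
```

This is why histograms for larger n look increasingly peaked around n · p rather than spread across the extremes.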
Why does the binomial probability include both C(n, k) and p^k · (1 − p)^(n − k)?
How does the urn model reproduce the same p as a biased coin?
What assumptions make a situation “binomial”?
What does R’s rbinom compute, and how do its arguments map to n and p?
How does the urn simulation verify the binomial distribution?
Review Questions
- In the binomial formula C(n, k) · p^k · (1 − p)^(n − k), what does each factor represent combinatorially and probabilistically?
- How does increasing n (with the same p) change the shape of the distribution, and why does that happen?
- What role does “with replacement” play in making the urn model match the binomial assumptions?
Key Points
1. The binomial distribution gives the probability of exactly k successes in n independent trials with constant success probability p.
2. For exactly k heads, the probability is C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) counts the number of head/tail orderings that lead to k heads.
3. The urn model connects p to composition: if an urn has A head-balls and B tail-balls, then p = A/(A+B).
4. Binomial modeling requires fixed n, identical p each trial, and independence, achieved in the urn model by drawing with replacement.
5. R’s rbinom simulates binomial counts directly, using “size” for n and “prob” for p.
6. Simulating the urn directly (sampling with replacement, then counting ones) reproduces the same distribution as rbinom when p matches A/(A+B).
7. As n grows, simulated outcomes cluster more tightly around the middle rather than spreading evenly across extremes.