
Probability Theory 4 | Binomial Distribution [dark version]

5 min read

Based on The Bright Side of Mathematics's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

The binomial distribution gives the probability of exactly k successes in n independent trials with constant success probability p.

Briefing

The binomial distribution is the go-to probability model for counting how many “successes” (like heads) occur in a fixed number of independent trials when each trial has the same success probability. The key takeaway is the probability of getting exactly k heads in n tosses of a biased coin: it follows the mass function P(X = k) = C(n, k) · p^k · (1−p)^(n−k), where C(n, k) denotes “n choose k”. This matters because it turns a messy tree of outcomes into a compact formula that works for any discrete “counting” scenario—coin flips, urn draws, and many other experiments where order doesn’t matter.

The transcript builds the model from first principles. A single toss has two outcomes—heads with probability p and tails with probability 1−p. To connect this to a more physical setup, it introduces an urn model: an urn contains A balls representing heads and B balls representing tails, so p = A/(A+B). Drawing one ball at random gives the same two-outcome probability structure as the biased coin. For the binomial setting, the experiment repeats n times, and the focus shifts from which sequence occurs to only how many heads appear. That “unordered count” is what makes the binomial distribution different from tracking the full sequence.

To justify the formula, the transcript uses a coin-toss tree. Any specific path that yields exactly k heads has probability p^k · (1−p)^(n−k), because each head contributes a factor of p and each tail contributes a factor of 1−p. But there are many distinct paths that produce exactly k heads. The number of such paths is C(n, k), the count of ways to choose which k of the n trials are heads. Multiplying the probability of one path by the number of equivalent paths produces C(n, k) · p^k · (1−p)^(n−k), the binomial probability mass function. The distribution is therefore determined entirely by two parameters: n (number of trials) and p (success probability per trial).
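The transcript's demonstrations are in R; as a self-contained sketch of the same path-counting argument (written here in Python, with the helper name binom_pmf chosen for illustration), one can brute-force every head/tail sequence for a small n and confirm that the paths with exactly k heads sum to C(n, k) · p^k · (1−p)^(n−k):

```python
from math import comb
from itertools import product

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials:
    C(n, k) paths, each with probability p**k * (1-p)**(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Brute-force check for small n: enumerate all 2**n head/tail sequences
n, p = 5, 0.3
for k in range(n + 1):
    total = sum(
        p**sum(seq) * (1 - p)**(n - sum(seq))   # probability of one path
        for seq in product([0, 1], repeat=n)     # every possible sequence
        if sum(seq) == k                         # keep paths with k heads
    )
    assert abs(total - binom_pmf(n, k, p)) < 1e-12

# The pmf sums to 1 over k = 0..n, as a distribution must
assert abs(sum(binom_pmf(n, k, p) for k in range(n + 1)) - 1) < 1e-12
```

The enumeration makes the tree argument concrete: each path's probability depends only on its head count, so grouping paths by that count yields the compact formula.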

The latter half translates the theory into computation in R. The built-in binomial generator rbinom() simulates the number of heads from n trials with success probability p. Examples show how changing n changes the histogram: with small n, outcomes are sparse and extreme values (0 or n heads) can appear, while larger n concentrates probability near the middle. The transcript then simulates the urn model directly: it creates an urn with A ones (heads) and B zeros (tails), samples with replacement n times, counts the number of ones with sum(), and repeats the experiment M times. Finally, it compares the histogram from the urn simulation to rbinom()'s output using the same effective p = A/(A+B), finding the distributions match closely.

Overall, the binomial distribution emerges as a precise bridge between a simple two-outcome probability (p and 1−p), a counting objective (exactly k successes), and practical simulation (R and urn sampling with replacement).

Cornell Notes

The binomial distribution models the number of successes (e.g., heads) in n independent trials with a constant success probability p. For exactly k successes, the probability is C(n, k) · p^k · (1−p)^(n−k), where C(n, k) counts how many different trial orders produce the same k successes. The transcript connects p to an urn model: if an urn has A head-balls and B tail-balls, then p = A/(A+B). It then demonstrates simulation in R using rbinom() and verifies the result by sampling from an urn with replacement and counting ones. As n increases, simulated histograms concentrate around the middle values, reflecting the combinatorics in the formula.

Why does the binomial probability include both C(n, k) and p^k · (1−p)^(n−k)?

Each specific sequence with exactly k heads has probability p^k · (1−p)^(n−k): every head contributes a factor of p and every tail contributes a factor of 1−p. But many different sequences yield exactly k heads. The factor C(n, k) counts the number of ways to choose which k of the n trials are heads. Multiplying the two gives the total probability for “exactly k heads” regardless of order.

How does the urn model reproduce the same p as a biased coin?

In the urn setup, the urn contains A balls representing heads and B balls representing tails. A random draw yields heads with probability A/(A+B). That ratio is defined as p, matching the biased coin’s success probability. Drawing with replacement keeps the probability constant across trials, aligning with the binomial assumptions.

What assumptions make a situation “binomial”?

The transcript highlights three: a fixed number of trials n, identical success probability p on each trial, and independence achieved via replacement (or otherwise ensuring the probability doesn’t change). It also stresses that the outcome of interest is the count of successes k, not the order of heads and tails.

What does R’s rbinom() compute, and how do its arguments map to n and p?

R’s rbinom() generates simulated counts of successes from a binomial distribution. Its arguments correspond to the number of trials (n, called “size” in R) and the success probability per trial (“prob”, i.e., p). The output is the number of heads (k) in each simulated experiment.
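As a language-neutral sketch of what such a generator does under the hood (Python here rather than R's built-in; the name rbinom_like is illustrative), each simulated count is just the sum of size independent Bernoulli(prob) trials:

```python
import random

def rbinom_like(m, size, prob, rng=random):
    """Sketch of a binomial generator: return m simulated counts, each
    the number of successes in `size` independent Bernoulli(prob) trials."""
    return [sum(rng.random() < prob for _ in range(size)) for _ in range(m)]

rng = random.Random(1)
draws = rbinom_like(1000, 10, 0.5, rng)

# Every simulated count lies between 0 and size
assert all(0 <= d <= 10 for d in draws)
# The sample mean should sit near size * prob = 5
mean = sum(draws) / len(draws)
assert abs(mean - 5) < 0.5
```

The two parameters map exactly as the answer above describes: size is n, prob is p, and each element of the output is one simulated k.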

How does the urn simulation verify the binomial distribution?

The transcript builds an urn array with ones for heads and zeros for tails, with counts proportional to A and B. It samples n draws with replacement, counts the number of ones using sum(), and repeats the process M times. The resulting histogram is compared to rbinom() output using p = A/(A+B), producing a similar distribution.
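The same verification can be sketched outside R (a Python stand-in for the transcript's urn script, with illustrative names and parameters A = 3, B = 7, n = 10, M = 20000 chosen here): sample with replacement from a 0/1 urn, tally the head counts over many repetitions, and check the empirical frequencies against the binomial pmf with p = A/(A+B).

```python
import random
from math import comb
from collections import Counter

def urn_counts(A, B, n, M, rng):
    """Repeat M times: draw n balls with replacement from an urn of
    A ones (heads) and B zeros (tails); tally the number of heads."""
    urn = [1] * A + [0] * B
    return Counter(sum(rng.choice(urn) for _ in range(n)) for _ in range(M))

A, B, n, M = 3, 7, 10, 20000
p = A / (A + B)                      # effective success probability, 0.3
counts = urn_counts(A, B, n, M, random.Random(0))

# Empirical frequencies should track the pmf C(n, k) p^k (1-p)^(n-k)
for k in range(n + 1):
    theo = comb(n, k) * p**k * (1 - p)**(n - k)
    emp = counts[k] / M
    assert abs(emp - theo) < 0.02
```

Drawing with replacement is what keeps every trial at the same p, so the urn histogram and the theoretical binomial distribution agree up to sampling noise.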

Review Questions

  1. In the binomial formula C(n, k) · p^k · (1−p)^(n−k), what does each factor represent combinatorially and probabilistically?
  2. How does increasing n (with the same p) change the shape of the distribution, and why does that happen?
  3. What role does “with replacement” play in making the urn model match the binomial assumptions?

Key Points

  1. The binomial distribution gives the probability of exactly k successes in n independent trials with constant success probability p.

  2. For exactly k heads, the probability is C(n, k) · p^k · (1−p)^(n−k), where C(n, k) counts the number of head/tail orderings that lead to k heads.

  3. The urn model connects p to composition: if an urn has A head-balls and B tail-balls, then p = A/(A+B).

  4. Binomial modeling requires fixed n, identical p each trial, and independence—achieved in the urn model by drawing with replacement.

  5. R’s rbinom() simulates binomial counts directly using “size” for n and “prob” for p.

  6. Simulating the urn directly (sampling with replacement, then counting ones) reproduces the same distribution as rbinom() when p matches A/(A+B).

  7. As n grows, simulated outcomes cluster more tightly around the middle rather than spreading evenly across extremes.

Highlights

Exactly k heads in n biased coin tosses has probability C(n, k) · p^k · (1−p)^(n−k): p^k · (1−p)^(n−k) for one order, C(n, k) for all orders.
The urn model makes p concrete: with A head-balls and B tail-balls, p = A/(A+B).
R simulation confirms theory: urn sampling with replacement and counting ones matches rbinom() when p is set to A/(A+B).
Increasing n sharpens the distribution around central values, making extreme counts (near 0 or n) less likely.
