10Min Research - 34. Understanding and Performing Stratified Random Sampling in Social Sciences

TL;DR

Stratified random sampling splits a population into strata so each subgroup receives adequate representation in the final sample.


Briefing

Stratified random sampling is designed to prevent a common failure of simple random sampling: when a population contains distinct subgroups, random draws can over-represent large groups and under-represent—or entirely miss—small ones. In social science research, that matters because each subgroup may contribute different perspectives that affect conclusions. The core idea is to split the population into strata (singular: stratum)—for example, Bachelor, Master, and PhD students—and then run simple random sampling within each stratum so every group appears in the final sample.

A higher-education example makes the risk concrete. Suppose a university has 1,550 students in total: 1,000 Bachelor students, 500 Master students, and 50 PhD students. If the study needs a sample of 300 people and draws purely at random from the full list, the sample will inevitably be dominated by Bachelor and Master students. The number of PhD students is left to chance: it averages only about 10 of the 300 and varies from draw to draw, and with a smaller sample or a smaller PhD group it could even be zero. That would be methodologically unsound for studies where PhD input is important for understanding higher-education practices and procedures.
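A short simulation makes this variability visible. It is a sketch that uses Python's standard `random` module as the random number generator. The seed and the number of simulated draws are arbitrary choices and do not come from the source. The code draws many simple random samples of 300 from the 1,550 students and records how many PhD students each sample happens to contain.

```python
import random
import statistics

# Population from the example: 1,000 Bachelor, 500 Master, and 50 PhD students.
POPULATION = ["Bachelor"] * 1000 + ["Master"] * 500 + ["PhD"] * 50
SAMPLE_SIZE = 300


def simulate_phd_counts(trials: int, seed: int = 7) -> list[int]:
    """Draw `trials` simple random samples (no stratification) and count PhD students in each."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    return [rng.sample(POPULATION, SAMPLE_SIZE).count("PhD") for _ in range(trials)]


counts = simulate_phd_counts(5_000)
print("average PhD students per sample:", round(statistics.mean(counts), 2))
print("fewest / most PhD students in a single sample:", min(counts), "/", max(counts))
print("share of samples with 5 or fewer PhD students:", sum(c <= 5 for c in counts) / len(counts))
```

The spread between the fewest and the most PhD students across draws shows why a small stratum should not be left to chance. Stratification replaces this fluctuating count with a fixed target.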

To fix this, the population is divided into three strata, and sampling is performed separately inside each one. The first approach is proportionate stratified random sampling, where each stratum's share of the sample matches its share of the population. With a sample size of 300 from a population of 1,550, the overall sampling fraction is 300/1,550 ≈ 19.35%. Applying that fraction to each stratum's population size gives targets of about 193.5 Bachelor students (1,000 × 19.35%), 96.8 Master students (500 × 19.35%), and 9.7 PhD students (50 × 19.35%). Before rounding, these targets sum to 300. The practical takeaway is that each stratum's allocation equals its population size multiplied by the overall sampling fraction.
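The allocation arithmetic can be sketched in Python. The source does not say how fractional targets are rounded to whole people. The largest-remainder rule used here is an assumption, chosen so the targets still add up to exactly 300. Plain rounding would give 194 + 97 + 10 = 301.

```python
import math

# Stratum population sizes from the example.
STRATA = {"Bachelor": 1000, "Master": 500, "PhD": 50}


def proportionate_allocation(strata: dict[str, int], n: int) -> dict[str, int]:
    """Split a total sample size n across strata in proportion to their population sizes.

    Whole-number targets come from the largest-remainder method: round every
    target down, then give the leftover units to the strata with the largest
    fractional parts, so the targets still sum to n.
    """
    total = sum(strata.values())
    fraction = n / total  # overall sampling fraction, e.g. 300 / 1,550
    raw = {name: size * fraction for name, size in strata.items()}
    alloc = {name: math.floor(x) for name, x in raw.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - alloc[name], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


print(f"sampling fraction: {300 / sum(STRATA.values()):.2%}")  # 19.35%
print(proportionate_allocation(STRATA, 300))  # {'Bachelor': 193, 'Master': 97, 'PhD': 10}
```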

Exact targets are fractional and must be rounded, and researchers also want to guarantee a minimum number of responses from every group. For both reasons, the plan can be adjusted by increasing the number of people contacted in each stratum. The transcript illustrates this by moving from the strict proportionate targets to larger contact counts: more Bachelor and Master students are contacted and, in particular, more PhD students. This helps ensure that the final number of responses still meets the minimum required after some people fail to reply.
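One way to turn a response target into a contact count is to divide by an expected response rate and round up. The sketch below does this. The response rates are hypothetical placeholders because the source gives no figures. Each count is capped at the stratum size because nobody can be invited twice.

```python
# Rounded proportionate targets for 300 respondents.
TARGETS = {"Bachelor": 193, "Master": 97, "PhD": 10}
STRATUM_SIZES = {"Bachelor": 1000, "Master": 500, "PhD": 50}
# HYPOTHETICAL expected response rates in whole percent; the source gives no figures.
RESPONSE_RATE_PCT = {"Bachelor": 70, "Master": 70, "PhD": 50}


def contacts_needed(target: int, rate_pct: int, stratum_size: int) -> int:
    """Smallest number of invitations whose expected responses reach the target,
    capped at the number of people who actually exist in the stratum."""
    needed = -(-target * 100 // rate_pct)  # integer ceiling division avoids float rounding surprises
    return min(needed, stratum_size)


contacts = {
    name: contacts_needed(TARGETS[name], RESPONSE_RATE_PCT[name], STRATUM_SIZES[name])
    for name in TARGETS
}
print(contacts)  # {'Bachelor': 276, 'Master': 139, 'PhD': 20}
```

When the cap is reached (for example, a target of 45 PhD responses at a 50% rate would require 90 invitations from only 50 people), contacting everyone is still not expected to meet the target. In that case, the target itself has to be reconsidered.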

A second approach is disproportionate stratified random sampling, where the allocation across strata is intentionally changed to secure stronger representation of smaller or more important groups. Instead of using the same proportional allocation, the researcher increases the PhD stratum share (and adjusts others downward) to avoid the “too few PhD responses” problem.
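A minimal sketch of one possible disproportionate scheme follows. The PhD stratum receives a fixed, larger share, and the remaining places are split between Bachelor and Master students in proportion to their sizes. The figure of 40 PhD places is a hypothetical choice for illustration only, and the largest-remainder rounding is an assumption.

```python
import math

STRATA = {"Bachelor": 1000, "Master": 500, "PhD": 50}


def disproportionate_allocation(strata: dict[str, int], n: int, fixed: dict[str, int]) -> dict[str, int]:
    """Give the strata in `fixed` a chosen sample size, then share the remaining
    sample among the other strata in proportion to their population sizes
    (largest-remainder rounding, so the total stays exactly n)."""
    for name, size in fixed.items():
        if size > strata[name]:
            raise ValueError(f"cannot sample {size} from {strata[name]} {name} students")
    remaining = n - sum(fixed.values())
    others = {name: size for name, size in strata.items() if name not in fixed}
    others_total = sum(others.values())
    raw = {name: remaining * size / others_total for name, size in others.items()}
    alloc = {name: math.floor(x) for name, x in raw.items()}
    leftover = remaining - sum(alloc.values())
    for name in sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)[:leftover]:
        alloc[name] += 1
    return {**alloc, **fixed}


# HYPOTHETICAL choice: raise the PhD stratum from about 10 to 40 of the 300 places.
print(disproportionate_allocation(STRATA, 300, {"PhD": 40}))
# {'Bachelor': 173, 'Master': 87, 'PhD': 40}
```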

Operationally, each stratum becomes its own sampling frame. With a list of all Bachelor students (say 1,000 entries in an Excel sheet), the researcher uses a random number generator to select the required number of individuals (e.g., 300) by choosing random indices within the allowed range (minimum 1, maximum 1,000). The same process is repeated for Master and PhD lists. The method depends on having access to the full population frame for each stratum; without the ability to identify and select every element, neither simple random sampling nor stratified random sampling can be carried out properly.
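The selection step can be sketched in Python. Python's `random.sample` draws without replacement, which matters here. A random number generator that returns independent numbers between 1 and 1,000 can produce the same row twice, and any repeats would have to be discarded and redrawn. The student IDs below are made-up stand-ins for the rows of each Excel list. The seed is arbitrary.

```python
import random


def select_row_numbers(stratum_size: int, k: int, rng: random.Random) -> list[int]:
    """Pick k distinct 1-based row numbers between 1 and stratum_size (no repeats)."""
    if k > stratum_size:
        raise ValueError("cannot select more people than the stratum contains")
    return sorted(rng.sample(range(1, stratum_size + 1), k))


# HYPOTHETICAL sampling frames: made-up IDs standing in for the rows of each Excel list.
frames = {
    "Bachelor": [f"BA-{i:04d}" for i in range(1, 1001)],
    "Master": [f"MA-{i:03d}" for i in range(1, 501)],
    "PhD": [f"PHD-{i:02d}" for i in range(1, 51)],
}
targets = {"Bachelor": 193, "Master": 97, "PhD": 10}  # rounded proportionate targets for 300

rng = random.Random(2024)  # fixed seed so the draw can be reproduced and documented
selected = {}
for name, frame in frames.items():
    rows = select_row_numbers(len(frame), targets[name], rng)
    selected[name] = [frame[row - 1] for row in rows]  # row 1 is frame[0]

for name, people in selected.items():
    print(name, len(people), people[:3])
```

Recording the seed lets the same selection be reproduced later, for example when documenting the sampling procedure.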

Cornell Notes

Stratified random sampling prevents simple random sampling from over-representing large subgroups and missing small ones. The population is split into strata (e.g., Bachelor, Master, PhD), and simple random sampling is performed within each stratum so every group is represented in the final sample. Proportionate stratified sampling allocates sample sizes according to each stratum’s share of the population, using the overall sampling fraction (sample size divided by total population). Disproportionate stratified sampling changes those allocations to boost representation of smaller or higher-priority groups, often by contacting more people in underrepresented strata to ensure minimum response counts. The method requires a complete sampling frame (a list of all elements) for each stratum so random selection can be done reliably.

Why can simple random sampling fail when a population has subgroups?

If subgroups differ in size, a purely random draw mostly selects members of the largest subgroups and may include very few members of the smaller ones, or none at all. In the higher-education example, drawing 300 students from 1,550 without stratification yields a sample dominated by Bachelor and Master students, with only about 10 PhD students on average and no guarantee of even that many, although PhD perspectives may be crucial for the study.

How does proportionate stratified random sampling determine how many people to sample from each stratum?

It uses the overall sampling fraction: sample size ÷ total population. In the example, 300 ÷ 1,550 ≈ 19.35%. That percentage is then applied to each stratum’s population size to compute the target number of respondents to obtain from Bachelor, Master, and PhD groups. The transcript emphasizes calculating the stratum allocation from the required sample size and the total population.
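Written as a formula, the rule takes the following form. N is the total population, n is the required sample size, and N_h and n_h are the population and sample size of stratum h. This notation is introduced here and does not come from the source.

```latex
\begin{aligned}
f &= \frac{n}{N} = \frac{300}{1550} \approx 0.1935 \\
n_h &= f \, N_h = \frac{n \, N_h}{N} \\
n_{\text{Bachelor}} &\approx 0.1935 \times 1000 \approx 193.5 \\
n_{\text{Master}} &\approx 0.1935 \times 500 \approx 96.8 \\
n_{\text{PhD}} &\approx 0.1935 \times 50 \approx 9.7
\end{aligned}
```

Because the N_h add up to N, the unrounded n_h add up to n. Rounding to whole respondents can shift the total by one or two, so one stratum's target may need a small adjustment.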

What’s the difference between proportionate and disproportionate stratified sampling?

Proportionate stratified sampling keeps each stratum’s sample share aligned with its population share. Disproportionate stratified sampling intentionally alters those proportions—often increasing the share allocated to smaller groups like PhD students and reducing it elsewhere—so the final sample contains adequate representation from every stratum.

Why might researchers increase the number of people contacted beyond the target sample size?

Because the required number is usually based on completed responses, not invitations. The transcript illustrates contacting more individuals in each stratum (e.g., increasing Bachelor and Master contacts and especially boosting PhD contacts) to improve the odds of meeting minimum response counts after nonresponse.

How is random selection carried out inside each stratum in practice?

Each stratum uses its own sampling frame (e.g., an Excel list of all Bachelor students). A random number generator selects the required number of indices within a defined range—minimum 1 and maximum equal to the stratum list size (e.g., 1 to 1,000 for 1,000 Bachelor entries). The selected individuals then receive questionnaires or emails, and the same procedure repeats for Master and PhD lists.

What requirement must be met for stratified random sampling to work?

Researchers need access to the population frame for each stratum—meaning a complete list of all elements in each subgroup. Without the ability to identify and select every element, random selection within strata cannot be performed correctly, undermining both simple random sampling and stratified random sampling.

Review Questions

If a study needs 300 responses from a population of 1,550, what sampling fraction is used for proportionate stratified allocation?
When would disproportionate stratified sampling be preferable to proportionate stratified sampling?
What information must exist before using a random number generator to select participants within a stratum?

Key Points

1. Stratified random sampling splits a population into strata so each subgroup receives adequate representation in the final sample.
2. Simple random sampling can miss small subgroups entirely, especially when subgroup sizes vary widely.
3. Proportionate stratified sampling allocates sample sizes using the overall sampling fraction (sample size ÷ total population).
4. Disproportionate stratified sampling intentionally changes stratum allocations to boost representation of smaller or more important groups.
5. Researchers often contact more people than the minimum required to account for nonresponse and still meet response targets.
6. Random selection within each stratum requires a complete sampling frame (a full list of elements) for that subgroup.

Highlights

The method’s main purpose is to ensure small subgroups—like PhD students—aren’t accidentally excluded when sampling from the whole population.

Proportionate stratified sampling uses a single sampling fraction (300/1,550 ≈ 19.35%) to compute stratum targets.

Disproportionate stratified sampling reallocates sample effort to increase PhD representation rather than following strict population proportions.

Within each stratum, selection can be implemented by generating random indices over the stratum’s full list (e.g., 1 to 1,000).

Topics

Stratified Random Sampling
Proportionate Allocation
Disproportionate Allocation
Sampling Frames
Random Number Generator