How to Determine Sample Size for Survey Research?
Based on the Research With Fawad video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Sample size for survey research isn’t a single number—it’s a decision shaped by the study’s population, the statistical technique, and the assumptions behind common “rules.” The central takeaway is that researchers should match sample size to (1) what they’re trying to estimate and (2) what their analysis can reliably detect, rather than chasing bigger samples by default.
The transcript starts by warning that sample size constraints can be built into the research design itself. Studies targeting top executives (CEOs, CFOs, HR managers, directors, board members) can’t realistically reach the same respondent counts as surveys of large populations like thousands of firms. Likewise, the planned analysis method sets practical minimums: exploratory factor analysis generally needs at least 50 observations, while regression typically requires at least 50 and often around 100 for many research situations. Even the software ecosystem matters only in limited ways—there’s no “magic” in PLS-SEM or CB-SEM that makes small samples automatically valid. PLS-SEM is often mischaracterized as a small-sample shortcut, but the transcript emphasizes that software can run models with small N; the question is whether the results remain accurate given measurement quality and model complexity.
From there, the transcript lays out multiple guideline families—each with different assumptions. For factor-analytic designs, sample-to-item ratio rules are common: a 5:1 ratio is often recommended for EFA (e.g., 30 items → 150 respondents), while a 20:1 alternative is also cited (30 items → 600 respondents). For variable-based models, sample-to-variable ratio guidance suggests at least 5:1, with 15:1 or 20:1 preferred because 5:1 can be too low; the logic is that each independent variable needs enough observations to estimate effects.
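The ratio rules above reduce to simple arithmetic. A minimal sketch (the 30-item figures match the examples in the text; the helper name and the six-variable example are ours):

```python
import math

def n_from_ratio(count, ratio):
    """Minimum respondents for a given sample-to-item or sample-to-variable ratio."""
    return math.ceil(count * ratio)

# Sample-to-item rules for a 30-item EFA questionnaire
print(n_from_ratio(30, 5))    # 5:1  -> 150
print(n_from_ratio(30, 20))   # 20:1 -> 600

# Sample-to-variable rules for a hypothetical model with 6 independent variables
print(n_from_ratio(6, 15))    # 15:1 -> 90
print(n_from_ratio(6, 20))    # 20:1 -> 120
```

The spread between the 5:1 and 20:1 answers (150 vs. 600 for the same instrument) is exactly why feasibility has to be weighed alongside the heuristic chosen.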
A widely used alternative is the Krejcie and Morgan (1970) table, which maps population size directly to a required sample size. The transcript cautions that these tables only hold when the sampling assumptions, especially probability sampling, are met.
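The Krejcie and Morgan table is derived from a finite-population chi-square formula, so its entries can be reproduced directly. A sketch assuming the conventional inputs behind the published table (chi-square = 3.841 for alpha = 0.05, p = 0.5, and a 5% margin of error):

```python
import math

def krejcie_morgan(population, chi2=3.841, p=0.5, d=0.05):
    """Required sample size for a finite population (Krejcie & Morgan, 1970).

    chi2: chi-square value for 1 df at the desired confidence (3.841 for 95%)
    p:    assumed population proportion (0.5 maximizes required N)
    d:    acceptable margin of error
    """
    num = chi2 * population * p * (1 - p)
    den = d ** 2 * (population - 1) + chi2 * p * (1 - p)
    return math.ceil(num / den)

print(krejcie_morgan(1000))  # -> 278, the published table entry for N = 1000
print(krejcie_morgan(100))   # -> 80, the published table entry for N = 100
```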
For structural equation modeling, the transcript distinguishes between “minimum to detect an effect” and “minimum to estimate model structure.” Using Daniel Soper’s online power analysis example, it shows a scenario where the minimum sample size to detect an effect is 376 (with effect size 0.2, power 0.8, five latent variables, 30 observed variables/items, and alpha 0.05). In contrast, a lower “minimum model structure” threshold of 200 is presented as accounting for added complexity like mediation (e.g., servant leadership → career commitment → life satisfaction). The point: power targets statistical detectability, while model-structure thresholds address estimation reliability.
The transcript then surveys additional heuristics and power tools. It references Roscoe-style guidance (30–500) and notes that very large samples can inflate statistical significance and increase the risk of Type I errors. For PLS-SEM, the classic 10-times rule is described (10× the largest number of formative indicators, or 10× the largest number of structural paths pointing to a construct), but it’s criticized as only appropriate under conditions like strong effect sizes and high measurement reliability. Alternatives include the inverse square root method (example minimum N of 160) and a gamma-exponential method (example minimum N of 146), plus power-table approaches using minimum R² targets (e.g., R² = 0.10 with six arrows → N ≈ 157). Finally, G*Power is presented for regression-based designs, where sample size depends on effect size, alpha, power, and the number of predictors (including interaction terms for moderation). The overall message: choose a method aligned with your model and analysis goals, then justify the resulting N with the assumptions behind the calculation approach.
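The inverse square root method mentioned above (Kock & Hadaya, 2018) has a closed form: N must exceed the square of a power-dependent constant divided by the smallest path coefficient the study expects to detect. A sketch assuming the standard constant 2.486 for alpha = 0.05 and power 0.80, and a minimum path of 0.197 chosen to reproduce the transcript's example of 160:

```python
import math

def inverse_square_root_n(p_min, z=2.486):
    """Minimum N so the smallest expected path coefficient p_min is detectable.

    z = 2.486 corresponds to alpha = 0.05 and power = 0.80 in the
    inverse square root method for PLS-SEM (Kock & Hadaya, 2018).
    """
    return math.ceil((z / abs(p_min)) ** 2)

print(inverse_square_root_n(0.197))  # -> 160, matching the example in the text
```

Because p_min is squared in the denominator, halving the smallest expected path roughly quadruples the required sample, which is why this method punishes optimistic effect-size assumptions far more than the 10-times rule does.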
Cornell Notes
Survey sample size decisions depend on more than a single “rule.” The transcript emphasizes matching N to the research population, the planned analysis (EFA, regression, CB-SEM, PLS-SEM), and the assumptions behind common heuristics. It contrasts power-based calculations (minimum N to detect a real effect) with estimation-based thresholds (minimum N to reliably estimate a model structure), illustrated using Daniel Soper’s structural equation modeling power tool. For PLS-SEM, it explains why the 10-times rule is limited and highlights alternatives such as inverse square root and gamma-exponential methods, plus power-table approaches using minimum R². G*Power is recommended for regression-style designs, including moderation via interaction terms.
- Why can’t researchers simply use a large “default” sample size for every survey study?
- How do analysis choices (EFA vs regression vs SEM) translate into minimum sample size expectations?
- What’s the difference between “minimum sample size to detect an effect” and “minimum sample size for model structure” in SEM?
- Why is the 10-times rule for PLS-SEM considered unreliable in some cases?
- What alternatives to the 10-times rule are mentioned for PLS-SEM sample size planning?
- How does G*Power support sample size calculation for regression and moderation models?
Review Questions
- Which sample size method best matches your goal: detect a hypothesized effect (power) or estimate a complex model reliably (model-structure threshold)?
- Under what conditions does the 10-times rule for PLS-SEM become more defensible, and when might it fail?
- If your survey has 30 items, how would sample-to-item ratio guidance differ between a 5:1 rule and a 20:1 rule, and what would that imply for feasibility?
Key Points
1. Match sample size to the study’s population constraints; some target groups (e.g., top executives) limit feasible N.
2. Tie minimum sample size expectations to the planned analysis method (EFA, regression, CB-SEM, PLS-SEM), not just to tradition.
3. Avoid assuming PLS-SEM or CB-SEM software can “fix” small samples; accuracy depends on measurement reliability and model conditions.
4. Use ratio-based heuristics for factor-analytic designs (e.g., sample-to-item 5:1 or 20:1) and variable-based ratios for regression-like structures (e.g., 15:1 or 20:1 per independent variable).
5. Separate power-based calculations (minimum N to detect effects) from estimation-based thresholds (minimum N to estimate model structure reliably).
6. Treat the 10-times rule for PLS-SEM as conditional; consider inverse square root, gamma-exponential, or power-table methods when assumptions may not hold.
7. Use G*Power for regression-based survey models, including moderation via interaction terms, by specifying effect size, alpha, power, and predictor count.
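The G*Power calculation described above can be reproduced with a noncentral-F power loop. A sketch assuming G*Power's "linear multiple regression: fixed model, R² deviation from zero" test, with a medium effect f² = 0.15, five predictors, alpha = 0.05, and power 0.80; these example inputs are ours, not from the transcript:

```python
from scipy.stats import f, ncf

def regression_sample_size(f2, predictors, alpha=0.05, power=0.80):
    """Smallest total N reaching the target power for the overall-R^2 F test.

    Noncentrality is lambda = f2 * N, as in G*Power's fixed-model regression
    test. For moderation designs, count each interaction term as a predictor.
    """
    for n in range(predictors + 2, 100_000):
        df1, df2 = predictors, n - predictors - 1
        crit = f.ppf(1 - alpha, df1, df2)            # critical F under H0
        achieved = 1 - ncf.cdf(crit, df1, df2, f2 * n)
        if achieved >= power:
            return n
    raise ValueError("power target not reachable")

print(regression_sample_size(0.15, 5))  # should be close to G*Power's reported 92
```

Raising the predictor count (e.g., adding interaction terms for moderation) increases both the numerator degrees of freedom and the required N, which is the mechanism behind the transcript's advice to include interactions when sizing moderation models.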