LESSON 14 - THREATS TO INTERNAL AND EXTERNAL VALIDITY OF EXPERIMENTAL DESIGNS

TL;DR

Internal validity is the confidence that the treatment variable, not other factors, caused the observed change in the dependent variable.


Briefing

Experimental designs are valued because they can support causal claims (“X causes Y”), but only when threats to validity are controlled. Those threats fall into two categories. Internal validity concerns whether the observed change in the dependent variable is truly caused by the treatment variable and not by some other factor. External validity concerns whether those causal findings can be generalized beyond the specific people, setting, and time period used in the study.

Internal validity is essentially the confidence that the treatment was the sole cause of the outcome. If a behavior change program improves youths’ behavior, internal validity asks whether the improvement came from the program itself or from other events happening at the same time. The lesson lists common internal threats and practical ways to reduce them. “History” refers to outside events occurring during the experiment—like counseling, exposure to related programs, or family discussions—that could also shift behavior. The fix is to ensure the experimental and control groups experience the same external events and to minimize extraneous variables.

“Testing” is another threat: taking a pre-test can make participants familiar with the measure, leading to better post-test performance regardless of the treatment. Researchers can use longer time gaps between pre- and post-tests, make measures as equivalent as possible, or drop pre-tests entirely via post-test-only designs. “Maturation” captures natural physical or psychological development during the study; it can be addressed by selecting participants who mature at similar rates and shortening the time between measurements. “Instrumentation” covers changes in the measuring instrument or procedure that alter scores; it is handled by using the same instrument for pre- and post-tests.

The lesson also highlights design threats tied to who ends up in each group and who stays. “Selection bias” happens when groups differ at baseline (e.g., brighter students in the experimental group). Random assignment and making groups as equivalent as possible reduce this risk. “Experimental mortality” occurs when participants drop out, creating imbalance between groups; recruiting larger samples helps absorb dropout effects. “Statistical regression” arises when participants are selected for their extreme scores, because such scores tend to move toward the average on retesting even without treatment; avoiding extreme entry characteristics helps. Communication between groups threatens causal inference through “diffusion of treatment” or interaction among variables, so groups should be kept separate and blinded to group status.

Motivation and fairness threats, namely “resentful demoralization” and “compensatory rivalry”, arise when only the experimental group receives benefits. Providing the control group with a placebo-like alternative or offering the real treatment after the experiment can reduce resentment. “Double blinding” (ensuring neither participants nor researchers know group assignment) supports both internal and external validity.

External validity asks whether results hold in other settings, for other people, and at other times. Threats include selection-by-treatment interactions, setting-by-treatment interactions, and history-by-treatment interactions that make results time-bound. Reactive effects of testing can also limit generalization if pre-testing alerts participants to the treatment. Researcher expectations can distort outcomes through “experimenter effects” (the Pygmalion effect), and “reactive effects of experimental arrangement” (the Hawthorne effect) occur when participants behave artificially after realizing they are being studied or learning their group status.

To minimize external threats, the lesson emphasizes randomization, double blindness, researcher neutrality, spacing out multiple treatments, and using procedural control groups that receive something comparable to the experimental group so neither group feels disadvantaged. The takeaway is straightforward: causal confidence requires internal validity, and meaningful usefulness requires external validity—both must be protected for experimental findings to be credible and transferable.

Cornell Notes

Internal validity measures whether a study can credibly claim that the treatment variable caused the observed change in the outcome variable. It is threatened by factors like history, testing effects, maturation, instrumentation differences, selection bias, dropout (experimental mortality), regression to the mean, and contamination or communication between groups. External validity measures whether findings generalize to other people, settings, and time periods; it is threatened by selection-by-treatment and setting-by-treatment interactions, history-by-treatment time limits, reactive testing effects, researcher expectation effects, and the Hawthorne effect. Minimizing these threats relies on randomization, double blindness, neutral administration, appropriate timing, and giving control groups comparable experiences (placebo/procedural controls).

What is internal validity, and how does it connect to causal claims in experiments?

Internal validity is the degree to which a study establishes a trustworthy cause-and-effect relationship between the independent variable (treatment variable) and the dependent variable (outcome). It reflects confidence that the treatment was the sole reason for the observed change. If other explanations—like simultaneous events or measurement artifacts—could plausibly account for the outcome, internal validity is weakened.

How do “history” and “testing” threaten internal validity, and what are the corresponding fixes?

History refers to outside events occurring during the experiment that can also change the dependent variable (e.g., counseling, exposure to behavior-change programs, or family discussions happening alongside the treatment). The fix is to make the experimental and control groups experience the same external events and to control extraneous variables. Testing refers to pre-tests making participants familiar with the measure, which can improve later scores; fixes include using a longer interval between pre- and post-tests, ensuring equivalence of measures, or using post-test-only designs to eliminate pre-testing.
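
One way to see why matched external events matter: if both groups live through the same events, the control group's pre-to-post change estimates everything that is not the treatment, and subtracting it isolates the treatment's contribution. The Python sketch below works through that arithmetic. The function name `treatment_effect` and all scores are hypothetical, made up for illustration; the lesson itself does not prescribe this calculation.

```python
def treatment_effect(exp_pre, exp_post, ctrl_pre, ctrl_post):
    """Change in the experimental group minus change in the control group.

    Assumes both groups were exposed to the same outside events, so the
    control group's change stands in for history (and other shared
    influences) that would otherwise be mistaken for a treatment effect.
    """
    exp_change = exp_post - exp_pre      # treatment + shared outside events
    ctrl_change = ctrl_post - ctrl_pre   # shared outside events only
    return exp_change - ctrl_change


# Hypothetical mean behavior scores (higher = better behavior).
effect = treatment_effect(exp_pre=50.0, exp_post=62.0,
                          ctrl_pre=50.0, ctrl_post=55.0)
print(effect)  # 7.0
```

Here the experimental group improved by 12 points, but 5 of those points also appeared in the control group, so only 7 points can plausibly be credited to the program. If the groups had experienced different outside events, this subtraction would no longer be trustworthy.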

Why do maturation and instrumentation matter for internal validity?

Maturation (development over time) can change participants physically or psychologically during the study, producing outcome changes unrelated to the treatment. Researchers can select participants who mature at similar rates and reduce the time gap between measurements. Instrumentation is about changes in measurement across pre- and post-tests; it threatens comparability of scores, so the same instrument should be used for both.

What threats to internal validity come from group composition and participant dropout?

Selection bias occurs when groups differ at baseline (for example, placing brighter students in the experimental group and weaker students in the control group), which can distort treatment effects; random assignment and making groups equivalent help. Experimental mortality happens when participants drop out during the experiment, creating differences between groups; recruiting a larger sample helps offset dropout effects. Statistical regression arises when participants are selected for extreme scores, which tend to drift back toward the average on retesting; avoiding extreme entry characteristics reduces this threat.
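
Statistical regression is easier to believe once it is simulated. The sketch below is a minimal illustration under stated assumptions: each observed score is a stable "true" level plus independent random noise, and no treatment is given at all. The function name, the score distribution, and the 10% cutoff are hypothetical choices, not values from the lesson.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=0):
    """Select the highest pre-test scorers and compare their two test means.

    No treatment is applied between tests, so any drop in the selected
    group's mean comes from measurement noise alone.
    """
    rng = random.Random(seed)
    true_level = [rng.gauss(50, 10) for _ in range(n)]
    pre = [t + rng.gauss(0, 10) for t in true_level]   # noisy pre-test
    post = [t + rng.gauss(0, 10) for t in true_level]  # noisy post-test

    # Keep only participants whose pre-test score is in the top fraction.
    cutoff = sorted(pre)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if pre[i] >= cutoff]

    pre_mean = statistics.mean(pre[i] for i in selected)
    post_mean = statistics.mean(post[i] for i in selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group: pre-test mean {pre_mean:.1f}, post-test mean {post_mean:.1f}")
```

Under these assumptions the selected group's post-test mean falls back toward the overall average of 50 even though nothing was done to the participants. A study that recruited only extreme scorers could mistake that drift for a treatment effect.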

What does external validity ask, and which threats limit generalization?

External validity asks whether results apply beyond the specific experimental context, that is, to other people, settings, and time periods. It is limited by interaction effects such as selection-by-treatment (nonrandom selection/assignment), setting-by-treatment (results tied to a particular environment), and history-by-treatment (effects tied to a specific time period). Reactive effects of testing can also limit generalization if pre-testing alerts participants. Researcher expectation effects (the Pygmalion effect) and the Hawthorne effect (reactive effects of experimental arrangement, when participants know they are being studied or know their group) can further distort outcomes, making them less transferable.

Which strategies reduce both internal and external validity threats?

Randomization supports equivalence between groups. Double blindness reduces behavior changes driven by knowing group assignment. Neutral administration helps prevent expectation-driven manipulation. Timing matters: shortening or spacing intervals can reduce maturation and testing reactivity. Procedural control groups—giving the control group something comparable to the experimental group—reduce resentment and improve fairness, which supports more stable outcomes.
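
The randomization step can be sketched in a few lines of Python. This is a minimal illustration, assuming a simple two-group design with equal allocation; the function name `randomly_assign` and the participant labels are hypothetical.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into experimental and control groups.

    Chance alone decides who lands in which group, so baseline traits
    (ability, motivation, age) are balanced in expectation.
    With an odd number of participants, the extra one goes to control.
    """
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


students = [f"student_{i}" for i in range(1, 21)]
experimental, control = randomly_assign(students, seed=42)
print(len(experimental), len(control))  # 10 10
```

Recording the seed lets someone else audit the assignment later. Random assignment balances groups only on average, so with small samples it is still worth comparing the groups' baseline scores before the treatment begins.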

Review Questions

How would you distinguish internal validity from external validity in an experiment claiming that a treatment causes an outcome?
List at least four internal validity threats and match each with a practical mitigation strategy.
What external validity threats would you check before claiming results apply to a different population or setting?

Key Points

1. Internal validity is the confidence that the treatment variable, not other factors, caused the observed change in the dependent variable.
2. History threatens internal validity when outside events occur during the experiment; matching experiences across experimental and control groups helps reduce it.
3. Testing effects can inflate post-test performance; post-test-only designs or longer intervals between tests can mitigate this risk.
4. Selection bias, dropout (experimental mortality), and regression to the mean can all distort group comparability; random selection/assignment and appropriate sampling strategies help.
5. Contamination between groups (communication or shared experiences) undermines causal inference; keeping groups separate and using blinding reduces this threat.
6. External validity depends on generalizability across people, settings, and time; interaction effects (selection-by-treatment, setting-by-treatment, history-by-treatment) limit transferability.
7. Double blindness, neutrality, procedural control groups, and careful timing are key tools for minimizing both internal and external validity threats.

Highlights

Internal validity is about whether the treatment is the sole cause of change; if simultaneous events or measurement artifacts could explain the outcome, causal confidence drops.

Testing can change participants’ behavior simply by making them familiar with the measure; post-test-only designs are one direct countermeasure.

External validity fails when results are tied to a specific context—nonrandom selection, particular settings, or time-bound history effects can all restrict generalization.

The Hawthorne effect and researcher expectation effects can produce artificial behavior, making outcomes less likely to replicate elsewhere.

Fairness and perceived benefit matter: resentful demoralization and compensatory rivalry can be reduced with placebo/procedural controls or delayed treatment for the control group.

Topics

Internal Validity
External Validity
Threats to Validity
Experimental Design
Double Blindness

Mentioned

Lydiah Wambugu