01. SPSS Classroom Lectures | Basic Statistical Concepts (P1) | Reliability and Validity
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Reliability is consistency of results under similar conditions; validity is whether the instrument measures the intended construct.
Briefing
Reliability and validity are the two core quality checks behind any questionnaire or measurement tool—reliability for consistency, validity for accuracy. A measurement is reliable when it produces the same numeric results repeatedly under similar conditions with the same subjects. Validity, by contrast, requires that the instrument actually measures the concept it claims to measure. The distinction matters because consistency alone can be misleading: an instrument can give stable results while still targeting the wrong construct.
A classic reliability-versus-validity example illustrates the risk. A wall clock that always shows 6:00 when someone enters the room is consistent, but it is not necessarily valid for telling the correct time. Likewise, a test that repeatedly returns the same score for a child’s recall of the previous day’s activities may be reliable, yet someone might incorrectly claim it measures the child’s IQ. Another analogy compares constructs: using a job satisfaction scale to measure job commitment may yield consistent scores, but those scores could reflect something else entirely—such as memory or another unrelated trait—rather than the intended concept.
Reliability can be assessed through the test–retest approach: administer the same instrument to the same people twice under similar conditions, then correlate the two sets of results. Higher correlation indicates greater consistency. In practice, test–retest is difficult because repeated participation is hard to arrange, and subjects may no longer respond neutrally after the first exposure. The example of repeatedly taking a GMAT test highlights how familiarity can introduce bias, undermining the attempt to measure pure consistency.
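As a minimal sketch, test–retest reliability reduces to correlating the two administrations; the scores below are invented for illustration, not taken from the lecture:

```python
import numpy as np

# Hypothetical scores for 8 respondents on two administrations of the
# same instrument under similar conditions (numbers are illustrative).
time1 = np.array([12, 15, 9, 20, 17, 11, 14, 18])
time2 = np.array([13, 14, 10, 19, 18, 10, 15, 17])

# Test-retest reliability: Pearson correlation between the two administrations.
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.3f}")  # the closer to 1, the more consistent
```

In SPSS the same check is a bivariate (Pearson) correlation between the two score variables; the code above simply makes the computation explicit.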
Because test–retest is often impractical, researchers use other reliability techniques. For categorical data, Cohen’s Kappa coefficient is commonly used. For internal consistency across items in a scale, Cronbach’s Alpha is widely applied. In modern survey research, construct reliability is frequently assessed using composite reliability, which fits naturally within confirmatory factor analysis.
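Both statistics mentioned above can be computed directly from their definitions. The sketch below uses made-up data: Cronbach’s alpha on a small hypothetical item matrix, and Cohen’s kappa on hypothetical ratings from two raters:

```python
import numpy as np

# --- Cronbach's alpha (internal consistency of a multi-item scale) ---
# Hypothetical responses: 6 respondents x 4 Likert items (illustrative).
items = np.array([
    [4, 5, 4, 4],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [3, 4, 3, 3],
])
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # per-item variances
total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")

# --- Cohen's kappa (agreement between two raters on categorical data) ---
rater1 = np.array([1, 0, 1, 1, 0, 1, 0, 1])
rater2 = np.array([1, 0, 1, 0, 0, 1, 1, 1])
p_obs = np.mean(rater1 == rater2)           # observed agreement
p_exp = sum(np.mean(rater1 == c) * np.mean(rater2 == c)
            for c in np.unique(np.concatenate([rater1, rater2])))
kappa = (p_obs - p_exp) / (1 - p_exp)       # chance-corrected agreement
print(f"Cohen's kappa = {kappa:.3f}")
```

These mirror what SPSS reports under Analyze → Scale → Reliability Analysis (alpha) and Crosstabs (kappa); writing them out shows what the software is actually computing.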
Validity is assessed by checking how accurately the measurement aligns with the underlying trait it is meant to represent. For job satisfaction, the underlying trait might be reflected through multiple indicators—salary, environment, co-workers, and job security—so validity asks whether those indicators truly capture job satisfaction rather than something adjacent. Face validity is the first check: the instrument should appear, to experts and respondents, to measure the intended concept. Beyond that, predictive validity tests whether the measure forecasts related outcomes (e.g., GMAT performance predicting MBA performance). Content validity checks whether the instrument covers the full intended domain of the construct; measuring only reading skills when the construct includes reading, writing, and listening would miss major parts of the domain.
Construct validity, grounded in theory, looks for expected patterns of relationships among variables. It includes convergent validity—items intended to measure the same construct should correlate strongly—and discriminant validity—different constructs should relate differently, not collapse into one another. Statistical tools such as average variance extracted and the Fornell–Larcker criterion (or related methods) are used to support these claims. Reliability and validity together determine whether a measurement is both stable and meaningful for research conclusions.
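To make the statistical checks concrete, here is a sketch under assumed inputs: the standardized factor loadings and the inter-construct correlation below are hypothetical, standing in for estimates a confirmatory factor analysis would produce:

```python
import numpy as np

# Hypothetical standardized CFA loadings for two constructs (illustrative).
loadings_A = np.array([0.82, 0.78, 0.85])
loadings_B = np.array([0.75, 0.80, 0.72])
corr_AB = 0.55  # assumed estimated correlation between constructs A and B

# Average variance extracted (AVE): mean of squared standardized loadings.
ave_A = np.mean(loadings_A ** 2)
ave_B = np.mean(loadings_B ** 2)

# Composite reliability: (sum of loadings)^2 over itself plus error variance.
cr_A = loadings_A.sum() ** 2 / (loadings_A.sum() ** 2
                                + (1 - loadings_A ** 2).sum())

# Convergent validity is typically supported when AVE > 0.5.
# Fornell-Larcker: sqrt(AVE) should exceed the inter-construct correlation.
convergent_ok = ave_A > 0.5 and ave_B > 0.5
discriminant_ok = np.sqrt(ave_A) > corr_AB and np.sqrt(ave_B) > corr_AB
print(f"AVE_A={ave_A:.3f}, AVE_B={ave_B:.3f}, CR_A={cr_A:.3f}")
print(f"convergent ok: {convergent_ok}, discriminant ok: {discriminant_ok}")
```

With these assumed loadings, both AVE values exceed 0.5 and both square roots exceed the 0.55 correlation, so convergent and discriminant validity would be supported under the conventional cutoffs.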
Cornell Notes
Reliability and validity are the two main quality standards for measurement instruments. Reliability means consistency: repeating the same test on the same subjects under similar conditions should yield similar results. Validity means accuracy: the instrument must measure the intended construct, not just produce stable numbers. Reliability is often assessed with test–retest correlations, though this can be hard due to non-neutral responses after repeated testing. Internal consistency measures like Cronbach’s Alpha and construct reliability via composite reliability are common alternatives. Validity is evaluated through face validity, predictive validity, content validity, and construct validity, including convergent and discriminant validity using theory-driven expected relationships and statistical checks.
Why can a measurement be reliable without being valid?
How does the test–retest method assess reliability, and why is it difficult in practice?
What do Cronbach’s Alpha and Cohen’s Kappa measure in reliability assessment?
How do face validity, predictive validity, and content validity differ?
What is the difference between convergent and discriminant validity within construct validity?
Review Questions
- Give one example of how an instrument could be reliable but not valid, and explain the difference between consistency and accuracy.
- Describe how test–retest reliability works and list two practical problems that can weaken it.
- Match each validity type (face, predictive, content, construct) to what it tests and provide a brief example for one of them.
Key Points
1. Reliability is consistency of results under similar conditions; validity is whether the instrument measures the intended construct.
2. Stable scores do not guarantee correctness—an instrument can be reliable while measuring the wrong trait.
3. Test–retest reliability uses two administrations and correlates results, but repeated testing can be impractical and can introduce bias.
4. Cohen’s Kappa supports reliability for categorical data, while Cronbach’s Alpha supports internal consistency across scale items.
5. Composite reliability is commonly used for construct reliability in survey research and aligns with confirmatory factor analysis.
6. Validity is assessed through face validity, predictive validity, content validity, and construct validity, including convergent and discriminant validity.
7. Construct validity relies on theory-driven expected relationships among variables and uses statistical checks such as average variance extracted and the Fornell–Larcker criterion.