16. SPSS AMOS | Reporting Measurement Model (Part 2) | Reporting Reliability and Validity
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Reliability and validity reporting in a confirmatory factor analysis (CFA) hinges on a clear sequence: document measurement quality first (model fit and factor loadings), then report construct reliability and convergent validity, and finally demonstrate discriminant validity. The practical takeaway is that acceptable measurement models aren’t just about overall fit—they also require indicator loadings meeting a threshold and reliability/validity statistics that clear commonly used benchmarks.
After rebuilding the CFA model and re-running estimates in IBM SPSS AMOS, the workflow starts with standardized factor loadings. Indicators with weak loadings are treated as poor reflections of their latent construct. In the example, one item (LS5) shows a standardized regression weight of 0.463, which falls below the 0.50 cutoff. That indicator is deleted from the diagram, and the model is re-estimated to improve measurement quality.
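This screening step can be sketched in a few lines of Python. The sketch assumes the standardized loadings have been copied out of the AMOS "Standardized Regression Weights" table into a dictionary; LS5's value of 0.463 comes from the example, while the other item names and values are invented for illustration.

```python
# Hypothetical standardized loadings copied from the AMOS output;
# only LS5 = 0.463 comes from the worked example.
loadings = {"LS1": 0.71, "LS2": 0.68, "LS3": 0.74, "LS4": 0.66, "LS5": 0.463}

CUTOFF = 0.50  # common retention threshold for indicators

# Indicators at or above the cutoff stay in the model.
retained = {item: lam for item, lam in loadings.items() if lam >= CUTOFF}
# Indicators below the cutoff are deleted from the AMOS diagram
# before the model is re-estimated.
dropped = [item for item, lam in loadings.items() if lam < CUTOFF]

print(dropped)
```

Running the check flags only LS5, matching the deletion described above.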
With the revised model in place, the reporting focus shifts to construct reliability and convergent validity. The transcript emphasizes composite reliability as the reliability metric (rather than Cronbach’s alpha, which is mentioned but not used here). Composite reliability values are reported per construct and compared against a benchmark of 0.70. The example results range from 0.813 (authentic leadership) to 0.918 (ethical leadership), with life satisfaction at 0.891—each above 0.70—supporting the conclusion that construct reliability is established.
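Composite reliability can be reproduced from the standardized loadings alone. The following sketch uses the standard formula CR = (Σλ)² / ((Σλ)² + Σ(1 − λ²)), which assumes uncorrelated measurement errors; the loadings in the demo call are invented, not the example's actual values.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized loadings, assuming
    uncorrelated measurement errors:
    CR = (sum(lam))^2 / ((sum(lam))^2 + sum(1 - lam^2))."""
    s = sum(loadings)
    error_var = sum(1 - lam ** 2 for lam in loadings)
    return s ** 2 / (s ** 2 + error_var)

# Invented standardized loadings for a single construct.
cr = composite_reliability([0.71, 0.68, 0.74, 0.66])
print(cr >= 0.70)  # benchmark used in the write-up
```

A construct whose CR clears 0.70, as all three constructs do in the example, is reported as reliable.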
Convergent validity is then assessed using Average Variance Extracted (AVE), with a threshold of 0.50. AVE values are described as meeting the requirement for all constructs except authentic leadership. Even so, the transcript notes that authentic leadership still clears the reliability benchmark (composite reliability above 0.70), allowing a qualified conclusion: the construct can still be defended because its composite reliability indicates sufficient internal consistency, even though its indicators' average explained variance falls just short of the 0.50 mark.
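AVE is simply the mean squared standardized loading, so it can be checked with a one-line function. The loadings below are invented; they are chosen to show how a construct can narrowly miss the 0.50 AVE benchmark even when its loadings all exceed the 0.50 retention cutoff, which is exactly the situation described for authentic leadership.

```python
def average_variance_extracted(loadings):
    """AVE = sum(lam^2) / n: the average share of indicator variance
    explained by the latent construct (standardized loadings)."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)

# Invented loadings: all above 0.50, yet AVE lands just under 0.50.
ave = average_variance_extracted([0.71, 0.68, 0.74, 0.66])
print(ave >= 0.50)
```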
Discriminant validity comes last, and the transcript contrasts two approaches. The Fornell–Larcker criterion is presented first: discriminant validity is supported when the square root of each construct's AVE exceeds its correlations with the other constructs. However, Fornell–Larcker is also flagged as increasingly criticized in the literature. As an alternative, the Heterotrait–Monotrait ratio (HTMT) is used, with discriminant validity supported when HTMT ratios fall below 0.85 (Henseler et al., 2015). In the example, all HTMT ratios remain under 0.85, leading to the conclusion that discriminant validity is established.
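Both checks can be illustrated with a small sketch. Nothing here is AMOS output: the item names, item correlations, AVE values, and construct correlation for the two hypothetical constructs A and B are all invented. HTMT is computed as the mean between-construct item correlation divided by the geometric mean of the within-construct means.

```python
import math

# Invented item correlations for two hypothetical constructs A and B.
corr = {
    ("a1", "a2"): 0.64, ("b1", "b2"): 0.49,
    ("a1", "b1"): 0.28, ("a1", "b2"): 0.28,
    ("a2", "b1"): 0.28, ("a2", "b2"): 0.28,
}

def r(i, j):
    """Look up a correlation regardless of key order."""
    return corr.get((i, j), corr.get((j, i)))

def htmt(items_a, items_b):
    """Heterotrait-Monotrait ratio: mean between-construct item
    correlation over the geometric mean of the within-construct means."""
    hetero = [r(a, b) for a in items_a for b in items_b]
    mono_a = [r(x, y) for k, x in enumerate(items_a) for y in items_a[k + 1:]]
    mono_b = [r(x, y) for k, x in enumerate(items_b) for y in items_b[k + 1:]]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(hetero) / math.sqrt(mean(mono_a) * mean(mono_b))

# Fornell-Larcker: sqrt(AVE) must exceed the construct correlation.
ave_a, ave_b, construct_corr = 0.64, 0.55, 0.40  # invented values
fl_ok = math.sqrt(ave_a) > construct_corr and math.sqrt(ave_b) > construct_corr

ratio = htmt(["a1", "a2"], ["b1", "b2"])
print(fl_ok, ratio < 0.85)  # both checks support discriminant validity here
```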
For writing up results, the transcript recommends copying AMOS output into a spreadsheet, then building clean tables for loadings, reliability (including composite reliability), convergent validity (AVE), and discriminant validity (Fornell–Larcker and HTMT). The overall order—measurement model, construct reliability, convergent validity, then discriminant validity—keeps reporting consistent and defensible.
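A minimal sketch of that table-building step, using plain Python string formatting rather than a spreadsheet: the CR values are the ones reported in the example, but the AVE column holds invented placeholders, since the text states only which constructs passed the 0.50 threshold.

```python
# CR values from the example write-up; AVE values are invented
# placeholders consistent with authentic leadership falling short of 0.50.
results = [
    ("Authentic leadership", 0.813, 0.48),
    ("Ethical leadership",   0.918, 0.61),
    ("Life satisfaction",    0.891, 0.58),
]

# Fixed-width columns keep the table readable in a plain-text report.
header = f"{'Construct':<22}{'CR':>7}{'AVE':>7}"
rows = [f"{name:<22}{cr:>7.3f}{ave:>7.3f}" for name, cr, ave in results]
table = "\n".join([header] + rows)
print(table)
```

The same per-construct layout extends naturally to the loadings, Fornell–Larcker, and HTMT tables.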
Cornell Notes
The workflow for reporting a CFA measurement model’s quality starts with standardized factor loadings and model re-estimation. Indicators with loadings below 0.50 are removed; in the example, LS5 had a standardized loading of 0.463 and was deleted before re-running the model. Reliability is reported using composite reliability (benchmark ≥ 0.70), with values such as 0.813 for authentic leadership, 0.918 for ethical leadership, and 0.891 for life satisfaction. Convergent validity is assessed via AVE (benchmark ≥ 0.50); AVE met the threshold for all constructs except authentic leadership, a shortfall handled in the write-up by appealing to that construct's strong composite reliability. Discriminant validity is evaluated first with Fornell–Larcker and then more robustly with HTMT ratios (benchmark < 0.85), where all ratios in the example stayed below 0.85.
Why does the reporting process begin with factor loadings, and what threshold is used?
How is construct reliability reported in this workflow, and what benchmark is applied?
What statistic is used for convergent validity, and how is a partial AVE failure handled?
What is the Fornell–Larcker criterion for discriminant validity?
Why does the transcript also use HTMT, and what cutoff determines discriminant validity?
What table elements does the transcript recommend for reporting reliability and validity?
Review Questions
- What specific action is taken when a standardized factor loading falls below 0.50, and how does that affect subsequent reporting?
- How do composite reliability (CR) and AVE differ in what they validate, and what benchmarks are used for each?
- Under the HTMT approach, what numeric threshold indicates discriminant validity, and how does it relate to the Fornell–Larcker criterion?
Key Points
1. Re-estimate the CFA after removing indicators with standardized loadings below 0.50 to strengthen measurement quality.
2. Report construct reliability using composite reliability, with a benchmark of 0.70 or higher for each construct.
3. Assess convergent validity using AVE with a benchmark of 0.50, and explicitly note any construct that falls short.
4. Run discriminant validity checks in a defensible order: Fornell–Larcker first, then HTMT as the more robust alternative.
5. Apply the HTMT cutoff of 0.85 (Henseler et al., 2015): discriminant validity is supported when all ratios fall below the limit.
6. Present results in clean tables: loadings plus reliability/AVE for convergent validity, and Fornell–Larcker plus HTMT for discriminant validity.
7. Keep reporting organized by sequence: measurement model → construct reliability → convergent validity → discriminant validity.