
14. SEMinR Lecture Series. Discriminant Validity Assessment in R

Research With Fawad · 5 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Discriminant validity checks whether each reflective construct is empirically distinct from every other construct in the structural model.

Briefing

Discriminant validity is the make-or-break check for reflective measurement models: it tests whether each construct is empirically distinct from every other construct in the structural model. In practice, constructs must not only “work” internally but also avoid measuring the same underlying concept as their neighbors. The lecture frames discriminant validity as the fourth step after indicator reliability, construct reliability, and convergent validity, and then walks through how to assess it in R using the seminr workflow.

The traditional approach is the Fornell–Larcker criterion, which compares the square root of a construct’s AVE (average variance extracted) against that construct’s correlations with all other constructs. The rule is straightforward: the square root of AVE should exceed every inter-construct correlation, implying that within-construct variance dominates shared variance. But the session warns that this criterion can miss problems, especially when indicator loadings across constructs differ only slightly (for example, when loadings fall in a narrow band such as 0.65 to 0.85). Henseler and colleagues (2015) are cited for showing that the Fornell–Larcker criterion performs poorly in such cases, so the lecture treats it as unreliable and recommends avoiding it even though it remains common.
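In seminr, the Fornell–Larcker table can be pulled from the model summary. A minimal sketch, assuming an already-fitted model object named `pls_model` (and assuming the installed seminr version exposes the `fl_criteria` table, as recent releases do):

```r
library(seminr)

# Assumes `pls_model` is an estimated model from estimate_pls().
model_summary <- summary(pls_model)

# Fornell-Larcker table: the diagonal holds sqrt(AVE), the off-diagonals hold
# inter-construct correlations. Each diagonal entry should be the largest
# value in its row and column.
model_summary$validity$fl_criteria
```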

As a more dependable alternative, the lecture emphasizes HTMT (heterotrait–monotrait ratio of correlations). The logic is built from indicator pairs: correlations between indicators measuring the same construct are “monotrait” and should be high, while correlations between indicators measuring different constructs are “heterotrait” and should be low. Discriminant validity problems show up when HTMT values are too large. Henseler et al. propose a threshold of 0.90 for constructs that are conceptually very similar (e.g., cognitive satisfaction, affective satisfaction, and loyalty), while a more conservative cutoff of 0.85 is recommended when constructs are more distinct.

The practical workflow in seminr starts by extracting HTMT results from a summary object (e.g., using the stored PLS estimation results). The lecture then demonstrates that HTMT values below 0.85 (or below 0.90 for highly similar constructs) support discriminant validity. It also notes a follow-up requirement: HTMT should be tested statistically against 1 (or against the chosen threshold) using bootstrap confidence intervals. That means running bootstrapping (the example uses 1,000 samples for speed, though 10,000 is recommended), extracting the bias-corrected confidence intervals, and checking whether the interval includes the critical value. If the upper bound of the confidence interval lies below 1, the null hypothesis that HTMT ≥ 1 is rejected, supporting discriminant validity.
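The workflow described above can be sketched as follows, again assuming a fitted seminr model named `pls_model`; the `bootstrapped_HTMT` element is part of seminr’s bootstrap summary:

```r
library(seminr)

# Assumes `pls_model` is an estimated model from estimate_pls().
model_summary <- summary(pls_model)
model_summary$validity$htmt   # HTMT matrix; values should sit below 0.85 (or 0.90)

# Bootstrap to obtain confidence intervals for HTMT.
# nboot = 1000 mirrors the lecture's quick run; 10,000 is recommended for reporting.
boot_model <- bootstrap_model(seminr_model = pls_model, nboot = 1000)

# alpha = 0.10 yields a 90% two-sided interval, which corresponds to a
# one-sided 5% test of H0: HTMT >= 1.
boot_summary <- summary(boot_model, alpha = 0.10)
boot_summary$bootstrapped_HTMT   # discriminant validity if the upper CI bound < 1
```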

Finally, the session adds cross-loading as another check. Each indicator should load highest on its own parent construct compared with all other constructs. The lecture illustrates this with an example where indicators for “vision” load strongly on vision rather than on development, rewards, collaborative culture, or organizational performance, and the same pattern holds for the other constructs.

The takeaway is a layered assessment strategy: rely on HTMT (with thresholds and bootstrap-based confidence intervals), use cross-loadings as a sanity check, and treat Fornell–Larcker as less trustworthy—particularly when constructs have similar indicator loading patterns. The session closes by recapping the broader reflective measurement evaluation steps and pointing ahead to formative model assessment next.

Cornell Notes

Discriminant validity determines whether reflective constructs are empirically distinct from one another. The lecture contrasts Fornell–Larcker with HTMT: Fornell–Larcker compares the square root of AVE to inter-construct correlations, but it can fail when indicator loadings across constructs are similar. HTMT (heterotrait–monotrait ratio) is presented as a stronger method, using correlations between indicators of different constructs (heterotrait) versus the same construct (monotrait). Discriminant validity is supported when HTMT is below a threshold—0.90 for very similar constructs and 0.85 for more distinct ones—and when bootstrap confidence intervals exclude the critical value (notably 1). Cross-loading offers an additional check: each indicator should load highest on its own construct.

Why does the Fornell–Larcker criterion sometimes miss discriminant validity problems?

Fornell–Larcker relies on comparing the square root of a construct’s AVE to inter-construct correlations. The lecture highlights evidence (Henseler et al., 2015) that this approach performs poorly when indicator loadings differ only slightly across constructs—such as when loadings cluster between about 0.65 and 0.85. In those cases, the criterion may not reliably flag when constructs are not truly distinct, so HTMT is preferred.

How does HTMT operationalize “construct distinctiveness”?

HTMT uses indicator-pair correlations. Correlations among indicators measuring the same construct are “monotrait” and should be relatively high. Correlations between indicators measuring different constructs are “heterotrait” and should be relatively low. Discriminant validity problems emerge when the heterotrait–monotrait ratio becomes too large, meaning indicators from different constructs correlate almost as strongly as indicators within the same construct.
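As a back-of-the-envelope illustration (not from the lecture), HTMT for a pair of constructs is the mean heterotrait correlation divided by the geometric mean of the average monotrait correlations. With invented numbers:

```r
# Toy correlation matrix: construct X measured by x1, x2; construct Y by y1, y2.
R <- matrix(c(1.00, 0.70, 0.45, 0.50,
              0.70, 1.00, 0.40, 0.55,
              0.45, 0.40, 1.00, 0.75,
              0.50, 0.55, 0.75, 1.00),
            nrow = 4,
            dimnames = rep(list(c("x1", "x2", "y1", "y2")), 2))

hetero <- mean(R[1:2, 3:4])   # average x-by-y (heterotrait) correlation
mono_x <- R["x1", "x2"]       # average within-X (monotrait) correlation
mono_y <- R["y1", "y2"]       # average within-Y (monotrait) correlation

htmt <- hetero / sqrt(mono_x * mono_y)
htmt   # about 0.66 here, comfortably below the 0.85 cutoff
```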

What HTMT thresholds should be used, and when?

The lecture reports two cutoffs from Henseler et al. A more liberal threshold of 0.90 is recommended for constructs that are conceptually very similar (examples given include cognitive satisfaction, affective satisfaction, and loyalty). For constructs that are more conceptually distinct, a more conservative threshold of 0.85 is recommended. Values above the relevant cutoff suggest discriminant validity issues.

How do bootstrap confidence intervals strengthen the HTMT decision?

Beyond checking the raw HTMT value, the lecture recommends testing whether HTMT is significantly below 1 (or below the chosen threshold). This requires bootstrapping to compute bias-corrected confidence intervals. The decision rule: if the upper bound of the confidence interval lies below 1, the null hypothesis that HTMT ≥ 1 is rejected, supporting discriminant validity. If 1 lies within the interval, discriminant validity concerns remain.

What does cross-loading require for discriminant validity?

Cross-loading checks whether each indicator loads highest on its own parent construct. The lecture’s rule: indicator loadings should be high on their underlying construct and lower on all other constructs. If, for example, vision indicators load around 0.910 on vision and are lower on development, rewards, collaborative culture, and organizational performance, that pattern supports discriminant validity; the same logic applies to every construct’s indicators.
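In seminr this check reduces to inspecting the cross-loadings matrix; a quick sketch, assuming a fitted model named `pls_model`:

```r
library(seminr)

# Assumes `pls_model` is an estimated model from estimate_pls().
cl <- summary(pls_model)$validity$cross_loadings   # indicators x constructs

# For each indicator, report the construct on which it loads most strongly;
# every result should be the indicator's own parent construct.
apply(cl, 1, function(row) colnames(cl)[which.max(abs(row))])
```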

Review Questions

  1. What are the conceptual differences between Fornell–Larcker and HTMT, and why does that matter when indicator loadings are similar?
  2. Under what conditions would you use an HTMT threshold of 0.90 versus 0.85?
  3. When using bootstrap confidence intervals for HTMT, what does it mean if the interval includes the value 1?

Key Points

  1. Discriminant validity checks whether each reflective construct is empirically distinct from every other construct in the structural model.

  2. Fornell–Larcker compares the square root of AVE to inter-construct correlations, but it can fail when indicator loadings across constructs differ only slightly.

  3. HTMT (heterotrait–monotrait ratio) is preferred because it contrasts within-construct indicator correlations (monotrait) with between-construct indicator correlations (heterotrait).

  4. Use HTMT thresholds of 0.90 for conceptually very similar constructs and 0.85 for more distinct constructs.

  5. Support discriminant validity statistically by bootstrapping and using bias-corrected confidence intervals; the critical value (notably 1) should fall outside the interval.

  6. Cross-loading provides a practical check: each indicator should load highest on its own construct compared with all other constructs.

  7. In seminr workflows, HTMT values are extracted from the PLS summary object, and confidence intervals come from a bootstrapped summary object.

Highlights

Fornell–Larcker can miss discriminant validity issues when indicator loadings across constructs are close (e.g., roughly 0.65–0.85), making HTMT the more reliable choice.
HTMT flags problems when heterotrait–monotrait ratios are too high: above 0.90 for very similar constructs or above 0.85 for more distinct ones.
Bootstrap confidence intervals for HTMT provide the statistical test: if the upper bound of the interval lies below 1, the null hypothesis that HTMT ≥ 1 is rejected and discriminant validity is supported.
Cross-loading should show a clean pattern where each indicator loads most strongly on its own parent construct, not on others.

Topics

Mentioned

  • PLS
  • AVE
  • HTMT
  • Fornell–Larcker (FL)
  • H1
  • H0