30. SEMinR Lecture Series - How to Solve Convergent and Discriminant Validity Issues

Research With Fawad · 6 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use at least four to six items per construct in reflective models, since indicators that fail to load or cross-load may need to be deleted during validation in SEMinR.

Briefing

Convergent and discriminant validity problems in reflective measurement models can often be fixed inside SEMinR by tightening the measurement first—then cleaning the data and re-checking HTMT—rather than treating validity as a one-shot diagnostic. The core workflow starts before analysis: each construct should have enough indicators (at least four to six), items must be easy to understand, and statements should not overlap conceptually, because overlapping wording is a common driver of discriminant validity failures.
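To make the setup concrete, here is a minimal SEMinR sketch of a reflective-style measurement model that follows the four-to-six-items guideline. The construct names borrow the transcript's abbreviations (LG, SC, HC), but the item prefixes, the structural paths, and the `survey_data` frame used in later snippets are all hypothetical placeholders, not the transcript's actual dataset:

```r
library(seminr)

# Hypothetical reflective measurement model: five indicators per
# construct, within the four-to-six items guideline.
measurement <- constructs(
  composite("LG", multi_items("lg", 1:5)),  # items lg1..lg5 (hypothetical)
  composite("SC", multi_items("sc", 1:5)),  # structural capital items
  composite("HC", multi_items("hc", 1:5))   # human capital items
)

# Hypothetical structural model: SC and HC predicting LG.
structure <- relationships(
  paths(from = c("SC", "HC"), to = "LG")
)
```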

For convergent validity, the practical levers are the outer loadings and the downstream reliability and AVE metrics. Indicators with weak loadings are candidates for removal, but the decision should be tied to improvement in composite reliability and average variance extracted (AVE), not to loading thresholds alone. While loadings above 0.70 are often treated as desirable, social science datasets frequently produce weaker outer loadings, and the SEMinR guidance in the transcript emphasizes caution. Items should be deleted only when doing so meaningfully increases composite reliability and AVE (with AVE expected to exceed recommended cutoffs). If reliability and AVE are already above targets, items with loadings below 0.70 may still be retained as long as they remain reasonably strong (the transcript notes that loadings above 0.40 need not be removed if overall validity metrics are already acceptable). In R, the process is implemented by inspecting the SEMinR summary outputs for reliability, AVE, and the specific indicators with low outer loadings, then re-estimating the model after deletions.
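A sketch of that iteration under the same hypothetical setup as above; `estimate_pls`, `summary`, and the `$reliability` and `$loadings` summary elements are part of SEMinR's documented interface, while `survey_data` and the item names remain placeholders:

```r
# Estimate the PLS model on a hypothetical data frame `survey_data`
# whose columns are the items lg1..lg5, sc1..sc5, hc1..hc5.
pls_model <- estimate_pls(data = survey_data,
                          measurement_model = measurement,
                          structural_model  = structure)

model_summary <- summary(pls_model)
model_summary$reliability  # alpha, rhoC, rhoA, and AVE per construct
model_summary$loadings     # outer loadings; note items below 0.70

# If, say, lg4 loads weakly AND dropping it would raise rhoC/AVE,
# re-specify the construct without it and re-estimate:
measurement2 <- constructs(
  composite("LG", multi_items("lg", c(1:3, 5))),
  composite("SC", multi_items("sc", 1:5)),
  composite("HC", multi_items("hc", 1:5))
)
pls_model2 <- estimate_pls(survey_data, measurement2, structure)
summary(pls_model2)$reliability  # confirm rhoC and AVE actually improved
```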

Discriminant validity is then addressed through HTMT, cross-loadings, and—when needed—data quality checks. HTMT values above the required limit signal trouble between specific construct pairs (the transcript gives an example where four constructs show issues, with LG and SC among the problematic pairings). When HTMT flags a problem, one recommended diagnostic is to compute response standard deviations per construct and remove respondents with extremely low variability (the transcript uses a threshold of standard deviation below 0.25). The rationale is straightforward: near-zero standard deviation suggests respondents did not read or did not answer properly. After deleting those problematic cases, HTMT is re-run; the transcript reports that validity metrics improve significantly.
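The respondent-screening step can be done directly in R instead of a spreadsheet. A minimal sketch, assuming the same hypothetical `survey_data` and item names, and assuming the summary object exposes HTMT under `$validity$htmt` as in recent SEMinR versions:

```r
# Per-case standard deviation across one construct's items; a value
# below the transcript's 0.25 cutoff suggests straight-lining.
lg_items <- paste0("lg", 1:5)            # hypothetical item columns
sd_lg <- apply(survey_data[, lg_items], 1, sd)

# Keep only attentive cases, then re-estimate and re-check HTMT.
cleaned   <- survey_data[sd_lg >= 0.25, ]
pls_clean <- estimate_pls(cleaned, measurement, structure)
summary(pls_clean)$validity$htmt  # HTMT after cleaning
```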

Next comes cross-loading review. If an item cross-loads on two constructs and the difference between its primary loading and the competing loading is less than 0.10, that indicator is a candidate for removal. The transcript illustrates this by comparing cross-loadings between specific construct pairs (e.g., LG vs RC across several indicator pairs), using spreadsheet comparisons to identify where the loading differences fall below the 0.10 guideline.
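A small R sketch of that comparison, assuming the cross-loadings matrix is exposed as `model_summary$validity$cross_loadings` (items in rows, constructs in columns) in your SEMinR version, and that each item's largest loading falls on its own construct:

```r
cl <- model_summary$validity$cross_loadings  # items x constructs

# For each item, take its two largest loadings and flag items whose
# gap is under 0.10 (assumes the largest loading is on the item's
# own construct).
top_two <- t(apply(cl, 1, function(x) sort(x, decreasing = TRUE)[1:2]))
gap     <- top_two[, 1] - top_two[, 2]
rownames(cl)[gap < 0.10]  # candidates for removal under the 0.10 rule
```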

If HTMT still fails, the transcript outlines escalation steps. Bootstrapped HTMT with bias-corrected confidence intervals can confirm whether discriminant validity is supported (the key check is whether the interval contains 1). If discriminant validity issues persist even after the confidence-interval check, one fallback is to collapse theoretically distinct dimensions into a single higher-order measure when correlations in the 0.8–0.9 range are repeatedly reported in the literature. If none of the remedies work, additional data collection may be necessary. Finally, the transcript notes that sampling quirks can drive multicollinearity-like behavior; dropping highly collinear independent variables may help when discriminant validity problems are tied to model structure rather than measurement quality.
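A hedged sketch of the bootstrap check; `bootstrap_model` and the `bootstrapped_HTMT` summary element exist in SEMinR, though the exact column labels of the confidence-interval output may vary by package version:

```r
# Bootstrap the estimated model and inspect HTMT confidence
# intervals; discriminant validity is supported for a pair when
# its interval does not contain 1.
boot_model   <- bootstrap_model(seminr_model = pls_model,
                                nboot = 1000, seed = 123)
boot_summary <- summary(boot_model, alpha = 0.05)
boot_summary$bootstrapped_HTMT  # inspect the CI bound columns;
                                # an interval spanning 1 flags a problem
```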

Cornell Notes

The transcript lays out a step-by-step way to fix convergent and discriminant validity problems in reflective measurement models using SEMinR. Convergent validity is improved by checking outer loadings and only deleting indicators when doing so increases composite reliability and AVE, not just because a loading falls below 0.70. Discriminant validity is assessed with HTMT; when HTMT is too high for specific construct pairs, the workflow recommends cleaning the data by removing respondents with very low response standard deviation (below 0.25), then re-running HTMT. If issues remain, cross-loadings are inspected and items with less than a 0.10 loading difference between constructs are removed. When problems persist, bootstrapped HTMT confidence intervals guide whether discriminant validity is truly unsupported; collapsing highly correlated dimensions or collecting more data may be the last resort.

What should trigger indicator deletion for convergent validity, and what thresholds matter?

Deletion should be tied to improvement in composite reliability and average variance extracted (AVE), not to outer loading thresholds alone. Although loadings above 0.70 are often treated as desirable, the transcript warns that social science studies frequently yield weaker outer loadings. A practical rule given is to consider removing indicators with outer loadings below 0.70 only if the deletion increases composite reliability and AVE beyond recommended values. If reliability and AVE are already above targets, indicators with loadings below 0.70 may still be kept as long as they are not too weak (the transcript notes loadings above 0.40 as a case where removal may not be necessary). Content validity also matters: removing an item can harm the construct’s coverage.

How does the workflow use HTMT to locate discriminant validity problems?

HTMT is run for construct pairs, and values above the required limit indicate discriminant validity trouble. The transcript’s example shows multiple constructs with HTMT values exceeding the threshold, highlighting specific problematic pairs such as LG with SC (and later SC with HC). Once HTMT flags the pair(s), the workflow targets those constructs for further checks—starting with data quality and then moving to cross-loadings and item-level decisions.

Why remove respondents with low standard deviation, and what threshold is used?

Very low standard deviation within a construct suggests respondents did not read or answer the items properly, which can distort HTMT and inflate apparent overlap between constructs. The transcript recommends removing cases (respondents) whose standard deviation is below 0.25. It describes using the dataset's CSV/Excel file to compute the standard deviation per construct, sorting from smallest to largest, deleting cases with near-zero standard deviation, and then re-running the SEM model and HTMT to confirm improvement.

What cross-loading rule determines whether an item should be removed for discriminant validity?

Cross-loading review focuses on the difference between an item's primary loading and its competing loading on another construct. If an item cross-loads and the difference is less than 0.10, it is a candidate for removal. The transcript illustrates this by comparing cross-loadings between specific construct pairs (e.g., LG vs RC) and identifying indicators where the competing loading is too close to the primary loading.

What does bootstrapped HTMT with bias-corrected confidence intervals add when problems persist?

Bootstrapping HTMT helps determine whether discriminant validity is supported statistically. The transcript notes checking the bias-corrected confidence intervals for the presence of a 1. When the interval does not contain 1, discriminant validity can be supported even if the point HTMT values were initially concerning. If the interval still contains 1, the transcript treats that as evidence that discriminant validity issues may not be fixable through item deletion alone.

What are the escalation options if discriminant validity issues remain after the standard fixes?

If discriminant validity issues persist, one option is collapsing measures: combine constructs into a single higher-order construct when correlations between theoretically distinct dimensions are repeatedly reported in the literature around 0.8–0.9. The transcript gives an example of combining structural capital (SC) and human capital (HC). If that still doesn't resolve the issue, collecting additional data is recommended. The transcript also mentions that sampling flukes can drive the problem; dropping independent variables that are collinear and show insufficient discriminant validity can help reduce multicollinearity-like effects.
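If collapsing is the chosen remedy, SEMinR supports higher-order constructs directly. A hypothetical sketch combining SC and HC into a second-order construct via the two-stage approach; the label "IC" (intellectual capital) is an illustrative name, not taken from the transcript:

```r
# Hypothetical second-order construct "IC" built from SC and HC
# using SEMinR's two-stage higher-order support.
measurement_hoc <- constructs(
  composite("LG", multi_items("lg", 1:5)),
  composite("SC", multi_items("sc", 1:5)),
  composite("HC", multi_items("hc", 1:5)),
  higher_composite("IC", dimensions = c("SC", "HC"), method = two_stage)
)

structure_hoc <- relationships(
  paths(from = "IC", to = "LG")
)
pls_hoc <- estimate_pls(survey_data, measurement_hoc, structure_hoc)
summary(pls_hoc)$validity$htmt  # re-check HTMT with the collapsed measure
```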

Review Questions

  1. When is it appropriate to delete an indicator for convergent validity, and how should composite reliability and AVE influence that decision?
  2. What sequence of checks is recommended after HTMT indicates discriminant validity problems: data standard deviation, cross-loadings, or confidence intervals first—and why?
  3. If bootstrapped HTMT confidence intervals still suggest discriminant validity failure, what two higher-level remedies are proposed?

Key Points

  1. Use at least four to six items per construct in reflective models, since indicators that fail to load or cross-load may need to be deleted during validation.

  2. Improve convergent validity by removing indicators only when the deletion increases composite reliability and AVE, not simply because outer loadings fall below 0.70.

  3. Preserve content validity when deleting items; an indicator’s removal can weaken construct coverage even if statistics improve.

  4. Diagnose discriminant validity with HTMT, then target the specific construct pairs that exceed the threshold.

  5. Clean the dataset when needed by removing respondents with construct-level response standard deviation below 0.25, then re-run HTMT to verify improvement.

  6. For cross-loadings, remove items when the difference between primary and competing loadings is less than 0.10.

  7. If HTMT problems persist after item and data cleaning, consider bootstrapped confidence intervals, collapsing highly correlated dimensions (around 0.8–0.9), or collecting additional data.

Highlights

Convergent validity fixes should be metric-driven: delete indicators only when composite reliability and AVE improve, with content validity kept in mind.
Discriminant validity can improve dramatically after removing respondents with near-zero response standard deviation (below 0.25), which signals poor answering behavior.
Cross-loadings are handled with a concrete rule: remove items when the primary-vs-competing loading gap is under 0.10.
Bootstrapped HTMT confidence intervals provide a decisive check: discriminant validity is supported when the bias-corrected interval does not include 1.
When dimensions remain highly correlated (often 0.8–0.9 in literature), collapsing them into a single measure can be a pragmatic last resort.

Mentioned

  • SC (structural capital)
  • HC (human capital)
  • HTMT (heterotrait-monotrait ratio)
  • AVE (average variance extracted)