
14. SPSS AMOS | Factor Loadings in Structural Equation Modelling (SEM)

Research With Fawad · 5 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Factor loadings quantify how strongly each indicator reflects its latent construct and are estimated via confirmatory factor analysis in the measurement model.

Briefing

Low factor loadings in structural equation modeling (SEM) can threaten convergent and discriminant validity—but dropping indicators isn’t automatic. Factor loadings measure how strongly each observed indicator reflects its underlying latent construct, and they come from confirmatory factor analysis in the measurement model. When an indicator’s loading is weak, it may contribute little to explaining the construct; when it’s weak but the construct still performs well overall, it might still capture a unique aspect of the concept.

A common rule-of-thumb threshold is around 0.70 for factor loadings, yet the practical decision depends on the construct’s overall reliability and the magnitude of the weakness. If composite reliability and related metrics remain acceptable—such as composite reliability above 0.70 and AVE (average variance extracted) above 0.50—then low individual loadings do not necessarily invalidate convergent validity. In that situation, an indicator with a loading below 0.70 can still be worth keeping, especially when the construct is measured with many items and the majority of indicators load strongly. The weaker item may still help represent a distinct component of the latent construct rather than merely adding noise.
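
These construct-level metrics are easy to check by hand from standardized loadings. The sketch below computes composite reliability and AVE; the loading values are made up for illustration (not taken from the lecture) and show how one weak item (0.55) can coexist with acceptable construct-level results.

```python
# Sketch: composite reliability (CR) and average variance extracted (AVE)
# from standardized factor loadings. Loading values are illustrative.

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances),
    where each indicator's error variance is 1 - loading^2."""
    s = sum(loadings)
    errors = sum(1 - l ** 2 for l in loadings)
    return s ** 2 / (s ** 2 + errors)

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Five indicators; one weak item (0.55) among otherwise strong loadings.
loadings = [0.82, 0.78, 0.75, 0.71, 0.55]
cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CR = {cr:.3f}, AVE = {ave:.3f}")  # CR ≈ 0.847, AVE ≈ 0.530
```

Here CR stays above 0.70 and AVE above 0.50 despite the 0.55 loading, which is exactly the situation where the lecture says the weak item can still be worth keeping.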

The guidance tightens when loadings fall well below the usual benchmark. If an indicator’s loading is clearly under 0.70, and especially under 0.60, the construct explains too little of that indicator’s variance—at a loading of 0.60, only about one-third. Such an item contributes little to measuring the latent construct and can increase unexplained variance in the model. That imbalance—more unexplained than explained variance—can undermine both convergent validity (indicators reflecting the same construct) and discriminant validity (distinct constructs remaining distinct).

Even when deletion seems justified, the process needs safeguards. Dropping indicators based on one dataset invites criticism that the model is being tuned to chance. To avoid capitalizing on random sampling variation, changes to the measurement model should be verified with additional data. The recommended approach is pretesting or pilot testing: decide in the pilot which items to drop, then collect a final dataset to confirm that the revised factor structure and measurement properties hold. If the same items repeatedly underperform in the second data collection, that pattern supports the deletion as a substantive measurement issue rather than a one-off artifact of the first sample.

Finally, the lecture emphasizes a content-and-reliability lens rather than a purely mechanical one. In social sciences, outer loadings below 0.70 are common, so researchers should examine how removing an indicator affects composite reliability, content validity, and AVE. Items should be considered for removal mainly when deletion improves composite reliability and helps AVE exceed recommended thresholds. In short: low loadings call for diagnostic attention, but indicator removal should be driven by whether it improves overall measurement quality and is stable across samples.

Cornell Notes

Factor loadings in SEM quantify how well each observed indicator reflects its latent construct, derived from confirmatory factor analysis. A loading near or below 0.70 does not automatically require deletion if the construct still meets overall criteria such as composite reliability above 0.70 and AVE above 0.50; weaker items may capture unique components. When loadings drop clearly below 0.60, the indicator explains too little variance and can increase unexplained variance, harming convergent and discriminant validity. Indicator deletion should be validated with a second data collection to avoid capitalizing on chance. Pretesting/pilot testing is recommended: drop items during the pilot, then confirm the revised measurement model in the final dataset.

What exactly does a factor loading represent in SEM, and where does it come from?

A factor loading is the strength of the relationship between a latent construct and an observed indicator—in a standardized solution, effectively the correlation between the construct and that indicator. It is produced as a coefficient during confirmatory factor analysis of the measurement model, indicating how well a particular indicator measures the underlying factor.

If an indicator’s loading is below 0.70, when is it still reasonable to keep it?

Keeping an indicator can be reasonable when overall construct performance remains strong. The lecture highlights that if composite reliability is above 0.70 and AVE is above 0.50, convergent validity can still be considered established even if some individual loadings fall below 0.70. This is especially plausible when the construct has many indicators and most load well, while the weaker items may still represent a unique component.

At what point does a low loading become a stronger reason to delete an indicator?

The lecture draws a sharper line at loadings below 0.60. At that level the construct explains barely a third of the indicator’s variance, meaning the item contributes little to understanding the latent construct. This can increase unexplained variance in the model and reduce the ability to achieve convergent and discriminant validity.
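
The one-third figure follows directly from squaring the standardized loading, since that loading is the construct–indicator correlation:

```python
# A standardized loading is the construct-indicator correlation, so its
# square is the share of the indicator's variance the construct explains.
for loading in (0.70, 0.60, 0.50):
    explained = loading ** 2
    print(f"loading {loading:.2f}: {explained:.0%} explained, {1 - explained:.0%} unexplained")
```

At 0.60 the construct explains 36% (roughly one-third) and leaves 64% unexplained—the imbalance the lecture warns about. Explained variance only reaches half at a loading of about 0.71, which is one rationale behind the 0.70 rule of thumb.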

Why is a single-sample indicator deletion risky, and what is the recommended fix?

Dropping indicators after seeing results in one dataset can lead to criticism that the model is being tailored to chance (capitalizing on chance). The recommended fix is stability testing: collect a second dataset and check whether the same items still underperform. If the same items are dropped again, the deletion is more defensible as a real measurement problem.

How should pretesting/pilot testing be used when adapting scales to new contexts?

Pretesting/pilot testing should be where item deletion decisions are made. After the pilot, the final data collection should verify that the revised factor structure and measurement properties hold for each construct. This is particularly important when adapting an indicator set into a new context or measuring a relatively new construct.

What decision rule should guide deletion beyond the 0.70 loading threshold?

Deletion should be guided by whether it improves composite reliability and AVE while preserving content validity. The lecture advises not deleting items solely because their loadings are low; instead, remove items only if deletion increases composite reliability and helps AVE exceed recommended thresholds.
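
The quantitative half of that rule can be sketched as a simple check: drop an item only if removal raises composite reliability and keeps (or lifts) AVE past its threshold. The function and loading values below are illustrative assumptions, not AMOS output, and the content-validity judgment still has to be made by the researcher.

```python
# Sketch of the lecture's decision rule: remove an item only if removal
# improves composite reliability AND leaves AVE at/above its threshold.
# Names and loading values are illustrative, not from AMOS.

def cr(lams):
    s = sum(lams)
    return s * s / (s * s + sum(1 - l * l for l in lams))

def ave(lams):
    return sum(l * l for l in lams) / len(lams)

def should_drop(lams, i, ave_threshold=0.50):
    """True if dropping item i raises CR and keeps AVE >= threshold."""
    reduced = lams[:i] + lams[i + 1:]
    return cr(reduced) > cr(lams) and ave(reduced) >= ave_threshold

lams = [0.82, 0.78, 0.75, 0.71, 0.40]
print(should_drop(lams, 4))  # weak item: dropping it helps -> True
print(should_drop(lams, 0))  # strong item: dropping it hurts CR -> False
```

Even when this check says “drop,” the lecture’s other safeguard still applies: confirm the decision on a second dataset before finalizing the measurement model.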

Review Questions

  1. How do composite reliability and AVE influence the decision to keep an indicator with a loading below 0.70?
  2. What measurement-quality problem can occur when indicators with loadings below 0.60 are retained?
  3. What steps help ensure indicator deletion decisions are not driven by chance?

Key Points

  1. Factor loadings quantify how strongly each indicator reflects its latent construct and are estimated via confirmatory factor analysis in the measurement model.
  2. A loading near or below 0.70 does not automatically require deletion if composite reliability exceeds 0.70 and AVE exceeds 0.50.
  3. Indicators with loadings clearly below 0.60 explain too little variance and can increase unexplained variance, weakening convergent and discriminant validity.
  4. Indicator deletion should be validated with a second data collection to avoid capitalizing on chance from one sample.
  5. Use pretesting/pilot testing when adapting scales: decide item deletions in the pilot, then confirm the revised measurement model in the final dataset.
  6. Deletion decisions should consider composite reliability, AVE, and content validity—not just whether a loading falls below 0.70.

Highlights

Low factor loadings are not automatically disqualifying when overall construct reliability and AVE remain acceptable.
Loadings below 0.60 are treated as especially problematic because they explain too little variance in the indicator.
Dropping items based on one dataset invites “capitalizing on chance,” so stability must be tested with a second sample.
Pretesting/pilot testing is the practical workflow: drop items during the pilot, then verify the revised structure in final data.
Indicator removal should improve composite reliability and AVE while maintaining content validity, not merely chase the 0.70 cutoff.

Mentioned

  • SEM
  • AVE