CBSEM using #SmartPLS4 | 6 | Factor Loadings and Model Fit Statistics
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Factor loadings and model fit statistics sit at the heart of confirmatory factor analysis (CFA) and structural equation modeling (SEM) because they determine whether observed items meaningfully represent unobservable constructs—and whether the overall measurement and structural setup matches the data closely enough to trust. Factor loadings quantify the effect of an unobservable construct (like job satisfaction) on its indicators. Standardized factor loadings are typically reported because they put indicator weights on a comparable 0–1 scale: a standardized loading of 0.80 implies the construct explains 0.64 (64%) of the indicator’s variance. As a practical rule, standardized loadings above 0.70 (or indicators explaining at least half their variance) are treated as acceptable; weaker indicators may contribute little and can be candidates for deletion, though that decision depends on additional conditions.
Once factor loadings are set, measurement error for each indicator can be derived from the explained variance: measurement error is computed as 1 − r². Lower explained variance means higher measurement error, so factor loadings directly signal how noisy an indicator is relative to the construct it is supposed to measure.
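The variance arithmetic above can be sketched in a few lines. This is an illustration of the formulas quoted in the text (r² = squared standardized loading, error = 1 − r²), not output from any SEM package:

```python
def explained_variance(std_loading: float) -> float:
    """Proportion of indicator variance explained by the construct (r²)."""
    return std_loading ** 2

def measurement_error(std_loading: float) -> float:
    """Indicator error variance, computed as 1 - r²."""
    return 1 - explained_variance(std_loading)

# A loading of 0.80 explains 64% of the variance (error 0.36);
# 0.70 explains 49% (roughly the "half the variance" rule of thumb).
for loading in (0.80, 0.70, 0.50):
    print(f"loading={loading:.2f}  r²={explained_variance(loading):.2f}  "
          f"error={measurement_error(loading):.2f}")
```

Note how quickly error grows as loadings drop: a 0.50 loading leaves 75% of the indicator's variance unexplained, which is why such items become deletion candidates.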
CFA/SEM also requires a metric for each latent variable. In practice, one factor loading per construct is constrained to 1 to set the scale (often called a “reference term”). Without this constraint, SEM estimation can fail with an “unidentified model” error. When comparing multiple groups or samples—such as male versus female—consistency matters: the same indicator should be constrained to 1 in each group so the latent constructs are measured on the same scale across comparisons.
Model fit statistics then assess how closely the model-implied covariance matrix matches the observed covariance matrix. A good fit indicates the specified covariance structure is a close representation of the data; a poor fit suggests the data contradicts the model. Importantly, good overall fit does not guarantee every part of the model is correct. Fit can also be misleading when comparing models with different numbers of indicators per factor: more indicators can make fit harder to achieve, so parsimony is rewarded.
The chi-square (χ²) goodness-of-fit test is a classic starting point, but it is also a “badness of fit” measure: χ² should be non-significant for a good fit. Yet χ² is highly sensitive to sample size, so a relative chi-square (χ² divided by degrees of freedom) is often used. A commonly cited benchmark is a relative χ² below 3, with values up to 5 sometimes treated as acceptable.
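The relative chi-square is just the ratio of χ² to degrees of freedom. A minimal sketch, using hypothetical values chosen only for illustration:

```python
def relative_chi_square(chi_square: float, df: int) -> float:
    """Relative (normed) chi-square: χ² divided by degrees of freedom."""
    return chi_square / df

# Hypothetical model results, not from any real dataset.
chi2, df = 120.0, 48
print(f"χ²/df = {relative_chi_square(chi2, df):.2f}")  # 2.50
```

Here χ²/df = 2.50 falls under the common benchmark of 3 even though a raw χ² of 120 would be highly significant at 48 degrees of freedom with a large sample, which is exactly why the ratio is preferred.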
Beyond χ², several fit indices are used. Comparative Fit Index (CFI) values above 0.90 indicate good fit and are less affected by sample size. Incremental Fit Index (IFI) and Tucker-Lewis Index (TLI) are also typically considered acceptable when above 0.90. Root Mean Square Error of Approximation (RMSEA) is a badness-of-fit measure where values near zero are best: under 0.05 is good, 0.05–0.08 adequate, 0.08–0.10 mediocre, and above 0.10 poor. Standardized Root Mean Square Residual (SRMR) is another badness-of-fit metric where values ≤0.05 are good, 0.05–0.09 adequate, and higher values worse.
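The RMSEA and SRMR benchmarks quoted above can be encoded as simple lookup helpers. This is a sketch of the cutoffs as stated in this text; as the next paragraph notes, these thresholds are conventions, not universal rules:

```python
def interpret_rmsea(rmsea: float) -> str:
    """Classify RMSEA per the benchmarks cited in the text."""
    if rmsea < 0.05:
        return "good"
    if rmsea <= 0.08:
        return "adequate"
    if rmsea <= 0.10:
        return "mediocre"
    return "poor"

def interpret_srmr(srmr: float) -> str:
    """Classify SRMR per the benchmarks cited in the text."""
    if srmr <= 0.05:
        return "good"
    if srmr <= 0.09:
        return "adequate"
    return "poor"

print(interpret_rmsea(0.04), interpret_srmr(0.06))  # good adequate
```

Because both are badness-of-fit measures, smaller is always better, unlike CFI/IFI/TLI, where larger values (toward 1.0) indicate better fit.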
Finally, there is no single universal “golden rule” for fit thresholds. While Bentler and Bonett (1980) popularized the 0.90 cutoff, later work (including Hu and Bentler, 1999) argued for stricter 0.95 standards, and Marsh and colleagues (2004) pushed for using multiple indices while accounting for sample size and estimation conditions. Even a model that meets fit thresholds can still be misspecified, so fit should be interpreted alongside theory, specification checks, and measurement quality.
Cornell Notes
Factor loadings determine how strongly each observed indicator reflects its latent (unobservable) construct. Standardized loadings are commonly used because they scale indicator weights to a 0–1 range; squaring a standardized loading gives the proportion of indicator variance explained (e.g., 0.80 → 64%). Indicators with standardized loadings above about 0.70 (or explaining at least half the variance) are generally considered acceptable, and measurement error can be computed as 1 − r². SEM also requires setting the latent variable metric by constraining one factor loading to 1 (a reference term) to avoid unidentified models; the same indicator should be constrained across groups for comparisons. Model fit then evaluates how well the model-implied covariance matrix matches the observed covariance matrix using indices like χ²/df, CFI, IFI, TLI, RMSEA, and SRMR, though no single threshold is universally accepted.
Why do standardized factor loadings matter more than unstandardized ones in CFA results?
How do factor loadings connect to measurement error for each indicator?
What is the purpose of constraining one factor loading to 1 in SEM, and what happens if it isn’t done?
Why must the same indicator be constrained to 1 when comparing multiple groups (e.g., male vs. female)?
How should chi-square (χ²) be interpreted, and why is relative chi-square often preferred?
What do common fit indices (CFI, IFI, TLI, RMSEA, SRMR) indicate, and what are typical benchmarks?
Review Questions
- If a standardized factor loading is 0.70, what proportion of the indicator’s variance is explained by the latent construct?
- What does it mean to constrain one factor loading to 1 in SEM, and why does it prevent unidentified model errors?
- Name two fit indices that are less sensitive to sample size and two that are interpreted as badness-of-fit measures.
Key Points
1. Standardized factor loadings place indicator weights on a comparable 0–1 scale, making it easier to judge which items best represent a latent construct.
2. Squaring a standardized factor loading gives the proportion of an indicator’s variance explained by the latent construct (e.g., 0.80 → 64%).
3. Measurement error for an indicator can be computed as 1 − r²; lower explained variance implies higher measurement error.
4. SEM identification requires setting each latent variable’s metric by constraining one factor loading to 1 (the reference term).
5. When comparing groups, the same indicator should be constrained to 1 in every group to keep latent construct scales consistent.
6. Model fit evaluates how closely the model-implied covariance matrix matches the observed covariance matrix; good overall fit does not guarantee correct specification of every component.
7. Fit thresholds vary across literature, so using multiple fit indices and considering sample size and estimation conditions is more reliable than relying on a single cutoff.