
6. SEM | SPSS AMOS - Factor Loadings, Model Fit, and Modification Indices - Research Coach

Research With Fawad · 6 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Standardized factor loadings (0–1 scale) are preferred because they put indicators on a common metric, allowing direct comparison of indicator strength within and across CFA models.

Briefing

Structural equation modeling hinges on two linked tasks: judging whether the measurement model represents latent constructs well, and checking whether the overall model reproduces the observed covariance structure. Factor loadings in confirmatory factor analysis (CFA) quantify how strongly each unobservable construct (latent variable) drives its observed indicators. Standardized factor loadings are typically reported because they put indicator weights on a comparable 0–1 scale; squaring a standardized loading yields the proportion of explained variance in an indicator. As a practical rule, standardized loadings above 0.70 suggest an indicator is doing meaningful work (explaining at least half the variance), while loadings below that threshold imply the indicator contributes little and may be considered for deletion—though that decision should follow additional conditions.
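The squaring rule and the 0.70 cutoff can be illustrated with a small sketch (the loading values here are hypothetical, not taken from the video):

```python
# Hypothetical standardized loadings for three indicators of one construct.
loadings = {"item1": 0.80, "item2": 0.72, "item3": 0.55}

for item, loading in loadings.items():
    r_squared = loading ** 2   # proportion of indicator variance explained
    keep = loading > 0.70      # rule of thumb from the briefing above
    print(f"{item}: loading={loading:.2f}, r2={r_squared:.2f}, retain={keep}")
```

Here `item3` would be flagged as a deletion candidate (r² ≈ 0.30, well under half the variance), pending the additional conditions the briefing mentions.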

Once factor loadings are set, measurement error for each indicator can be derived as 1 − r², meaning lower explained variance corresponds to higher error. CFA also requires “metric setting” in the structural equation model: each latent variable must be assigned a scale by constraining one of its factor loadings to 1 (the “reference term”). That constraint acts as an anchor so the remaining loadings can be freely estimated; without it, covariance-based SEM software will fail with an unidentified error. When comparing multiple groups (for example, male versus female), the same indicator should be constrained to 1 in each group to maintain measurement comparability.
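The 1 − r² error computation can be sketched in a few lines (the helper name and example loading are illustrative):

```python
def measurement_error(std_loading: float) -> float:
    """Error variance implied by a standardized loading: 1 - r**2."""
    return round(1 - std_loading ** 2, 4)

# A loading of 0.80 explains 64% of the indicator's variance,
# leaving 36% as measurement error.
print(measurement_error(0.80))  # 0.36
```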

Model fit then addresses a different question: does the specified model reproduce the observed covariance matrix closely enough to be considered plausible? A good fit means the estimated covariance structure closely matches the data; a bad fit signals systematic mismatch. Importantly, good overall fit does not guarantee every part of the model is correct. Fit is also influenced by model complexity: models with fewer indicators per factor often show higher apparent fit than models with more indicators, so parsimony matters.

Several fit statistics are used, each with different sensitivities. The chi-square goodness-of-fit test (often called “badness of fit”) should be non-significant for a good fit, but it is highly sensitive to sample size. To reduce that dependence, relative chi-square (chi-square divided by degrees of freedom) is used; one cited guideline places it between 3 and 5 for a good fit. Comparative fit indices such as CFI (above 0.90), IFI (above 0.90), and TLI (above 0.90) are recommended because they are less affected by sample size. RMSEA is treated as a badness-of-fit measure where values near zero are best; thresholds commonly cited are below 0.05 for good fit and below 0.08 for acceptable fit. SRMR similarly flags poor fit when it rises above about 0.09.
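The threshold checks above can be sketched as follows (the fit values are made up for illustration; the cutoffs follow the guidelines cited):

```python
# Hypothetical fit output from a CFA run.
chi_square, df = 245.3, 87
cfi, rmsea, srmr = 0.93, 0.06, 0.05

# Relative chi-square reduces chi-square's sample-size sensitivity.
rel_chi2 = chi_square / df
print(f"chi2/df = {rel_chi2:.2f}")

checks = {
    "CFI > 0.90":   cfi > 0.90,
    "RMSEA < 0.08": rmsea < 0.08,   # < 0.05 good, < 0.08 acceptable
    "SRMR < 0.09":  srmr < 0.09,
}
for rule, passed in checks.items():
    print(rule, "->", "pass" if passed else "fail")
```

Reporting several indices together, as here, reflects the advice below to avoid relying on any single cutoff.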

Because “good fit” thresholds are debated, the guidance emphasizes using multiple indices rather than relying on a single cutoff. Even a well-fitting model can still be misspecified in terms of how relationships are represented. When fit is poor, modification indices offer a route to improvement by suggesting additional covariances—typically between error terms within the same construct. These suggestions must be applied carefully and justified. In Amos output, modifications are often filtered by a threshold (with values around 3.8–4 highlighted as meaningful), and certain changes are explicitly disallowed, such as adding covariance between error terms from different constructs or between an error term and a latent construct.

Cornell Notes

Factor loadings in CFA quantify how latent constructs affect observed indicators, and standardized loadings (0–1 scale) make indicator contributions comparable. Squaring a standardized loading gives the proportion of variance in an indicator explained by the latent construct; values above 0.70 are treated as strong (at least 50% explained variance). Measurement error follows 1 − r², and each latent variable needs a metric set by constraining one factor loading to 1; this is required for identification and must be consistent across groups. Model fit evaluates whether the estimated covariance structure matches the observed covariance matrix using indices like chi-square/relative chi-square, CFI/IFI/TLI, RMSEA, and SRMR. Poor fit can sometimes be addressed with modification indices, but only by adding justified covariances between error terms within the same construct.

How do standardized factor loadings translate into explained variance, and what cutoff is used to judge indicator quality?

Standardized factor loadings are on a 0–1 scale, making them easier to compare across indicators. Squaring a standardized factor loading gives the proportion of explained variance in that indicator. A standardized loading of 0.80 implies an r² of 0.64, meaning the latent (unobserved) variable explains 64% of the indicator’s variance. A common rule of thumb is to retain indicators with standardized loadings greater than 0.70 (explaining at least half the variance) and consider deleting indicators that fall below that level, subject to additional conditions.

Why must one factor loading per latent variable be constrained to 1 in SEM/CFA, and what happens if that step is skipped?

Each latent variable needs a metric (scale). Constraining one factor loading to 1 sets a reference point so the remaining loadings can be estimated freely. This constraint is often called the “reference term.” If no such metric constraint is applied, covariance-based SEM software (e.g., Amos) will not run and returns an unidentified error message. For multi-group comparisons, the same indicator should be constrained to 1 in every group (e.g., male and female) to keep the measurement scale consistent.

What does model fit test in SEM actually assess, and why doesn’t good fit guarantee the model is fully correct?

Model fit assesses how well the overall estimated covariance structure reproduces the observed covariance matrix. A good fit indicates the specified model’s implied covariances closely match the data; a bad fit indicates systematic disagreement. However, good overall fit does not mean every part of the model is correct—some localized misspecifications can exist even when global fit indices look acceptable.

Which fit indices are emphasized, and how do their thresholds relate to sample size sensitivity?

Chi-square goodness-of-fit is treated as a badness-of-fit measure and should be non-significant for good fit, but it is sensitive to sample size. Relative chi-square (chi-square divided by degrees of freedom) is used to reduce that sensitivity, with a cited guideline of 3 to 5. CFI, IFI, and TLI are emphasized because they are less affected by sample size; each is commonly judged acceptable when above 0.90. RMSEA and SRMR are also used as badness-of-fit measures: RMSEA near zero is best (commonly <0.05 good, <0.08 acceptable), and SRMR values below about 0.09 indicate good fit.

When model fit is poor, what do modification indices recommend, and what restrictions apply?

Modification indices suggest model alterations that may improve fit, typically by adding additional covariances between error terms. In Amos output, the modification index (MI) column is used, and changes below a certain threshold may not be shown; values around 3.8–4 are highlighted as potentially meaningful. Changes must be justified and must follow restrictions: covariances between error terms from different constructs are not allowed, and error terms cannot be covaried with latent constructs. Covariances are allowed between error terms within the same construct.
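The MI filtering and the within-construct restriction can be sketched as follows (hypothetical rows in a simplified shape, not the actual Amos output format):

```python
# Hypothetical modification-index suggestions for error-term covariances.
mod_indices = [
    {"pair": ("e1", "e2"), "same_construct": True,  "mi": 12.4},
    {"pair": ("e3", "e7"), "same_construct": False, "mi": 9.1},   # cross-construct: disallowed
    {"pair": ("e4", "e5"), "same_construct": True,  "mi": 2.7},   # below threshold
]

THRESHOLD = 4.0  # values around 3.8-4 are highlighted as meaningful

# Keep only suggestions that clear the threshold AND stay within one construct.
candidates = [
    row for row in mod_indices
    if row["mi"] >= THRESHOLD and row["same_construct"]
]
print([row["pair"] for row in candidates])  # only ('e1', 'e2') survives
```

Even a surviving candidate should still be theoretically justified before the covariance is added, as the briefing stresses.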

Review Questions

  1. If a standardized factor loading is 0.75, how much variance in the indicator does the latent construct explain, and would it meet the stated retention rule?
  2. Why is relative chi-square preferred over chi-square in large samples, and what range is cited as indicating good fit?
  3. What kinds of covariance changes are permitted when using modification indices, and which specific changes are explicitly disallowed?

Key Points

  1. Standardized factor loadings (0–1 scale) are preferred because they allow direct comparison of indicator strength across CFA models.

  2. Squaring a standardized factor loading gives the explained variance in an indicator; loadings above 0.70 are treated as strong (≥50% explained variance).

  3. Indicator measurement error can be computed as 1 − r², so lower explained variance implies higher measurement error.

  4. Each latent variable must have its metric set by constraining one factor loading to 1; skipping this leads to identification errors in covariance-based SEM.

  5. When comparing groups, the same indicator must be constrained to 1 in each group to preserve measurement scale consistency.

  6. Model fit should be evaluated with multiple indices (CFI/IFI/TLI, RMSEA, SRMR, and chi-square/relative chi-square) because no single cutoff universally settles fit quality.

  7. Modification indices can guide improvements, but only justified covariances between error terms within the same construct are allowed, with a practical MI threshold around 3.8–4.

Highlights

Squaring a standardized factor loading turns it into an explained-variance metric: a loading of 0.80 corresponds to 64% explained variance.
Metric setting in CFA requires fixing one factor loading to 1; without it, covariance-based SEM fails with an unidentified error.
CFI, IFI, and TLI are favored because they’re less sensitive to sample size than chi-square.
RMSEA and SRMR are treated as badness-of-fit measures where smaller values indicate better fit (RMSEA < 0.05; SRMR ≤ 0.09 as common guidance).
Modification indices should be used cautiously: add covariances between error terms within the same construct, not across constructs or with latent variables.

Topics

Mentioned

  • CFA
  • SEM
  • AMOS
  • CFI
  • IFI
  • TLI
  • RMSEA
  • SRMR
  • MI