22. SEMinR Series. Evaluating Structural Model

TL;DR

Step 3 evaluates explanatory power using R square (R²) for each endogenous construct, reflecting the variance explained and in-sample predictive power.

Briefing

Step 3 of structural model evaluation focuses on explanatory power: how much of the variance in the endogenous constructs the model accounts for. Explanatory power is measured primarily with R square (R²), which ranges from 0 to 1. Higher R² values indicate stronger in-sample predictive power because R² is the proportion of variance explained in each endogenous construct. In many social science contexts, R² values of about 0.75, 0.50, and 0.25 are often treated as substantial, moderate, and weak, respectively, though the transcript stresses that “acceptable” thresholds depend heavily on research context. It also notes a key statistical caveat: R² tends to rise as more predictor constructs are added, so comparisons should be made against similar studies and models of comparable complexity.
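As a rough illustration of what "proportion of variance explained" means, the sketch below computes R² as one minus the ratio of residual to total sum of squares. The function name `r_squared` and the toy scores are assumptions made for illustration; they are not SEMinR output or data from the video.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Return R^2 = 1 - SS_residual / SS_total for one endogenous construct."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    # Total variation of the observed scores around their mean
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the model's predictions
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed scores have zero variance")
    return 1.0 - ss_resid / ss_total


# Hypothetical construct scores (assumed values, not from the video)
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.3f}")
```

In practice the R² values come straight from the estimated model's summary, so a hand computation like this is only useful for building intuition.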

Because R² can inflate with additional variables, adjusted R square (adjusted R²) is introduced as a more conservative alternative. Adjusted R² corrects for the number of explanatory variables relative to the sample size, reducing the tendency to overstate explanatory power. Still, adjusted R² is not treated as a precise measure of how much variance in each endogenous construct is truly explained, and it says nothing about individual predictors, which leads to a third metric: f square (f²) effect size.
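The correction can be sketched with the standard adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. The source does not state the exact formula SEMinR applies, so treat this as an assumption. The R² of 0.608 and the three predictors come from the worked example in this summary; the sample size of 200 is hypothetical.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / dof


# R^2 and predictor count from the summary's example; n_obs is an assumed value
adj = adjusted_r2(r2=0.608, n_obs=200, n_predictors=3)
print(f"adjusted R^2 = {adj:.3f}")

# With the same R^2, more predictors give a lower adjusted value
adj_more = adjusted_r2(r2=0.608, n_obs=200, n_predictors=10)
print(f"adjusted R^2 with 10 predictors = {adj_more:.3f}")
```

The penalty shrinks as the sample grows, which is why the adjustment matters most for small samples with many predictors.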

f² is used to quantify the contribution of each predictor construct. Conceptually, f² answers a counterfactual question: what happens to R² if a specific exogenous variable is removed from the model? The transcript links f² to the “size” of the predictor’s path contribution in the structural model assessment. Interpretation follows common benchmarks: f² values below 0.15 indicate a small effect, values from 0.15 to 0.35 indicate a medium effect, and values above 0.35 indicate a large effect.
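A minimal sketch of that counterfactual, assuming the usual definition f² = (R² included - R² excluded) / (1 - R² included). The function names and the "excluded" R² values are hypothetical; the cut-offs are the ones quoted in this summary.

```python
def f_squared(r2_included: float, r2_excluded: float) -> float:
    """Effect size of one predictor from R^2 with and without it."""
    if r2_included >= 1.0:
        raise ValueError("r2_included must be below 1")
    return (r2_included - r2_excluded) / (1.0 - r2_included)


def classify_f2(f2: float) -> str:
    """Label an f^2 value using the 0.15 / 0.35 cut-offs from the summary."""
    if f2 < 0.15:
        return "small"
    if f2 <= 0.35:
        return "medium"
    return "large"


# R^2 = 0.608 is from the example; the excluded values are assumptions
for r2_excl in (0.59, 0.53, 0.40):
    f2 = f_squared(0.608, r2_excl)
    print(f"R^2 excluded = {r2_excl:.2f} -> f^2 = {f2:.3f} ({classify_f2(f2)})")
```

Because the denominator is the unexplained variance, the same drop in R² yields a larger f² when the full model already explains most of the variance.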

A worked example is provided using an endogenous construct labeled “collaborative culture.” Running the model yields an R² of 0.608 for collaborative culture, meaning about 60.8% of the variance in collaborative culture is explained by three predictor variables included in the model. With only those three predictors present, the R² is described as moderate.

The transcript then turns to f² outputs for the exogenous variables. The effect size for “vision development rewards” on collaborative culture is characterized as small (below 0.15), implying that removing that predictor would cause only a minor drop in R². In contrast, removing “development and rewards” is described as having a medium (moderate) impact on R², consistent with f² falling in the 0.15–0.35 range. The practical takeaway is straightforward: R² tells how much variance the model explains overall for each endogenous construct, adjusted R² helps temper inflation from added predictors, and f² pinpoints which specific predictors meaningfully drive that explained variance.
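To make "a minor drop in R²" concrete, the sketch below inverts the usual f² definition, f² = (R² included - R² excluded) / (1 - R² included), to find the R² that would remain after removing a predictor at a given effect size. The helper name `r2_excluded_at` is hypothetical, and the definition is assumed rather than quoted from the video; only R² = 0.608 and the 0.15 and 0.35 cut-offs come from this summary.

```python
def r2_excluded_at(r2_included: float, f2: float) -> float:
    """R^2 remaining after dropping a predictor whose effect size is f2."""
    return r2_included - f2 * (1.0 - r2_included)


r2_full = 0.608  # collaborative culture, from the example

# R^2 that would remain at the small/medium and medium/large boundaries
at_medium = r2_excluded_at(r2_full, 0.15)
at_large = r2_excluded_at(r2_full, 0.35)
print(f"f^2 = 0.15 corresponds to R^2 falling to about {at_medium:.3f}")
print(f"f^2 = 0.35 corresponds to R^2 falling to about {at_large:.3f}")
```

Under these assumptions, a predictor counts as medium only if removing it pulls R² from 0.608 down to roughly 0.549 or lower, and as large only below roughly 0.471.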

The session closes by previewing reporting guidance in later videos, emphasizing that these metrics—R² for explanatory power and f² for effect size—must be interpreted and presented in line with study context and comparable model complexity.

Cornell Notes

Explanatory power in SEM structural model evaluation is assessed mainly through R square (R²) for each endogenous construct. R² (0–1) indicates the proportion of variance explained and is often treated as in-sample predictive power, but it increases when more predictors are added, so interpretation must be contextual. Adjusted R² corrects for the number of predictors relative to sample/data size, offering a more conservative view, though it still isn’t a precise variance-explained measure for each endogenous construct. To gauge the impact of individual predictors, f square (f²) effect size is used: it measures how much R² would change if an exogenous variable were removed. Benchmarks commonly used are f² < 0.15 (small), 0.15–0.35 (medium), and > 0.35 (large).

What does R square (R²) measure in Step 3 of structural model evaluation, and why does it matter?

R² measures the variance explained in each endogenous construct—i.e., how much of the endogenous construct’s variability is accounted for by the model’s predictor constructs. It functions as a measure of explanatory power or in-sample predictive power. In the example, collaborative culture has R² = 0.608, meaning about 60.8% of the variance in collaborative culture is explained by the three predictors included in the model.

Why can R² be misleading when comparing models, and how does adjusted R² address that?

R² tends to increase as more predictor constructs are added to the model, which can make explanatory power look better even if the added variables don’t meaningfully improve prediction. Adjusted R² corrects for the number of explanatory variables relative to data size, making it more conservative. The transcript notes that adjusted R² is not a precise indicator of how much of each endogenous construct’s variance is explained, but it helps reduce inflation from model size.

How is f square (f²) interpreted, and what does it quantify?

f² quantifies the effect size of a predictor construct by asking: if an exogenous variable were removed, how much would R² change? That change indicates whether the predictor’s contribution is small, medium, or large. The common thresholds given are: f² < 0.15 (small), 0.15–0.35 (medium), and > 0.35 (large).

In the example, what do the R² and f² results imply about collaborative culture?

For collaborative culture, R² = 0.608 indicates the model explains about 60.8% of its variance, described as moderate explanatory power. For effect sizes, the predictor “vision development rewards” is associated with a small f² (below 0.15), meaning removing it would cause only a small reduction in R². Removing “development and rewards” is described as producing a medium/moderate impact on R², consistent with an f² in the 0.15–0.35 range.

How should researchers decide whether an R² value is “acceptable”?

The transcript emphasizes that acceptable R² depends on research context and discipline. While rough guidelines are mentioned (e.g., 0.75 as substantial, 0.50 as moderate, and 0.25 as weak), some fields may consider R² as low as 0.10 (10% variance explained) satisfactory. The transcript also recommends comparing against R² values from related studies with similar model complexity.

Review Questions

What is the difference between R² and f² in terms of what each metric tells you about model performance?
Why is adjusted R² often considered more conservative than R², and what problem does it correct for?
If a predictor has f² = 0.20, how would you classify its effect size using the thresholds provided?

Key Points

1. Step 3 evaluates explanatory power using R square (R²) for each endogenous construct, reflecting the variance explained and in-sample predictive power.
2. R² values must be interpreted relative to study context and comparable models because R² tends to increase as more predictors are added.
3. Adjusted R square offers a more conservative estimate by correcting for the number of explanatory variables relative to data size.
4. f square (f²) measures each predictor’s contribution by estimating how much R² would drop if that exogenous variable were removed.
5. Common f² benchmarks are f² < 0.15 (small), 0.15–0.35 (medium), and > 0.35 (large).
6. In the example, collaborative culture has R² = 0.608 (about 60.8% variance explained) from three predictors, described as moderate explanatory power.
7. Effect sizes in the example differ by predictor: one is small (vision development rewards) while another is medium (development and rewards).

Highlights

R² = 0.608 for collaborative culture means the model explains about 60.8% of its variance using the included predictors.

Adjusted R² tempers the upward bias of R² when additional predictor constructs are added.

f² translates predictor importance into a counterfactual: how much R² changes if a predictor is removed.

The example classifies one predictor’s impact as small (f² < 0.15) and another’s as medium (0.15–0.35).

Topics

Structural Model Evaluation
Explanatory Power
R Square
Adjusted R Square
f Square Effect Size

22. SEMinR Series. Evaluating Structural Model | Step 3: Explanatory Power