# SmartPLS4 Series 34 - Quick Guide: How to Assess Predictive Validity/Predictive Power Using PLS Predict
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Prediction error in PLS-SEM is the residual gap between observed and predicted endogenous values, and lower values indicate better predictive performance.
Briefing
Assessing a structural model’s predictive power in PLS-SEM comes down to comparing how closely predicted values match actual values for each endogenous construct. The key idea is that “prediction error” isn’t treated as a mistake; it’s the residual—the gap between observed and predicted scores. Lower residual-based error means stronger predictive performance, and researchers quantify that error using prediction statistics such as RMSE (root mean square error) or MAE (mean absolute error).
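For reference, the two prediction statistics have the standard definitions below, where $y_i$ is the observed and $\hat{y}_i$ the predicted value of an indicator across $n$ holdout cases (these are textbook formulas, not SmartPLS-specific output):

$$
\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2},
\qquad
\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|
$$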
RMSE is the default choice in most cases because it penalizes larger deviations more heavily. But when the prediction error distribution is highly non-symmetric—showing a long left or right tail—MAE becomes more appropriate, since it better reflects average absolute deviations under skewed error patterns. SmartPLS4’s PLS Predict workflow operationalizes this by generating prediction errors for endogenous constructs and then comparing them against a naive linear model (LM) benchmark.
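To make the metric choice concrete, here is a minimal sketch in plain NumPy/SciPy, not the SmartPLS implementation; the numeric `skew_cutoff` is an illustrative assumption, since SmartPLS asks you to judge symmetry visually from the error histogram:

```python
import numpy as np
from scipy.stats import skew

def choose_error_metric(observed, predicted, skew_cutoff=1.0):
    """Compute RMSE/MAE for one indicator and pick a metric by residual symmetry.

    skew_cutoff is an illustrative assumption; SmartPLS has you judge
    symmetry visually from the prediction error histogram instead.
    """
    residuals = np.asarray(observed) - np.asarray(predicted)
    rmse = np.sqrt(np.mean(residuals ** 2))       # penalizes large errors more
    mae = np.mean(np.abs(residuals))              # average absolute deviation
    # Highly non-symmetric (long-tailed) errors -> prefer MAE, else RMSE.
    metric = "MAE" if abs(skew(residuals)) > skew_cutoff else "RMSE"
    return rmse, mae, metric

# Synthetic example: predictions with roughly symmetric error
rng = np.random.default_rng(0)
y = rng.normal(4.0, 1.0, size=200)            # observed indicator scores
y_hat = y + rng.normal(0.0, 0.5, size=200)    # predicted scores
print(choose_error_metric(y, y_hat))          # symmetric errors -> "RMSE"
```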
The decision rules hinge on how many indicators (manifest variables) produce lower prediction errors than the LM benchmark. If all indicators show lower prediction error than LM, predictive power is rated high; if a majority do, it is rated medium. If only a minority of indicators outperform the LM benchmark, predictive power is rated low. The strictest outcome, no predictive power, occurs when none of the indicators achieve lower prediction error than the LM benchmark, meaning every endogenous indicator's RMSE/MAE is higher than the linear model's.
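As a compact restatement of that counting rule, the hypothetical helper below mirrors the all/majority/minority/none guideline described above (the function name and dict-based inputs are assumptions for illustration):

```python
def classify_predictive_power(pls_errors, lm_errors):
    """Classify predictive power by counting indicators that beat the LM benchmark.

    pls_errors / lm_errors: dicts mapping each endogenous indicator name to its
    RMSE (or MAE, if the error distribution is highly non-symmetric).
    """
    beats_lm = sum(pls_errors[k] < lm_errors[k] for k in pls_errors)
    total = len(pls_errors)
    if beats_lm == total:       # every indicator beats LM
        return "high predictive power"
    if beats_lm == 0:           # no indicator beats LM
        return "no predictive power"
    if beats_lm > total / 2:    # a majority beats LM
        return "medium predictive power"
    return "low predictive power"  # only a minority beats LM
```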
In the SmartPLS4 example, the workflow starts by running "Calculate PLS Predict" with the default settings (including a repetition count of 10). The output is then checked for whether the prediction error distribution is symmetric: inspecting the PLS Predict error histograms for the endogenous variables shows a balanced error pattern with no extreme long-tail behavior. That symmetry leads to using RMSE rather than MAE.
Next, the report compares the RMSE value for each endogenous indicator against the corresponding LM RMSE value. Many indicators have RMSE values lower than the LM benchmark, but not enough to qualify as strong predictive performance: out of 24 indicators, 13 have higher RMSE than the LM benchmark, while the remaining 11 perform better. Under the stated guidelines, that pattern, in which only a minority of indicators outperform the benchmark, maps to a "low predictive power" conclusion. In short: the model predicts some indicators better than a naive linear baseline, but too many indicators underperform, limiting overall predictive usefulness.
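Plugging the example's counts into the sketch above shows why the verdict is "low": only 11 of 24 indicators beat the benchmark, which is under half.

```python
# 13 of the 24 indicators have higher RMSE than LM, so 11 perform better.
beats_lm, total = 24 - 13, 24
print(beats_lm, total, beats_lm / total)   # 11 24 0.4583... -> a minority
# In classify_predictive_power() above, 0 < beats_lm <= total / 2
# maps to "low predictive power".
```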
The practical takeaway is straightforward: use PLS Predict in SmartPLS4, choose RMSE when the error distribution is symmetric (MAE when it isn’t), and classify predictive power by counting how many endogenous indicators beat the LM benchmark on the selected error metric.
Cornell Notes
Predictive validity/power in PLS-SEM is assessed by measuring residual-based prediction error for endogenous constructs and comparing it to a naive linear model (LM) benchmark. Prediction error is quantified with RMSE by default; switch to MAE when the prediction error distribution is highly non-symmetric, with long left/right tails. SmartPLS4's PLS Predict workflow generates error histograms to check symmetry and then reports RMSE (or MAE) for each indicator alongside the LM values. Predictive power is classified by how many indicators have lower prediction error than the LM benchmark: all implies high, a majority implies medium, a minority implies low, and none implies no predictive power. In the example, RMSE is used after confirming symmetric errors, and 13 of 24 indicators have higher RMSE than LM (only 11 perform better), leading to a low predictive power rating.
- Why does prediction error matter in PLS-SEM, and what does "error" mean here?
- How do RMSE and MAE differ in practice when choosing a metric?
- What does the LM benchmark comparison accomplish?
- What rule determines whether predictive power is high, medium, low, or none?
- How was RMSE selected in the SmartPLS4 example?
- What specific result led to the "low predictive power" conclusion?
Review Questions
- When should MAE be used instead of RMSE in PLS Predict, and what feature of the error distribution triggers the switch?
- How does the count of indicators beating the LM benchmark determine the predictive power category?
- In the example, why does having 13 indicators with higher RMSE than LM lead to a “low predictive power” rating?
Key Points
1. Prediction error in PLS-SEM is the residual gap between observed and predicted endogenous values, and lower values indicate better predictive performance.
2. RMSE is the default prediction error metric; MAE is preferred when prediction error distributions show long left/right tails (high non-symmetry).
3. SmartPLS4's PLS Predict uses error histograms to check whether the prediction error distribution is symmetric before choosing between RMSE and MAE.
4. Predictive power is classified by comparing each endogenous indicator's prediction error to the naive LM benchmark.
5. If all indicators have lower prediction error than LM, predictive power is high; if a majority do, it is medium; if only a minority do, it is low.
6. If none of the indicators beat LM on RMSE/MAE, the model has no predictive power.
7. In the example, RMSE was used after confirming symmetric errors, and 13 of 24 indicators had higher RMSE than LM, resulting in low predictive power.