# SmartPLS4 Series 33 - How to use PLS Predict to assess Predictive Validity/Predictive Power?
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
## Briefing
Relying on R² to judge predictive quality can mislead researchers because R² measures in-sample explanatory power, not out-of-sample predictive power. To assess whether a PLS-SEM model can forecast new observations, the session focuses on PLS Predict, a procedure that estimates the model on a training subset and evaluates prediction performance on a separate holdout subset.
PLS Predict works by splitting the full dataset into two parts before estimation: a training sample (used to estimate model parameters such as path coefficients, indicator weights, and loadings) and a holdout sample (used only for prediction). The method implements this split through k-fold cross-validation. With k folds (SmartPLS defaults to 10, which is also the recommended setting), the dataset is randomly divided into k equally sized subsets. For each fold, one subset acts as the holdout sample while the remaining k−1 subsets are combined into the training sample. Predictions are generated for the holdout subset using the model estimated from the training subset, and the rotation continues until every fold has served as the holdout sample exactly once.
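As a rough illustration of the fold rotation, the sketch below implements a generic k-fold loop in Python. This is not SmartPLS code: the `kfold_predictions` name, the ordinary-least-squares stand-in for the estimated model, and the 10-fold default are assumptions for demonstration only.

```python
import numpy as np

def kfold_predictions(X, y, k=10, seed=42):
    """Generate out-of-sample predictions by rotating k holdout folds.

    X : (n, p) matrix of predictor-indicator values
    y : (n,) vector of one endogenous indicator
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    order = rng.permutation(n)          # random partitioning of the cases
    folds = np.array_split(order, k)    # k roughly equal subsets

    y_pred = np.empty(n)
    for i in range(k):
        holdout = folds[i]                                  # fold i = holdout sample
        train = np.concatenate(folds[:i] + folds[i + 1:])   # other k-1 folds = training sample

        # Estimate parameters on the training sample only; ordinary least
        # squares stands in here for the PLS-SEM estimation step.
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

        # Predict the holdout cases the model has never seen.
        y_pred[holdout] = X[holdout] @ coef

    return y_pred
```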
Because random partitioning can occasionally produce extreme splits that lead to abnormal solutions, the procedure can be run multiple times (SmartPLS offers a repetitions setting; the session keeps the default of 10). Predictive performance is then quantified through prediction error metrics for each endogenous construct’s indicators. The key idea is that a prediction error is not a “mistake” but a residual, the difference between an indicator’s actual and predicted value: lower residuals mean predicted values track actual values more closely.
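To picture the repetitions setting, one can wrap the whole cross-validation in an outer loop so that each repetition uses a different random partition and no single extreme split dominates the result. The sketch below is an illustrative assumption that reuses the `kfold_predictions` helper from above; SmartPLS handles this internally.

```python
def repeated_cv_residuals(X, y, k=10, repetitions=10):
    """Collect holdout prediction errors (residuals) across repetitions.

    Each repetition reshuffles the cases, so the k-fold partition differs;
    pooling over repetitions smooths out the effect of an unlucky split.
    """
    residuals = []
    for rep in range(repetitions):
        y_pred = kfold_predictions(X, y, k=k, seed=rep)  # new partition per repetition
        residuals.append(y - y_pred)                     # prediction error = residual
    return np.stack(residuals)                           # shape: (repetitions, n)
```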
The most common metric is RMSE (root mean square error). MAE (mean absolute error) becomes preferable when the prediction error distribution is highly non-symmetric, that is, when it has a long left or right tail. SmartPLS provides histograms to check this distribution shape. Once the appropriate metric is chosen, predictive validity is assessed by comparing PLS-SEM prediction errors against a naive benchmark: a linear regression model (LM) run for each endogenous construct’s indicators on the exogenous construct indicators in the PLS path model. The benchmark RMSE/MAE values come from that linear regression baseline.
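Both metrics are straightforward to compute from the holdout residuals, as in the minimal sketch below. The skewness-based check is only a rough stand-in for visually inspecting the SmartPLS histograms, and the threshold of 1 is an assumed rule of thumb, not an official cutoff.

```python
from scipy.stats import skew

def rmse(residuals):
    """Root mean square error; penalises large residuals more heavily."""
    r = np.asarray(residuals)
    return float(np.sqrt(np.mean(r ** 2)))

def mae(residuals):
    """Mean absolute error; more robust when residuals have long tails."""
    return float(np.mean(np.abs(residuals)))

def prefer_mae(residuals, skew_threshold=1.0):
    """Heuristic substitute for the histogram check: a strongly skewed
    (non-symmetric) error distribution favours MAE over RMSE."""
    return abs(skew(np.asarray(residuals))) > skew_threshold
```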
Interpretation follows a simple rule set. If most indicators show lower prediction error under PLS-SEM than under the LM benchmark, the model has medium predictive power. If only a minority improve, predictive power is low. If none of the indicators improve (PLS-SEM errors are higher for all indicators), predictive power is effectively absent.
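Translated into a small decision rule, the guideline might look like the function below. The dictionary structure and the strict-majority cutoff are assumptions; the source only states the verbal rule (most indicators better = medium, a minority better = low, none better = no predictive power).

```python
def predictive_power(pls_errors, lm_errors):
    """Classify predictive power from indicator-level error comparisons.

    pls_errors / lm_errors: dicts mapping each endogenous construct
    indicator to its RMSE (or MAE) under PLS-SEM and under the naive
    linear-regression (LM) benchmark.
    """
    improved = sum(pls_errors[ind] < lm_errors[ind] for ind in pls_errors)
    total = len(pls_errors)

    if improved == 0:
        return "no predictive power"        # PLS-SEM worse for every indicator
    if improved > total / 2:
        return "medium predictive power"    # most indicators improve
    return "low predictive power"           # only a minority improve
```

With hypothetical numbers matching the worked example below (11 of 24 indicators better under PLS-SEM), the function would return low predictive power.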
In the worked SmartPLS example, the prediction error histograms look sufficiently symmetric, so RMSE is used. The results show that 13 of the 24 indicators have higher RMSE under PLS-SEM than under the LM benchmark, while the remaining 11 perform better. Because only a minority of indicators improve, the pattern matches the session’s guideline for low predictive power. A separate check of Q² values remains supportive, with all Q² values greater than zero and described as moderate to substantial, reinforcing that the model’s predictive assessment should not rely on R² alone.
## Cornell Notes
R² reflects in-sample explanatory power, so it cannot by itself confirm whether a PLS-SEM model predicts new observations. PLS Predict estimates the model on a training sample and evaluates prediction error on a holdout sample using k-fold cross-validation (SmartPLS defaults to 10 folds). Predictive performance is quantified with RMSE or MAE for each endogenous construct’s indicators, then compared against a naive linear regression benchmark (LM). If most indicators have lower prediction error under PLS-SEM than under LM, predictive power is medium; if only a minority improve, it is low; if none improve, predictive power is absent. The example uses RMSE after checking that prediction error histograms are not strongly non-symmetric.
- Why is R² not a reliable measure of predictive power in PLS-SEM?
- How does PLS Predict implement out-of-sample prediction?
- What do RMSE and MAE measure, and when should each be used?
- What benchmark does PLS Predict use to judge whether predictions are genuinely better?
- How are predictive power levels determined from the indicator-level comparisons?
- What conclusion is reached in the example run in SmartPLS?
## Review Questions
- How does k-fold cross-validation in PLS Predict change the training and holdout samples across folds?
- What histogram feature determines whether RMSE or MAE should be used?
- What indicator-level comparison against the LM benchmark distinguishes medium predictive power from low predictive power?
## Key Points
1. R² measures in-sample explanatory power and cannot establish out-of-sample predictive power.
2. PLS Predict estimates the PLS-SEM model on a training subset and evaluates predictions on a separate holdout subset.
3. SmartPLS’s PLS Predict uses k-fold cross-validation (default 10 folds) and can repeat the procedure to avoid extreme random partitions.
4. Prediction error is assessed per endogenous construct indicator using RMSE or MAE; lower error indicates better prediction.
5. RMSE is preferred when prediction error distributions are symmetric; MAE is preferred when errors show long left/right tails.
6. Predictive power is judged by comparing PLS-SEM prediction errors to a naive linear regression (LM) benchmark.
7. In the example, RMSE comparisons show only partial improvement across indicators, leading to a low predictive power classification.