The Concept and Process of Predictive Power Assessment using PLSPredict in SmartPLS3
Based on the Research With Fawad video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Predictive power in PLS-SEM can’t be judged by R² alone, because R² measures in-sample explanatory strength, not out-of-sample forecasting. To assess whether a model can predict new observations, PLS Predict uses a holdout-based, out-of-sample procedure: the dataset is split into a training portion and a holdout portion, the model is estimated on training data, and then predictions are generated for the holdout data. That distinction matters because prediction quality is ultimately about how close predicted values are to unseen outcomes—something R² cannot guarantee.
PLS Predict operationalizes this through K-fold cross-validation. The full dataset is randomly divided into K equally sized folds (SmartPLS defaults to K=10, which is also the commonly recommended value). In each round, one fold acts as the holdout sample while the remaining K−1 folds form the training sample. The model parameters (path coefficients, indicator weights, and loadings) are estimated on the training sample and then used to predict the outcomes in the holdout fold. The process repeats until every fold has served as the holdout exactly once, producing an out-of-sample prediction error for each endogenous construct indicator across all cases.
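The cross-validation loop above can be sketched in a few lines. This is a minimal illustration, not the PLSpredict algorithm itself: ordinary least squares stands in for the PLS-SEM estimation step (which would estimate path coefficients, weights, and loadings on each training split), and all names are hypothetical.

```python
import numpy as np

def kfold_predictions(X, y, k=10, seed=0):
    """Out-of-sample predictions: each case is predicted by a model
    estimated only on the other k-1 folds (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fold = rng.permutation(n) % k          # random, near-equal fold assignment
    y_pred = np.empty(n)
    for f in range(k):
        train, hold = fold != f, fold == f
        # "Estimate the model" on the training folds only (OLS stand-in)
        Xt = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        # Predict the holdout fold with the training-fold parameters
        Xh = np.column_stack([np.ones(hold.sum()), X[hold]])
        y_pred[hold] = Xh @ beta
    return y_pred
```

Running this with several different `seed` values and averaging the resulting error metrics mirrors the "repetitions" recommendation: it smooths out the occasional unlucky random partition.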
A key practical requirement is sample-size adequacy within each training set. Even though the overall dataset is partitioned into training and holdout subsets, each fold’s training portion must satisfy minimum sample size guidelines; otherwise, predictive comparisons can become unreliable. Because fold assignment is random, extreme partitions can occasionally produce abnormal solutions, so running PLS Predict multiple times (repetitions) is recommended to stabilize results.
Prediction performance is quantified using prediction error metrics computed from the differences between actual and predicted values. The residual error here is not a mistake—it’s the gap that prediction aims to minimize. The most common metric is RMSE (root mean square error), which is preferred when prediction error distributions are reasonably symmetric. If the error distribution is highly non-symmetric with long tails, MAE (mean absolute error) becomes more appropriate. SmartPLS provides histograms to check whether residuals show problematic skew.
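The two metrics, and a simple numeric skew check standing in for SmartPLS's residual histograms, can be sketched as follows (function names are illustrative):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(actual, predicted):
    """Mean absolute error: robust to a few extreme errors."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e)))

def skewness(errors):
    """Sample skewness of the prediction errors; values far from zero
    suggest a long-tailed distribution, favoring MAE over RMSE."""
    e = np.asarray(errors, dtype=float)
    e = e - e.mean()
    return float(np.mean(e ** 3) / (np.mean(e ** 2) ** 1.5))
```

A rough decision rule under these assumptions: if the error distribution looks symmetric (skewness near zero), report RMSE; if it is strongly skewed, report MAE.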
To interpret RMSE or MAE, PLS Predict compares them against a naive linear model (LM) benchmark. For each endogenous indicator, a linear regression predicts that indicator from the exogenous indicators of the PLS path model. If PLS achieves lower RMSE/MAE than the LM benchmark for all indicators, predictive power is high; if for a majority, medium; if for only a minority, low; and if for none, the model essentially lacks predictive power.
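The benchmark comparison reduces to counting indicators where PLS beats the linear model. A hedged sketch (the function and the half-way cutoff for "medium" vs. "low" are illustrative choices, not SmartPLS internals):

```python
def predictive_power(pls_errors, lm_errors):
    """Map per-indicator error comparisons to a verbal label.

    pls_errors / lm_errors: dicts mapping indicator name -> RMSE (or MAE).
    """
    # Count indicators where the PLS model predicts better than the benchmark
    wins = sum(pls_errors[i] < lm_errors[i] for i in pls_errors)
    n = len(pls_errors)
    if wins == n:
        return "high"      # PLS beats the benchmark for every indicator
    if wins > n / 2:
        return "medium"    # PLS beats the benchmark for a majority
    if wins > 0:
        return "low"       # PLS beats the benchmark for only a minority
    return "none"          # the benchmark is never beaten
```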
In the SmartPLS workflow, the process starts by inspecting the PLS MV error histogram to decide between RMSE and MAE. Then the “manifest variables prediction summary” is used to compare PLS RMSE values with the linear model’s RMSE values. In the example run described, Q² predict is acceptable, residual errors look sufficiently symmetric to justify RMSE, and PLS RMSE is lower than the linear benchmark for nearly all indicators—leading to a conclusion of high predictive power (with one exception).
Cornell Notes
R² reflects in-sample explanatory power, not out-of-sample predictive power. PLS Predict addresses this by estimating the PLS-SEM model on a training sample and predicting outcomes in a holdout sample, repeated across K folds (SmartPLS default K=10). For each endogenous indicator, it computes prediction errors and summarizes them with RMSE or MAE, chosen based on whether residual error distributions are roughly symmetric or show long tails. Predictive power is then judged by comparing PLS prediction errors to a naive linear regression benchmark: lower errors for all indicators imply high predictive power, for most imply medium, for few imply low, and for none imply no predictive power. This workflow also includes checking training-set minimum sample size and running multiple repetitions to avoid unstable fold partitions.
Why doesn’t R² tell researchers whether a PLS-SEM model can predict new observations?
How does PLS Predict generate out-of-sample predictions in practice?
What determines whether RMSE or MAE should be used?
What benchmark is used to interpret prediction errors, and how does it affect the final predictive-power label?
Why are minimum sample size checks and multiple repetitions important in PLS Predict?
Review Questions
- In what way does out-of-sample predictive power differ from in-sample explanatory power, and why does that distinction matter when interpreting R²?
- Describe the K-fold cross-validation steps used by PLS Predict, including what is estimated on training data and what is predicted on holdout data.
- How do RMSE/MAE results get translated into “high,” “medium,” “low,” or “no” predictive power using the linear regression benchmark?
Key Points
- 1. R² measures in-sample explanatory power and does not directly assess out-of-sample predictive power.
- 2. PLS Predict estimates the PLS-SEM model on training data and predicts endogenous indicators in a holdout sample across K folds.
- 3. SmartPLS defaults to K=10 for PLS Predict, and running multiple repetitions helps reduce instability from random fold partitions.
- 4. RMSE is preferred when prediction error residuals are roughly symmetric; MAE is preferred when errors show highly non-symmetric distributions with long tails.
- 5. Predictive power is determined by comparing PLS prediction errors (RMSE/MAE) against a naive linear regression benchmark for each endogenous indicator.
- 6. High predictive power requires PLS to produce lower prediction errors than the benchmark for all indicators; medium, low, or no predictive power depends on whether a majority, a minority, or none of the indicators improve.