The Concept and Process of Predictive Power Assessment using PLSPredict in SmartPLS3
Based on the Research With Fawad video on YouTube. If you like this content, support the original creators by watching, liking, and subscribing to their content.
Briefing
Predictive power in PLS-SEM can’t be judged by R² alone, because R² measures in-sample explanatory strength, not out-of-sample forecasting. To assess whether a model can predict new observations, PLS Predict uses a holdout-based, out-of-sample procedure: the dataset is split into a training portion and a holdout portion, the model is estimated on training data, and then predictions are generated for the holdout data. That distinction matters because prediction quality is ultimately about how close predicted values are to unseen outcomes—something R² cannot guarantee.
PLS Predict operationalizes this through K-fold cross-validation. The full dataset is randomly divided into K equally sized folds (SmartPLS defaults to K=10, which is also the commonly recommended value). In each round, one fold acts as the holdout sample while the remaining K−1 folds form the training sample. The model parameters (path coefficients, indicator weights, and loadings) are estimated on the training sample and then used to predict the outcomes in the holdout fold. The process repeats until every fold has served as the holdout exactly once, producing an out-of-sample prediction error for each endogenous construct indicator across all cases.
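The cross-validation loop above can be sketched in a few lines. This is a minimal illustration, not the PLSpredict algorithm itself: ordinary least squares stands in for the PLS-SEM estimation step (which would estimate path coefficients, weights, and loadings on each training split), and all names are hypothetical.

```python
import numpy as np

def kfold_predictions(X, y, k=10, seed=0):
    """Out-of-sample predictions: each case is predicted by a model
    estimated only on the other k-1 folds (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    fold = rng.permutation(n) % k          # random, near-equal fold assignment
    y_pred = np.empty(n)
    for f in range(k):
        train, hold = fold != f, fold == f
        # "Estimate the model" on the training folds only (OLS stand-in)
        Xt = np.column_stack([np.ones(train.sum()), X[train]])
        beta, *_ = np.linalg.lstsq(Xt, y[train], rcond=None)
        # Predict the holdout fold with the training-fold parameters
        Xh = np.column_stack([np.ones(hold.sum()), X[hold]])
        y_pred[hold] = Xh @ beta
    return y_pred
```

Running this with several different `seed` values and averaging the resulting error metrics mirrors the "repetitions" recommendation: it smooths out the occasional unlucky random partition.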
A key practical requirement is sample-size adequacy within each training set. Even though the overall dataset is partitioned into training and holdout subsets, each fold’s training portion must satisfy minimum sample size guidelines; otherwise, predictive comparisons can become unreliable. Because fold assignment is random, extreme partitions can occasionally produce abnormal solutions, so running PLS Predict multiple times (repetitions) is recommended to stabilize results.
Prediction performance is quantified using prediction error metrics computed from the differences between actual and predicted values. The residual error here is not a mistake—it’s the gap that prediction aims to minimize. The most common metric is RMSE (root mean square error), which is preferred when prediction error distributions are reasonably symmetric. If the error distribution is highly non-symmetric with long tails, MAE (mean absolute error) becomes more appropriate. SmartPLS provides histograms to check whether residuals show problematic skew.
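The two metrics, and a simple numeric skew check standing in for SmartPLS's residual histograms, can be sketched as follows (function names are illustrative):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean square error: penalizes large errors more heavily."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def mae(actual, predicted):
    """Mean absolute error: robust to a few extreme errors."""
    e = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(e)))

def skewness(errors):
    """Sample skewness of the prediction errors; values far from zero
    suggest a long-tailed distribution, favoring MAE over RMSE."""
    e = np.asarray(errors, dtype=float)
    e = e - e.mean()
    return float(np.mean(e ** 3) / (np.mean(e ** 2) ** 1.5))
```

A rough decision rule under these assumptions: if the error distribution looks symmetric (skewness near zero), report RMSE; if it is strongly skewed, report MAE.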
To interpret RMSE or MAE, PLS Predict compares them against a naive linear model (LM) benchmark. For each endogenous indicator, a linear regression predicts that indicator from the exogenous indicators of the PLS path model. If PLS achieves lower RMSE/MAE than the LM benchmark for all indicators, predictive power is high; if for a majority, medium; if for only a minority, low; and if for none, the model essentially lacks predictive power.
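The benchmark comparison reduces to counting indicators where PLS beats the linear model. A hedged sketch (the function and the half-way cutoff for "medium" vs. "low" are illustrative choices, not SmartPLS internals):

```python
def predictive_power(pls_errors, lm_errors):
    """Map per-indicator error comparisons to a verbal label.

    pls_errors / lm_errors: dicts mapping indicator name -> RMSE (or MAE).
    """
    # Count indicators where the PLS model predicts better than the benchmark
    wins = sum(pls_errors[i] < lm_errors[i] for i in pls_errors)
    n = len(pls_errors)
    if wins == n:
        return "high"      # PLS beats the benchmark for every indicator
    if wins > n / 2:
        return "medium"    # PLS beats the benchmark for a majority
    if wins > 0:
        return "low"       # PLS beats the benchmark for only a minority
    return "none"          # the benchmark is never beaten
```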
In the SmartPLS workflow, the process starts by inspecting the PLS MV error histogram to decide between RMSE and MAE. Then the “manifest variables prediction summary” is used to compare PLS RMSE values with the linear model’s RMSE values. In the example run described, Q² predict is acceptable, residual errors look sufficiently symmetric to justify RMSE, and PLS RMSE is lower than the linear benchmark for nearly all indicators—leading to a conclusion of high predictive power (with one exception).
Cornell Notes
R² reflects in-sample explanatory power, not out-of-sample predictive power. PLS Predict addresses this by estimating the PLS-SEM model on a training sample and predicting outcomes in a holdout sample, repeated across K folds (SmartPLS default K=10). For each endogenous indicator, it computes prediction errors and summarizes them with RMSE or MAE, chosen based on whether residual error distributions are roughly symmetric or show long tails. Predictive power is then judged by comparing PLS prediction errors to a naive linear regression benchmark: lower errors for all indicators imply high predictive power, for most imply medium, for few imply low, and for none imply no predictive power. This workflow also includes checking training-set minimum sample size and running multiple repetitions to avoid unstable fold partitions.
Why doesn’t R² tell researchers whether a PLS-SEM model can predict new observations?
How does PLS Predict generate out-of-sample predictions in practice?
What determines whether RMSE or MAE should be used?
What benchmark is used to interpret prediction errors, and how does it affect the final predictive-power label?
Why are minimum sample size checks and multiple repetitions important in PLS Predict?
Review Questions
- In what way does out-of-sample predictive power differ from in-sample explanatory power, and why does that distinction matter when interpreting R²?
- Describe the K-fold cross-validation steps used by PLS Predict, including what is estimated on training data and what is predicted on holdout data.
- How do RMSE/MAE results get translated into “high,” “medium,” “low,” or “no” predictive power using the linear regression benchmark?
Key Points
- 1. R² measures in-sample explanatory power and does not directly assess out-of-sample predictive power.
- 2. PLS Predict estimates the PLS-SEM model on training data and predicts endogenous indicators in a holdout sample across K folds.
- 3. SmartPLS defaults to K=10 for PLS Predict, and running multiple repetitions helps reduce instability from random fold partitions.
- 4. RMSE is preferred when prediction error residuals are roughly symmetric; MAE is preferred when errors show highly non-symmetric distributions with long tails.
- 5. Predictive power is determined by comparing PLS prediction errors (RMSE/MAE) against a naive linear regression benchmark for each endogenous indicator.
- 6. High predictive power requires PLS to produce lower prediction errors than the benchmark for all indicators; medium, low, or no predictive power depends on whether a majority, a minority, or none of the indicators improve.