# SmartPLS4 Series 33 - How to use PLS Predict to assess Predictive Validity/Predictive Power?
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
## Briefing
Relying on R² to judge predictive quality can mislead researchers because R² measures in-sample explanatory power, not out-of-sample predictive power. To assess whether a PLS-SEM model can forecast new observations, the session focuses on PLS Predict, a procedure that estimates the model on a training subset and evaluates prediction performance on a separate holdout subset.
PLS Predict works by splitting the full dataset into two parts before estimation: a training sample (used to estimate model parameters such as path coefficients, indicator weights, and loadings) and a holdout sample (used only for prediction). The method implements this split through k-fold cross-validation. With k folds (SmartPLS defaults to 10, which is also the recommended setting), the dataset is randomly divided into k equally sized subsets. For each fold, one subset acts as the holdout sample while the remaining k−1 subsets are combined into the training sample. Predictions are generated for the holdout subset using the model estimated from the training subset, and the rotation continues until every fold has served as the holdout sample exactly once.
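As a rough illustration of the fold rotation, the sketch below implements a generic k-fold loop in Python. This is not SmartPLS code: the `kfold_predictions` name, the ordinary-least-squares stand-in for the estimated model, and the 10-fold default are assumptions for demonstration only.

```python
import numpy as np

def kfold_predictions(X, y, k=10, seed=42):
    """Generate out-of-sample predictions by rotating k holdout folds.

    X : (n, p) matrix of predictor-indicator values
    y : (n,) vector of one endogenous indicator
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    order = rng.permutation(n)          # random partitioning of the cases
    folds = np.array_split(order, k)    # k roughly equal subsets

    y_pred = np.empty(n)
    for i in range(k):
        holdout = folds[i]                                  # fold i = holdout sample
        train = np.concatenate(folds[:i] + folds[i + 1:])   # other k-1 folds = training sample

        # Estimate parameters on the training sample only; ordinary least
        # squares stands in here for the PLS-SEM estimation step.
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

        # Predict the holdout cases the model has never seen.
        y_pred[holdout] = X[holdout] @ coef

    return y_pred
```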
Because random partitioning can occasionally produce extreme splits that lead to abnormal solutions, the procedure can be run multiple times (SmartPLS offers a repetitions setting; the session keeps the default of 10). Predictive performance is then quantified through prediction error metrics for each endogenous construct’s indicators. The key idea is that a prediction error is not a “mistake” but a residual, the difference between an indicator’s actual and predicted value: lower residuals mean predicted values track actual values more closely.
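To picture the repetitions setting, one can wrap the whole cross-validation in an outer loop so that each repetition uses a different random partition and no single extreme split dominates the result. The sketch below is an illustrative assumption that reuses the `kfold_predictions` helper from above; SmartPLS handles this internally.

```python
def repeated_cv_residuals(X, y, k=10, repetitions=10):
    """Collect holdout prediction errors (residuals) across repetitions.

    Each repetition reshuffles the cases, so the k-fold partition differs;
    pooling over repetitions smooths out the effect of an unlucky split.
    """
    residuals = []
    for rep in range(repetitions):
        y_pred = kfold_predictions(X, y, k=k, seed=rep)  # new partition per repetition
        residuals.append(y - y_pred)                     # prediction error = residual
    return np.stack(residuals)                           # shape: (repetitions, n)
```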
The most common metric is RMSE (root mean square error). MAE (mean absolute error) becomes preferable when the prediction error distribution is highly non-symmetric, that is, when it has a long left or right tail. SmartPLS provides histograms to check this distribution shape. Once the appropriate metric is chosen, predictive validity is assessed by comparing PLS-SEM prediction errors against a naive benchmark: a linear regression model (LM) run for each endogenous construct’s indicators on the exogenous construct indicators in the PLS path model. The benchmark RMSE/MAE values come from that linear regression baseline.
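Both metrics are straightforward to compute from the holdout residuals, as in the minimal sketch below. The skewness-based check is only a rough stand-in for visually inspecting the SmartPLS histograms, and the threshold of 1 is an assumed rule of thumb, not an official cutoff.

```python
from scipy.stats import skew

def rmse(residuals):
    """Root mean square error; penalises large residuals more heavily."""
    r = np.asarray(residuals)
    return float(np.sqrt(np.mean(r ** 2)))

def mae(residuals):
    """Mean absolute error; more robust when residuals have long tails."""
    return float(np.mean(np.abs(residuals)))

def prefer_mae(residuals, skew_threshold=1.0):
    """Heuristic substitute for the histogram check: a strongly skewed
    (non-symmetric) error distribution favours MAE over RMSE."""
    return abs(skew(np.asarray(residuals))) > skew_threshold
```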
Interpretation follows a simple rule set. If most indicators show lower prediction error under PLS-SEM than under the LM benchmark, the model has medium predictive power. If only a minority improve, predictive power is low. If none of the indicators improve (PLS-SEM errors are higher for all indicators), predictive power is effectively absent.
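Translated into a small decision rule, the guideline might look like the function below. The dictionary structure and the strict-majority cutoff are assumptions; the source only states the verbal rule (most indicators better = medium, a minority better = low, none better = no predictive power).

```python
def predictive_power(pls_errors, lm_errors):
    """Classify predictive power from indicator-level error comparisons.

    pls_errors / lm_errors: dicts mapping each endogenous construct
    indicator to its RMSE (or MAE) under PLS-SEM and under the naive
    linear-regression (LM) benchmark.
    """
    improved = sum(pls_errors[ind] < lm_errors[ind] for ind in pls_errors)
    total = len(pls_errors)

    if improved == 0:
        return "no predictive power"        # PLS-SEM worse for every indicator
    if improved > total / 2:
        return "medium predictive power"    # most indicators improve
    return "low predictive power"           # only a minority improve
```

With hypothetical numbers matching the worked example below (11 of 24 indicators better under PLS-SEM), the function would return low predictive power.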
In the worked SmartPLS example, the prediction error histograms look sufficiently symmetric, so RMSE is used. The results show that 13 of the 24 indicators have higher RMSE under PLS-SEM than under the LM benchmark, while the remaining 11 perform better. Because only a minority of indicators improve, the pattern matches the session’s guideline for low predictive power. A separate check of Q² values remains supportive, with all Q² values greater than zero and described as moderate to substantial, reinforcing that the model’s predictive assessment should not rely on R² alone.
## Cornell Notes
R² reflects in-sample explanatory power, so it cannot by itself confirm whether a PLS-SEM model predicts new observations. PLS Predict estimates the model on a training sample and evaluates prediction error on a holdout sample using k-fold cross-validation (SmartPLS defaults to 10 folds). Predictive performance is quantified with RMSE or MAE for each endogenous construct’s indicators, then compared against a naive linear regression benchmark (LM). If most indicators have lower prediction error under PLS-SEM than under LM, predictive power is medium; if only a minority improve, it is low; if none improve, predictive power is absent. The example uses RMSE after checking that prediction error histograms are not strongly non-symmetric.
- Why is R² not a reliable measure of predictive power in PLS-SEM?
- How does PLS Predict implement out-of-sample prediction?
- What do RMSE and MAE measure, and when should each be used?
- What benchmark does PLS Predict use to judge whether predictions are genuinely better?
- How are predictive power levels determined from the indicator-level comparisons?
- What conclusion is reached in the example run in SmartPLS?
## Review Questions
- How does k-fold cross-validation in PLS Predict change the training and holdout samples across folds?
- What histogram feature determines whether RMSE or MAE should be used?
- What indicator-level comparison against the LM benchmark distinguishes medium predictive power from low predictive power?
## Key Points
1. R² measures in-sample explanatory power and cannot establish out-of-sample predictive power.
2. PLS Predict estimates the PLS-SEM model on a training subset and evaluates predictions on a separate holdout subset.
3. SmartPLS’s PLS Predict uses k-fold cross-validation (default 10 folds) and can repeat the procedure to avoid extreme random partitions.
4. Prediction error is assessed per endogenous construct indicator using RMSE or MAE; lower error indicates better prediction.
5. RMSE is preferred when prediction error distributions are symmetric; MAE is preferred when errors show long left/right tails.
6. Predictive power is judged by comparing PLS-SEM prediction errors to a naive linear regression (LM) benchmark.
7. In the example, RMSE comparisons show only partial improvement across indicators, leading to a low predictive power classification.