23. SEMinR Lecture Series | Step 4: Out of Sample Predictive Power | How to use PLSPredict
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Predictive power should be assessed out of sample; R² only measures in-sample explanatory power.
Briefing
Relying on R² to judge predictive power can mislead: R² only reflects how well the model explains the data it was estimated on. PLSpredict is designed to measure how well a PLS path model forecasts unseen data using out-of-sample evaluation. Instead of treating in-sample fit as prediction quality, the method splits the dataset into a training portion (used to estimate the model parameters) and a holdout portion (used to test predictions). The holdout data is never used during estimation, so the resulting prediction errors reflect genuine forecasting ability on new observations.
PLSpredict operationalizes this idea through k-fold cross-validation. The full sample is divided into k roughly equal folds; in each round, one fold becomes the holdout sample while the remaining k-1 folds form the training sample. Predictions are generated for the holdout observations using the model estimated on the training data, and prediction errors are computed by comparing predicted values to actual values in the holdout set. The process repeats until every fold has served as the holdout sample, so each observation is predicted out of sample exactly once per repetition. Running multiple repetitions with different random splits helps avoid "abnormal solutions" that could arise from a single arbitrary train/test split.
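To make the fold rotation concrete, here is a minimal base-R sketch of the splitting logic; it illustrates the idea only (not the seminr internals), and the sample size and fold count are hypothetical.

```r
# Minimal sketch of the k-fold idea: every observation gets a fold label,
# and each fold is held out once while the remaining folds are used for estimation.
set.seed(123)
n <- 250                                      # hypothetical sample size
k <- 10                                       # number of folds
fold_id <- sample(rep(1:k, length.out = n))   # random fold label per observation

for (fold in 1:k) {
  holdout_rows  <- which(fold_id == fold)     # predicted, never used for estimation
  training_rows <- which(fold_id != fold)     # used to estimate the model parameters
  # estimate the PLS model on training_rows, predict the holdout_rows,
  # and store predicted vs. actual indicator values for the error metrics below
}
```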
Prediction quality is quantified using out-of-sample error metrics computed on the holdout predictions. Note that "error" here means a residual, the difference between the actual and the predicted value, not a mistake. The most common metric is RMSE (root mean square error), but MAE (mean absolute error) can be more appropriate when the distribution of prediction errors is highly skewed (e.g., when the residuals show a long left or right tail). To decide between RMSE and MAE, the workflow inspects the error distributions of the endogenous construct's indicators via plots; if the skewness is modest, RMSE is used.
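As a quick illustration of the two metrics, the helper functions below compute RMSE and MAE for one indicator; the holdout values are made up purely to show why squaring makes RMSE more sensitive to a single large residual.

```r
# Out-of-sample error metrics computed from holdout predictions
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Hypothetical holdout values for one endogenous indicator
actual    <- c(5.0, 4.0, 6.0, 3.0, 7.0)
predicted <- c(4.8, 4.3, 5.7, 3.4, 4.0)   # the last case is badly under-predicted

rmse(actual, predicted)   # squaring amplifies the one large residual
mae(actual, predicted)    # less sensitive to that outlier
```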
To interpret the magnitude of these errors, PLSpredict compares them against a naive benchmark: a linear model (LM) baseline. The LM benchmark is obtained by regressing each indicator of the dependent construct directly on the indicators of the exogenous constructs in the PLS model. Predictive power then follows a guideline based on how many indicators have lower out-of-sample RMSE/MAE than the LM benchmark: if all indicators beat the benchmark, predictive power is high; if the majority do, it is medium; if only a minority do, it is low; and if none do, the model lacks predictive power.
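The classification rule can be written down directly. The RMSE values below are hypothetical and only illustrate the indicator-by-indicator comparison against the LM benchmark.

```r
# Hypothetical out-of-sample RMSE values for three endogenous indicators
pls_rmse <- c(y1 = 1.10, y2 = 0.95, y3 = 1.20)   # PLS path model
lm_rmse  <- c(y1 = 1.15, y2 = 1.02, y3 = 1.25)   # LM benchmark

share_better <- mean(pls_rmse < lm_rmse)          # share of indicators beating the LM

verdict <- if (share_better == 1) {
  "high predictive power"
} else if (share_better > 0.5) {
  "medium predictive power"
} else if (share_better > 0) {
  "low predictive power"
} else {
  "lacks predictive power"
}
verdict
```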
Models with mediator constructs add another decision point: predictions can be generated using either a direct antecedents (DA) approach or an earliest antecedents (EA) approach. DA uses both the antecedents and the mediator as predictors of the outcome, while EA excludes the mediator from the prediction step and uses only the constructs at the start of the causal chain. Simulation evidence cited in the lecture favors DA for higher predictive accuracy.
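In seminr, as I understand the interface, this choice is made through the `technique` argument of `predict_pls()`. The sketch below assumes `pls_model` is a model already estimated with `estimate_pls()` (a fuller workflow example follows in the next code block).

```r
library(seminr)

# Assumes pls_model is an already-estimated seminr model (see the workflow sketch below)
pred_da <- predict_pls(model = pls_model, technique = predict_DA)  # mediator included as predictor
pred_ea <- predict_pls(model = pls_model, technique = predict_EA)  # only the earliest antecedents
```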
In the R workflow, PLSpredict is run with the estimated PLS model, the chosen prediction technique, a number of folds (a default of 10 folds is mentioned), and a number of repetitions. The results are stored in a summary object, and the PLS out-of-sample error matrices are compared to the LM out-of-sample benchmark. In the example results, each indicator's out-of-sample RMSE under the PLS model is lower than the corresponding LM benchmark value, leading to the conclusion that the model has high out-of-sample predictive power.
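Below is a minimal sketch of how this workflow might look in seminr, using the mobi example dataset bundled with the package; the construct and indicator names follow the package's standard example, the structural paths are simplified for illustration, and the 10 folds and 10 repetitions mirror the settings mentioned in the lecture.

```r
library(seminr)

# Measurement and structural model for the bundled mobi example data
mobi_mm <- constructs(
  composite("Image",        multi_items("IMAG", 1:5)),
  composite("Expectation",  multi_items("CUEX", 1:3)),
  composite("Value",        multi_items("PERV", 1:2)),
  composite("Satisfaction", multi_items("CUSA", 1:3)),
  composite("Loyalty",      multi_items("CUSL", 1:3))
)
mobi_sm <- relationships(
  paths(from = c("Image", "Expectation"),  to = c("Value", "Satisfaction")),
  paths(from = c("Value", "Satisfaction"), to = "Loyalty")
)
mobi_pls <- estimate_pls(data = mobi,
                         measurement_model = mobi_mm,
                         structural_model  = mobi_sm)

# Out-of-sample prediction: direct-antecedents technique, 10 folds, 10 repetitions
predict_mobi <- predict_pls(model = mobi_pls,
                            technique = predict_DA,
                            noFolds = 10,
                            reps = 10)

sum_predict <- summary(predict_mobi)
sum_predict                              # PLS vs. LM, in-sample and out-of-sample RMSE/MAE
plot(sum_predict, indicator = "CUSL1")   # inspect the prediction-error distribution for skew
```

The decision rule from the briefing is then applied to the printed out-of-sample matrices: for each endogenous indicator, compare the PLS RMSE (or MAE, if the plotted errors are highly skewed) against the corresponding LM value.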
Cornell Notes
The lecture distinguishes in-sample fit from true predictive power and introduces PLSpredict as an out-of-sample evaluation method for PLS path models. Instead of relying on R², it estimates the model on a training sample and tests predictions on a holdout sample that is excluded from estimation. k-fold cross-validation implements this by repeatedly splitting the data into training and holdout folds, producing out-of-sample prediction errors for each endogenous indicator. Those errors are summarized with RMSE or MAE (chosen based on whether the prediction-error distribution is skewed) and compared against the LM benchmark. If all indicators have lower out-of-sample errors than the LM benchmark, predictive power is classified as high.
Why is R² not a reliable measure of predictive power in PLS path modeling?
How does PLSpredict create training and holdout samples?
What do RMSE and MAE measure in this context, and when should MAE be preferred?
How is predictive power determined using the LM benchmark?
What changes when the PLS path model includes a mediator construct, and why does DA matter?
What does the R implementation of PLSpredict require, and which outputs are used for the decision?
Review Questions
- In what way does out-of-sample evaluation address the shortcomings of using R² for predictive power?
- How do you decide between RMSE and MAE when using PLS predict?
- What rule determines whether predictive power is high, medium, low, or absent when comparing PLS errors to the LM benchmark?
Key Points
1. Predictive power should be assessed out of sample; R² only measures in-sample explanatory power.
2. PLSpredict estimates the model on a training sample and evaluates predictions on a holdout sample that is excluded from estimation.
3. k-fold cross-validation implements the training/holdout split by rotating which fold serves as the holdout set.
4. Use RMSE by default, but switch to MAE when the prediction errors are highly skewed (long left/right tails).
5. Compare each endogenous indicator's out-of-sample RMSE/MAE against the LM benchmark to classify predictive power as high, medium, low, or absent.
6. When mediators are present, generate predictions using the direct antecedents (DA) approach, which is more accurate than the earliest antecedents (EA) approach.
7. In R, run PLSpredict with the estimated PLS model, the DA technique, a chosen number of folds, and repetitions, then compare the PLS out-of-sample error matrices to the LM benchmarks.