Quick Guide to Assess Predictive Validity/Predictive Power using PLS Predict in SmartPLS3

Research With Fawad · 4 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Use PLS Predict in SmartPLS3 with Q²predict > 0 as the first gate for predictive relevance.

Briefing

Predictive validity in PLS-SEM can be judged with PLS Predict in SmartPLS3 by combining three checks: whether Q²predict is above zero, whether prediction errors look reasonably symmetric (not dominated by extreme tails), and whether PLS’s prediction errors beat a naive linear benchmark. The practical takeaway is straightforward: if PLS shows lower RMSE/MAE than the linear model across indicators, the model’s predictive power is high; if it performs worse for most indicators, predictive power is low.

The workflow starts with the core requirement—Q²predict must be greater than zero. That threshold signals that the model has predictive relevance rather than merely fitting the sample. Next comes a distributional sanity check on the residual prediction errors. Using the PLS Predict output for manifest variables, the residual error histograms should be roughly symmetric and should not show excessively long tails. If the tails are manageable, RMSE is the preferred metric; if the tails are too extreme, MAE can be used instead.
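As a quick way to operationalize that distributional check outside SmartPLS, the sketch below loads exported prediction errors and uses skewness and excess kurtosis to suggest RMSE or MAE for each indicator. The file name, column layout, and numeric cut-offs are illustrative assumptions, not part of the SmartPLS3 output.

```python
# Minimal sketch, assuming the manifest-variable prediction errors from PLS Predict
# have been exported to a CSV with one column of errors per indicator.
# The file name and the skew/kurtosis cut-offs below are illustrative only.
import pandas as pd
from scipy.stats import skew, kurtosis

errors = pd.read_csv("prediction_errors.csv")  # hypothetical export

for indicator in errors.columns:
    e = errors[indicator].dropna()
    s = skew(e)      # ~0 when the error distribution is symmetric
    k = kurtosis(e)  # excess kurtosis; large positive values mean heavy tails
    metric = "RMSE" if abs(s) < 1 and k < 3 else "MAE"
    print(f"{indicator}: skew={s:.2f}, excess kurtosis={k:.2f} -> prefer {metric}")
```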

Once the error distribution looks acceptable, the decision hinges on comparing prediction errors against a naive linear model benchmark. In SmartPLS3, the PLS Predict results provide RMSE values (and related statistics) for manifest variables under both the PLS model and the linear model. The key comparison is RMSE from PLS (often labeled PLS-SEM prediction errors) versus RMSE from the linear model (LM benchmark). The transcript emphasizes that the RMSE/MAE comparison should be done indicator by indicator, not just at a global level.
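If the two RMSE tables are exported (for example as CSV), the indicator-by-indicator comparison can be scripted rather than read off the screen. The sketch below assumes hypothetical file names and a column layout of "Indicator" and "RMSE"; adjust these to match the actual SmartPLS3 export.

```python
# Sketch of the per-indicator comparison, assuming the PLS and LM RMSE tables
# from PLS Predict were saved as "pls_rmse.csv" and "lm_rmse.csv"
# (hypothetical file and column names).
import pandas as pd

pls = pd.read_csv("pls_rmse.csv")  # RMSE per manifest variable, PLS-SEM predictions
lm = pd.read_csv("lm_rmse.csv")    # RMSE per manifest variable, linear model benchmark

comparison = pls.merge(lm, on="Indicator", suffixes=("_pls", "_lm"))
comparison["diff"] = comparison["RMSE_pls"] - comparison["RMSE_lm"]  # negative = PLS has lower error
print(comparison.sort_values("diff"))
```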

A simple rule set ties the comparison to predictive power levels. If PLS’s RMSE values are higher than the linear model’s RMSE for most indicators, predictive power is low. If PLS’s RMSE is higher for only a minority of indicators, predictive power is medium. If PLS’s RMSE is lower than the linear benchmark for all indicators, predictive power is high. The same logic applies when using MAE if residual tails are problematic.
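That rule set can be expressed as a small helper. The sketch below counts how many indicators have a higher PLS RMSE than the LM benchmark and maps the result to a label; the half-way cut-off for "minority" is an assumption, since the source only distinguishes most, a minority, and all indicators.

```python
# Sketch of the low/medium/high decision rule; the half-way threshold for
# "minority" is an assumption, as the source only says most / minority / all.
def classify_predictive_power(pls_rmse: dict, lm_rmse: dict) -> str:
    indicators = list(pls_rmse)
    worse = sum(pls_rmse[i] > lm_rmse[i] for i in indicators)  # indicators where PLS errs more
    if worse == 0:
        return "high"    # PLS beats the LM benchmark for every indicator
    if worse < len(indicators) / 2:
        return "medium"  # PLS is worse for only a minority of indicators
    return "low"         # PLS is worse for most indicators

# Hypothetical RMSE values, for illustration only
print(classify_predictive_power(
    {"x1": 0.81, "x2": 0.92, "x3": 1.05},
    {"x1": 0.85, "x2": 0.95, "x3": 1.01},
))  # -> "medium": PLS is worse only for x3
```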

In the example workflow, the user runs PLS Predict with “number of repetitions” set to 10 and then focuses on the manifest variable prediction summary. The process involves inspecting the PLS MV error histogram for skewness and tail length, then comparing the RMSE values for each manifest variable between PLS and LM. To make the comparison easy, the transcript describes exporting the PLS and LM RMSE tables into Excel and computing a difference column (PLS RMSE minus LM RMSE). Negative differences indicate that the linear model has higher RMSE, meaning PLS is producing lower prediction error and therefore stronger predictive performance.

Overall, the method turns predictive validity into a concrete, repeatable checklist: confirm Q²predict > 0, verify residual error symmetry, and verify that PLS’s RMSE (or MAE) beats the naive linear benchmark across indicators. That combination is what supports a conclusion about predictive power in SmartPLS3 using PLS Predict.

Cornell Notes

PLS Predict in SmartPLS3 assesses predictive validity by checking Q²predict, the shape of residual prediction errors, and—most importantly—whether PLS produces lower prediction errors than a naive linear benchmark. The transcript’s decision logic starts with Q²predict > 0, then uses residual error histograms to confirm errors are not highly asymmetric or dominated by extreme tails. If tails look reasonable, RMSE is used; otherwise MAE can be substituted. Predictive power is then classified by comparing PLS RMSE (or MAE) to linear model RMSE for each manifest indicator: higher PLS errors for most indicators imply low predictive power, mixed results imply medium, and lower PLS errors for all indicators imply high predictive power. A practical Excel step computes PLS RMSE minus LM RMSE to quickly spot where PLS wins.

Why does Q²predict need to be greater than zero before trusting predictive validity results?

Q²predict > 0 is treated as the baseline requirement for predictive relevance. It indicates that the model's out-of-sample predictions beat a naive benchmark that ignores the model (predicting each indicator with its mean from the analysis sample); without that, later checks on error distributions and RMSE comparisons don't carry much weight for concluding predictive validity.
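For intuition, the sketch below shows the Q²predict idea for a single indicator: one minus the ratio of the model's squared prediction errors to those of a mean-only benchmark. The numbers and function name are made up for illustration; SmartPLS3 reports Q²predict directly.

```python
# Minimal sketch of the Q²predict idea for one indicator.
# A positive value means the model out-predicts the naive mean-only benchmark.
import numpy as np

def q2_predict(y_holdout, y_pred, train_mean):
    sse_model = np.sum((y_holdout - y_pred) ** 2)      # model prediction errors
    sse_naive = np.sum((y_holdout - train_mean) ** 2)  # mean-only benchmark errors
    return 1.0 - sse_model / sse_naive

# Hypothetical values for illustration only
print(q2_predict(np.array([3.0, 4.0, 5.0]), np.array([3.2, 3.9, 4.8]), train_mean=4.0))  # ~0.955
```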

What does the residual error histogram check accomplish, and what counts as “acceptable” shape?

The histogram check looks for highly skewed residual prediction errors and for overly long tails. The transcript's guidance is to ensure errors are roughly symmetric and that the tails are not extreme (not heavily skewed to the left or right). If the tails are manageable, RMSE is appropriate; if the tails are too long, MAE is suggested as a more robust alternative, since RMSE squares each error and is therefore more sensitive to extreme residuals.

How is predictive power determined using RMSE in PLS Predict?

Predictive power hinges on comparing RMSE from the PLS model against the RMSE from the naive linear model benchmark for each manifest indicator. If PLS RMSE is higher than the linear model’s RMSE for most indicators, predictive power is low; if it’s higher for only a minority, predictive power is medium; if PLS RMSE is lower for all indicators, predictive power is high.

What role does the naive linear model benchmark play in the comparison?

The naive linear model provides a baseline for prediction error. The method doesn’t just ask whether PLS has low error—it asks whether PLS improves on a simple linear approach. The transcript repeatedly frames the conclusion around whether PLS’s RMSE/MAE is lower than the LM benchmark.

How can an Excel calculation make the PLS vs LM RMSE comparison faster?

After exporting/copying the PLS and LM RMSE tables, compute a difference column: PLS RMSE minus LM RMSE. A negative result means LM has the higher RMSE, so PLS has the lower prediction error for that indicator—evidence of stronger predictive power.

Review Questions

  1. What three checks are combined in PLS Predict to assess predictive validity, and which one is the threshold requirement?
  2. If PLS RMSE is higher than the linear model’s RMSE for most indicators, what predictive power level should be concluded?
  3. When would MAE be preferred over RMSE in this workflow?

Key Points

  1. Use PLS Predict in SmartPLS3 with Q²predict > 0 as the first gate for predictive relevance.
  2. Inspect the residual prediction error histograms for manifest variables to confirm errors are not highly asymmetric and tails are not extreme.
  3. Prefer RMSE when residual tails are reasonably behaved; switch to MAE if tails are too long.
  4. Compare PLS RMSE (or MAE) against the naive linear model RMSE for each manifest indicator, not just overall fit.
  5. Classify predictive power by how often PLS beats the linear benchmark: most indicators worse = low, minority worse = medium, all indicators better = high.
  6. A simple Excel difference (PLS RMSE − LM RMSE) quickly identifies indicators where PLS reduces prediction error (negative values indicate PLS wins).

Highlights

Predictive validity in this workflow is a checklist: Q²predict > 0, symmetric-ish residual errors, and PLS beating the linear RMSE benchmark.
RMSE comparison is indicator-by-indicator: PLS RMSE higher than the linear model for most indicators signals low predictive power.
If PLS RMSE is lower than the linear benchmark for every indicator, predictive power is treated as high.
Excel can streamline interpretation by computing PLS RMSE minus LM RMSE to flag where PLS reduces error.

Topics

Mentioned

  • SmartPLS3
  • PLS
  • Q²predict
  • RMSE
  • MAE
  • LM
  • MV
  • PLS-SEM