Quick Guide to Assess Predictive Validity/Predictive Power using PLS Predict in SmartPLS3
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use PLS Predict in SmartPLS3 with Q²predict > 0 as the first gate for predictive relevance.
Briefing
Predictive validity in PLS-SEM can be judged with PLS Predict in SmartPLS3 by combining three checks: whether Q²predict is above zero, whether prediction errors look reasonably symmetric (not dominated by extreme tails), and whether PLS’s prediction errors beat a naive linear benchmark. The practical takeaway is straightforward: if PLS shows lower RMSE/MAE than the linear model across all indicators, predictive power is high; if it performs worse for most indicators, predictive power is low; mixed results fall in between as medium.
The workflow starts with the core requirement—Q²predict must be greater than zero. That threshold signals that the model has predictive relevance rather than merely fitting the sample. Next comes a distributional sanity check on the residual prediction errors. Using the PLS Predict output for manifest variables, the residual error histograms should be roughly symmetric and should not show excessively long tails. If the tails are manageable, RMSE is the preferred metric; if the tails are too extreme, MAE can be used instead.
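To make these two checks concrete, here is a minimal Python sketch (not from the transcript) of the usual Q²predict computation—one minus the ratio of the model’s squared prediction errors to those of a naive training-mean benchmark—plus a simple skewness test for choosing between RMSE and MAE. The function names and the skewness cutoff of 1.0 are illustrative assumptions, not SmartPLS settings.

```python
# Minimal sketch (assumption: actual and predicted values for one manifest
# variable have been exported from PLS Predict into NumPy arrays).
import numpy as np
from scipy.stats import skew

def q2_predict(y_actual, y_pred, y_train_mean):
    """Q²predict = 1 - SSE(model) / SSE(naive training-mean benchmark)."""
    sse_model = np.sum((y_actual - y_pred) ** 2)
    sse_naive = np.sum((y_actual - y_train_mean) ** 2)
    return 1.0 - sse_model / sse_naive

def pick_error_metric(prediction_errors, skew_cutoff=1.0):
    """Prefer RMSE for roughly symmetric errors; fall back to MAE for long tails.

    The cutoff of 1.0 is an illustrative assumption, not a SmartPLS rule.
    """
    return "RMSE" if abs(skew(prediction_errors)) < skew_cutoff else "MAE"
```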
Once the error distribution looks acceptable, the decision hinges on comparing prediction errors against a naive linear model benchmark. In SmartPLS3, the PLS Predict results provide RMSE values (and related statistics) for manifest variables under both the PLS model and the linear model. The key comparison is RMSE from PLS (often labeled PLS-SEM prediction errors) versus RMSE from the linear model (LM benchmark). The transcript emphasizes that the RMSE/MAE comparison should be done indicator by indicator, not just at a global level.
A simple rule set ties the comparison to predictive power levels. If PLS’s RMSE values are higher than the linear model’s RMSE for most indicators, predictive power is low. If PLS’s RMSE is higher for only a minority of indicators, predictive power is medium. If PLS’s RMSE is lower than the linear benchmark for all indicators, predictive power is high. The same logic applies when using MAE if residual tails are problematic.
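This rule set can be expressed as a short helper. The sketch below follows the decision logic as described above and assumes per-indicator RMSE values have already been collected into dictionaries keyed by indicator name; how to handle an exactly even split is a judgment call the transcript does not address.

```python
def classify_predictive_power(pls_rmse, lm_rmse):
    """Classify predictive power by comparing PLS vs. LM RMSE per indicator."""
    worse = sum(pls_rmse[mv] > lm_rmse[mv] for mv in pls_rmse)
    total = len(pls_rmse)
    if worse == 0:
        return "high"    # PLS beats the linear benchmark for every indicator
    if worse < total / 2:
        return "medium"  # PLS is worse for only a minority of indicators
    return "low"         # PLS is worse for most indicators

# Hypothetical RMSE values for three manifest variables:
pls = {"mv1": 0.81, "mv2": 0.92, "mv3": 0.77}
lm = {"mv1": 0.85, "mv2": 0.95, "mv3": 0.80}
print(classify_predictive_power(pls, lm))  # -> high
```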
In the example workflow, the user runs PLS Predict with “number of repetitions” set to 10 and then focuses on the manifest variable prediction summary. The process involves inspecting the PLS MV error histogram for skewness and tail length, then comparing the RMSE values for each manifest variable between PLS and LM. To make the comparison easy, the transcript describes exporting the PLS and LM RMSE tables into Excel and computing a difference column (PLS RMSE minus LM RMSE). Negative differences indicate that the linear model has higher RMSE, meaning PLS is producing lower prediction error and therefore stronger predictive performance.
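The same Excel difference step can be reproduced with pandas. This is a minimal sketch; the file names and column labels ("Item", "RMSE") are assumptions and should be matched to whatever the SmartPLS export actually contains.

```python
import pandas as pd

# Assumed file names and columns; adjust to the actual SmartPLS export.
pls = pd.read_excel("pls_rmse.xlsx")  # columns: Item, RMSE
lm = pd.read_excel("lm_rmse.xlsx")    # columns: Item, RMSE

merged = pls.merge(lm, on="Item", suffixes=("_pls", "_lm"))
merged["diff"] = merged["RMSE_pls"] - merged["RMSE_lm"]

# Negative diff means PLS has the lower prediction error for that indicator.
print(merged.sort_values("diff"))
print("Indicators where PLS wins:", (merged["diff"] < 0).sum(), "of", len(merged))
```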
Overall, the method turns predictive validity into a concrete, repeatable checklist: confirm Q²predict > 0, verify residual error symmetry, and verify that PLS’s RMSE (or MAE) beats the naive linear benchmark across indicators. That combination is what supports a conclusion about predictive power in SmartPLS3 using PLS Predict.
Cornell Notes
PLS Predict in SmartPLS3 assesses predictive validity by checking Q²predict, the shape of residual prediction errors, and—most importantly—whether PLS produces lower prediction errors than a naive linear benchmark. The transcript’s decision logic starts with Q²predict > 0, then uses residual error histograms to confirm errors are not highly asymmetric or dominated by extreme tails. If tails look reasonable, RMSE is used; otherwise MAE can be substituted. Predictive power is then classified by comparing PLS RMSE (or MAE) to linear model RMSE for each manifest indicator: higher PLS errors for most indicators imply low predictive power, mixed results imply medium, and lower PLS errors for all indicators imply high predictive power. A practical Excel step computes PLS RMSE minus LM RMSE to quickly spot where PLS wins.
Why does Q²predict need to be greater than zero before trusting predictive validity results?
What does the residual error histogram check accomplish, and what counts as “acceptable” shape?
How is predictive power determined using RMSE in PLS Predict?
What role does the naive linear model benchmark play in the comparison?
How can an Excel calculation make the PLS vs LM RMSE comparison faster?
Review Questions
- What three checks are combined in PLS Predict to assess predictive validity, and which one is the threshold requirement?
- If PLS RMSE is higher than the linear model’s RMSE for most indicators, what predictive power level should be concluded?
- When would MAE be preferred over RMSE in this workflow?
Key Points
1. Use PLS Predict in SmartPLS3 with Q²predict > 0 as the first gate for predictive relevance.
2. Inspect the residual prediction error histograms for manifest variables to confirm errors are not highly asymmetric and tails are not extreme.
3. Prefer RMSE when residual tails are reasonably behaved; switch to MAE if tails are too long.
4. Compare PLS RMSE (or MAE) against the naive linear model RMSE for each manifest indicator, not just overall fit.
5. Classify predictive power by how often PLS beats the linear benchmark: most indicators worse = low, minority worse = medium, all indicators better = high.
6. A simple Excel difference (PLS RMSE − LM RMSE) quickly identifies indicators where PLS reduces prediction error (negative values indicate PLS wins).