
#SmartPLS4 Series 34 - Quick Guide: Assess Predictive Validity/Predictive Power using PLS Predict?

Research With Fawad
5 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Prediction error in PLS-SEM is the residual gap between observed and predicted endogenous values, and lower values indicate better predictive performance.

Briefing

Assessing a structural model’s predictive power in PLS-SEM comes down to comparing how closely predicted values match actual values for each endogenous construct. The key idea is that “prediction error” isn’t treated as a mistake; it’s the residual—the gap between observed and predicted scores. Lower residual-based error means stronger predictive performance, and researchers quantify that error using prediction statistics such as RMSE (root mean square error) or MAE (mean absolute error).

RMSE is the default choice in most cases because it penalizes larger deviations more heavily. But when the prediction error distribution is highly non-symmetric—showing a long left or right tail—MAE becomes more appropriate, since it better reflects average absolute deviations under skewed error patterns. SmartPLS4’s PLS Predict workflow operationalizes this by generating prediction errors for endogenous constructs and then comparing them against a naive linear model (LM) benchmark.
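Both metrics are simple functions of the residuals; a minimal Python sketch (function names are illustrative, not part of the SmartPLS4 interface):

```python
import math

def rmse(observed, predicted):
    # Root mean square error: squares each residual, so large
    # deviations are penalized more heavily than small ones.
    residuals = [o - p for o, p in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

def mae(observed, predicted):
    # Mean absolute error: averages absolute residuals, so it is
    # less dominated by a few extreme errors in a skewed distribution.
    residuals = [o - p for o, p in zip(observed, predicted)]
    return sum(abs(r) for r in residuals) / len(residuals)

# Hypothetical observed vs. predicted values for one indicator
observed  = [3.0, 4.0, 5.0, 2.0]
predicted = [2.5, 4.5, 5.0, 4.0]
print(rmse(observed, predicted))  # ~1.06: the single large residual dominates
print(mae(observed, predicted))   # 0.75
```

The example data illustrate why the two metrics diverge: one large residual (2.0 on the last case) inflates RMSE well above MAE.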

The decision rules hinge on how many indicators (manifest variables) produce lower prediction errors than the LM benchmark. If all indicators—or at least a clear majority—show low prediction errors relative to LM, the model’s predictive power is rated high or medium. If only a minority of indicators outperform the LM benchmark, predictive power is rated low. The strictest “no predictive power” outcome occurs when none of the indicators achieve lower prediction error than the LM benchmark—meaning every endogenous indicator’s RMSE/MAE is higher than the linear model’s.

In the SmartPLS4 example, the workflow starts with running “Calculate PLS Predict,” using default settings (including a repetition count of 10). The output is then checked for whether the prediction error distribution is symmetric. By inspecting the PLS Predict error histogram for the endogenous variables, the error pattern appears balanced, with no extreme long-tail behavior. That symmetry leads to using RMSE rather than MAE.

Next, the report compares RMSE values for each endogenous indicator against the corresponding LM RMSE values. The results show that many indicators have RMSE values lower than the LM benchmark, but not enough to qualify as strong predictive performance. Specifically, out of 24 indicators, 13 have higher RMSE than the LM benchmark, while the remaining 11 perform better. Under the stated guidelines, that pattern, in which only a minority of indicators outperform the benchmark while a substantial portion do not, maps to a “low predictive power” conclusion. In short: the model predicts some indicators better than a naive linear baseline, but too many indicators underperform, limiting overall predictive usefulness.

The practical takeaway is straightforward: use PLS Predict in SmartPLS4, choose RMSE when the error distribution is symmetric (MAE when it isn’t), and classify predictive power by counting how many endogenous indicators beat the LM benchmark on the selected error metric.
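The counting rule can be sketched as a small function. The thresholds below mirror the guidelines stated above (all indicators beat LM → high, a majority → medium, a minority → low, none → no predictive power), and the RMSE values are made up to reproduce the example's 13-of-24 pattern:

```python
def classify_predictive_power(pls_rmse, lm_rmse):
    """Classify predictive power by counting endogenous indicators
    whose PLS-SEM prediction error beats the LM benchmark."""
    better = sum(1 for p, l in zip(pls_rmse, lm_rmse) if p < l)
    total = len(pls_rmse)
    if better == total:
        return "high"
    if better > total / 2:
        return "medium"
    if better > 0:
        return "low"
    return "none"

# Mirrors the example: 24 indicators, 13 with higher RMSE than LM,
# so only 11 beat the benchmark (a minority). Values are hypothetical.
pls = [1.0] * 11 + [1.2] * 13   # indicator RMSEs from PLS Predict
lm  = [1.1] * 24                # corresponding LM benchmark RMSEs
print(classify_predictive_power(pls, lm))  # prints "low"
```

Feeding in the example's counts reproduces the "low predictive power" verdict, since 11 of 24 is less than a majority but more than zero.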

Cornell Notes

Predictive validity/power in PLS-SEM is assessed by measuring residual-based prediction error for endogenous constructs and comparing it to a naive linear model (LM) benchmark. Prediction error is quantified with RMSE by default; switch to MAE when the prediction error distribution is highly non-symmetric with long left/right tails. SmartPLS4’s PLS Predict workflow generates error histograms to check symmetry and then reports RMSE (or MAE) for each indicator alongside LM RMSE. Predictive power is classified by how many indicators have lower prediction error than the LM benchmark: majority/all implies high or medium predictive power, minority implies low predictive power, and none implies no predictive power. In the example, RMSE is used after confirming symmetric error, and 13 of 24 indicators have higher RMSE than LM, leading to a low predictive power rating.

Why does prediction error matter in PLS-SEM, and what does “error” mean here?

Prediction error refers to residuals: the difference between actual (observed) values and predicted values for endogenous constructs. It’s not treated as a coding mistake; it’s the size of the gap the model leaves behind. The goal is to minimize this gap, so lower prediction error indicates better predictive performance.

How do RMSE and MAE differ in practice for choosing a metric?

RMSE (root mean square error) is the most common choice because it emphasizes larger deviations. MAE (mean absolute error) is recommended when the prediction error distribution is highly non-symmetric—when there’s a long left or right tail—because MAE reflects average absolute deviations more robustly under skewed error patterns.
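Besides eyeballing the histogram, one simple way to quantify asymmetry is the sample skewness of the residuals. This is a generic statistics sketch, not a SmartPLS4 feature, and the ±1 cutoff is an illustrative rule of thumb rather than a published threshold:

```python
def sample_skewness(errors):
    # Fisher-Pearson coefficient of skewness: third standardized moment.
    # Near 0 for symmetric distributions; large |value| for long tails.
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    return m3 / m2 ** 1.5

def choose_metric(errors, cutoff=1.0):
    # Highly non-symmetric error distribution (long tail) -> MAE;
    # otherwise default to RMSE. The cutoff is an assumption here.
    return "MAE" if abs(sample_skewness(errors)) > cutoff else "RMSE"

symmetric_errors = [-2, -1, -1, 0, 0, 0, 1, 1, 2]
skewed_errors    = [-1, 0, 0, 0, 0, 1, 1, 2, 9]   # long right tail
print(choose_metric(symmetric_errors))  # prints "RMSE"
print(choose_metric(skewed_errors))     # prints "MAE"
```

A skewness near zero matches the "balanced histogram" situation in the example, where RMSE remains the appropriate choice.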

What does the LM benchmark comparison accomplish?

The LM benchmark provides a baseline expectation from a naive linear model. Predictive power is judged by whether PLS-SEM indicators produce lower prediction error than this baseline. If indicators beat LM on RMSE/MAE, the model adds predictive value; if they don’t, the model’s predictive usefulness is limited.

What rule determines whether predictive power is high, medium, low, or none?

The classification depends on how many endogenous indicators have lower prediction error than LM. If all indicators (or a majority) have lower errors, predictive power is high (or medium). If only a minority have lower errors, predictive power is low. If none of the indicators beat LM (all have higher RMSE/MAE), predictive power is “no predictive power.”

How was RMSE selected in the SmartPLS4 example?

The workflow used the PLS Predict error histogram to inspect the distribution shape for endogenous variables. The histogram looked symmetric, without long left/right tails, so RMSE was chosen instead of MAE.

What specific result led to the “low predictive power” conclusion?

After running PLS Predict and comparing RMSE to LM RMSE, the example reported 24 indicators total. Thirteen of those indicators had higher RMSE than the LM benchmark, meaning only the remaining 11 performed better than the naive baseline. That minority-better pattern aligns with the guideline for low predictive power.

Review Questions

  1. When should MAE be used instead of RMSE in PLS Predict, and what feature of the error distribution triggers the switch?
  2. How does the count of indicators beating the LM benchmark determine the predictive power category?
  3. In the example, why does having 13 indicators with higher RMSE than LM lead to a “low predictive power” rating?

Key Points

  1. Prediction error in PLS-SEM is the residual gap between observed and predicted endogenous values, and lower values indicate better predictive performance.

  2. RMSE is the default prediction error metric; MAE is preferred when prediction error distributions show long left/right tails (high non-symmetry).

  3. SmartPLS4’s PLS Predict uses error histograms to check whether the prediction error distribution is symmetric before choosing RMSE vs MAE.

  4. Predictive power is classified by comparing each endogenous indicator’s prediction error to the naive LM benchmark.

  5. If most or all indicators have lower prediction error than LM, predictive power is high (or medium); if only a minority do, predictive power is low.

  6. If none of the indicators beat LM on RMSE/MAE, the model has no predictive power.

  7. In the example, RMSE was used after confirming symmetric error, and 13 of 24 indicators had higher RMSE than LM, resulting in low predictive power.

Highlights

Prediction error is treated as residuals (observed minus predicted), and the model’s predictive strength depends on minimizing that residual gap.
RMSE is used when prediction errors look symmetric; MAE is used when errors show long left/right tails.
Predictive power is determined by how many endogenous indicators outperform the LM benchmark on RMSE/MAE.
In the SmartPLS4 case, 13 of 24 indicators had higher RMSE than LM, so predictive power was rated low.
