How to Add Control Variables in SmartPLS3? (See Description)
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Add control variables in SmartPLS3 as latent variables, then run bootstrapping to test their significance.
Briefing
Adding control variables in SmartPLS3 can change whether demographic predictors look statistically significant, and that shift matters because it can reveal confounding. In this walkthrough, gender and age are introduced as control variables to test whether they distort the relationships among the model’s endogenous constructs, specifically OC and CC (collaborative culture). Gender is a categorical variable with two categories (male, female), so it is entered directly as a latent variable; a two-category variable already behaves like a single dummy, so no further dummy coding is needed. Age is treated as a continuous variable and is also added as a latent variable. The model is then bootstrapped to assess significance.
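To make the bootstrapping step more concrete, here is a minimal sketch of the idea behind it: resample the cases with replacement, re-estimate the coefficient each time, and divide the original estimate by the spread of the resampled estimates to get a t-value. This is not what SmartPLS3 runs internally; it uses an ordinary least-squares slope as a stand-in for a PLS path, and the data, the variable names (`age`, `oc`), and the helper functions are all made up for illustration.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (single predictor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_t(xs, ys, n_boot=1000, seed=0):
    """Return (estimate, bootstrap standard error, t-value) for the slope."""
    rng = random.Random(seed)
    n = len(xs)
    estimate = slope(xs, ys)
    boot_slopes = []
    for _ in range(n_boot):
        # Draw n cases with replacement and re-estimate the slope.
        idx = [rng.randrange(n) for _ in range(n)]
        boot_slopes.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))
    se = statistics.stdev(boot_slopes)
    return estimate, se, estimate / se


# Synthetic data (an assumption, not the tutorial's dataset):
# a control variable "age" with a built-in effect on an outcome score "oc".
data_rng = random.Random(42)
age = [data_rng.uniform(20, 60) for _ in range(200)]
oc = [0.05 * a + data_rng.gauss(0, 1) for a in age]

estimate, se, t_value = bootstrap_t(age, oc)
print(f"slope={estimate:.3f}, bootstrap SE={se:.4f}, t={t_value:.2f}")
```

A t-value above roughly 1.96 corresponds to significance at the 5% level (two-sided), which is the kind of check applied to each control path after bootstrapping.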
With gender and age included, the results show no significant effect from gender, while age has a significant effect. To check whether age is acting as a confounder, the analysis is repeated without any control variables, and the two runs are compared on R-square values and path significance. When controls are removed, the R-square for OC drops slightly (from 0.495 with controls to 0.482 without), indicating that age contributes explanatory power. The walkthrough reads age’s significant path, together with this R-square difference, as a sign that age is not merely a nuisance variable: it genuinely affects the endogenous outcome, so it carries confounding influence.
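One way to put the R-square comparison on a common scale is Cohen's f-squared effect size, f² = (R² included − R² excluded) / (1 − R² included). The video itself only compares the two R-square values; the f² calculation below is an added illustration, and the function name is hypothetical. Note that the difference reflects removing both controls (gender and age) together.

```python
def f_squared(r2_included, r2_excluded):
    """Cohen's f-squared for the variables dropped between two models."""
    if not 0 <= r2_included < 1:
        raise ValueError("r2_included must be in [0, 1)")
    return (r2_included - r2_excluded) / (1 - r2_included)


# R-square values for OC reported in the walkthrough.
r2_with_controls = 0.495
r2_without_controls = 0.482

delta_r2 = r2_with_controls - r2_without_controls
f2 = f_squared(r2_with_controls, r2_without_controls)
print(f"delta R2 = {delta_r2:.3f}, f2 = {f2:.3f}")
```

By Cohen's conventional benchmarks (0.02 small, 0.15 medium, 0.35 large), the result of about 0.026 is a small effect, which fits the description of the drop as slight.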
The walkthrough then compares how the model’s key statistics behave when controls are added versus removed. Even when the beta coefficients and overall relationships remain broadly similar, the p-values can change. The takeaway is nuanced: significance levels may shift with or without controls, but the substantive effect of age on the endogenous construct remains. In other words, adding age as a control variable changes the p-values more than it changes the size or direction of the paths.
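A small numeric sketch shows why a p-value can move while the beta stays put: the p-value depends on the ratio of the coefficient to its standard error, so a change in the standard error alone is enough. The numbers below are hypothetical, not taken from the video, and the p-value uses a normal approximation rather than the exact procedure SmartPLS3 applies.

```python
import math


def two_sided_p(beta, se):
    """Two-sided p-value for beta / se under a standard normal approximation."""
    z = beta / se
    return math.erfc(abs(z) / math.sqrt(2))


beta = 0.12  # hypothetical path coefficient, identical in both runs

# Hypothetical standard errors for two runs of the same model.
p_small_se = two_sided_p(beta, 0.05)
p_large_se = two_sided_p(beta, 0.07)

print(f"SE=0.05 -> p={p_small_se:.4f}")
print(f"SE=0.07 -> p={p_large_se:.4f}")
```

With the same beta, the first run falls below the 0.05 threshold and the second does not, which is why the comparison should look at the coefficients as well as the significance flags.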
Finally, the tutorial addresses a common complication: control variables with more than two categories. Using job rank as the example, it explains that SmartPLS requires dummy-variable coding for multi-category categorical predictors. For job rank with three levels—Junior, Middle, and Senior—three dummy indicators are conceptually created, but only two are actually added to the model so one category serves as the reference group (Junior). Middle is coded as 1 for Middle employees and 0 otherwise; Senior is coded as 1 for Senior employees and 0 otherwise. After bootstrapping with these dummy controls included, the results show no significant impact from Middle or Senior job rank on the endogenous variables. That means job rank does not produce a confounding effect, so there’s no strong statistical reason to keep it as a control.
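The dummy coding described above can be prepared in the data file before it is imported into SmartPLS3. The following is a minimal sketch in plain Python; the function name `dummy_code` and the sample rows are assumptions for illustration, while the categories and the Junior reference group follow the tutorial.

```python
def dummy_code(values, categories, reference):
    """Return one 0/1 column per non-reference category (k - 1 columns)."""
    if reference not in categories:
        raise ValueError(f"reference {reference!r} is not a known category")
    columns = {c: [] for c in categories if c != reference}
    for value in values:
        if value not in categories:
            raise ValueError(f"unexpected category {value!r}")
        for category, column in columns.items():
            # 1 if the case belongs to this category, 0 otherwise.
            column.append(1 if value == category else 0)
    return columns


# Made-up sample of job ranks, one entry per respondent.
ranks = ["Junior", "Middle", "Senior", "Middle", "Junior"]

dummies = dummy_code(ranks, ["Junior", "Middle", "Senior"], reference="Junior")
for name, column in dummies.items():
    print(name, column)
```

Junior respondents end up with 0 in both columns, which is what makes Junior the reference group: the Middle and Senior paths are then interpreted relative to Junior employees.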
Overall, the process is practical: add candidate controls in SmartPLS3, bootstrap to test significance, rerun the model without controls, and compare R-square and path significance to judge whether the control variable changes the relationships in a meaningful way. The tutorial’s examples show both a case where age behaves like a confounder and a case where job rank does not.
Cornell Notes
SmartPLS3 control variables can be added as latent variables and then tested for confounding by comparing results with and without the controls. In the example, gender (male/female) is entered without dummy coding and shows no significant effect, while age (continuous) is significant when included. Removing the controls slightly lowers OC’s R-square (0.495 to 0.482), which the walkthrough takes as evidence that age affects the endogenous construct and acts as a confounder. For multi-category controls like job rank (Junior/Middle/Senior), dummy variables are created and only two are added, using Junior as the reference category. Bootstrapping then shows no significant effect from Middle or Senior, so job rank is not a confounding factor.
How does the walkthrough decide whether gender and age should be treated differently when adding them as control variables in SmartPLS3?
What comparison is used to judge whether age is a confounder?
Why does the walkthrough emphasize that p-values can change even if beta values and relationships look similar?
How are dummy variables created for a control variable with more than two categories (job rank)?
What does it mean in this context when job rank dummies show no significant effects after bootstrapping?
Review Questions
- When comparing models with and without control variables in SmartPLS3, which metrics does the walkthrough use to assess confounding (and why)?
- How would you code a categorical control variable with three categories so that one category becomes the reference group in SmartPLS3?
- In the age example, what evidence suggests age is more than a statistical artifact when controls are removed?
Key Points
1. Add control variables in SmartPLS3 as latent variables, then run bootstrapping to test their significance.
2. Treat two-category categorical controls (e.g., gender) without dummy variables when entering them as latent variables.
3. Treat continuous controls (e.g., age) as continuous latent variables and compare results with and without them.
4. Assess confounding by rerunning the model without controls and comparing R-square and path significance.
5. Expect p-values to change when controls are added/removed, but judge confounding by whether substantive relationships meaningfully change.
6. For multi-category categorical controls (e.g., job rank), create dummy variables and add only k−1 dummies, using one category as the reference group.
7. If dummy-coded controls (excluding the reference) are insignificant, there’s little evidence they confound the endogenous relationships.