#SmartPLS4 Series - 40 - How to use Categorical Predictor variables?
Based on Research With Fawad's video on YouTube.
Dummy code categorical predictors in SmartPLS 4 so that each non-reference category is compared against a reference group.
Briefing
Categorical predictors enter a SmartPLS 4 model through dummy coding, and the choice of reference category determines the sign of each coefficient and how it is read. In the example, “type of bank” has two categories: conventional and Islamic. Conventional is coded as 0 and Islamic as 1, so a single dummy variable is enough: the category coded 0 serves as the reference. With Islamic coded as 1, the path coefficient for “type” comes out positive, meaning Islamic banks show a higher collaborative culture than conventional banks. The difference is also reported as significant, so the effect is statistically supported rather than merely directional.
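The coding itself is done in the data file before it is imported into SmartPLS. As a minimal sketch in plain Python, assuming a hypothetical column of bank-type labels (the label strings and the function name `dummy_code_binary` are illustrative and do not come from the video):

```python
def dummy_code_binary(labels, coded_as_one):
    """Return a 0/1 list: 1 where the label equals `coded_as_one`, else 0.

    The category that receives 0 acts as the reference group.
    """
    categories = set(labels)
    if len(categories) != 2:
        raise ValueError("expected exactly two categories, got %d" % len(categories))
    if coded_as_one not in categories:
        raise ValueError("%r does not appear in the data" % (coded_as_one,))
    return [1 if label == coded_as_one else 0 for label in labels]


# Hypothetical sample rows, not data from the video.
bank_type = ["conventional", "islamic", "islamic", "conventional", "islamic"]

# Islamic = 1, conventional = 0 (conventional is the reference group).
type_dummy = dummy_code_binary(bank_type, coded_as_one="islamic")
print(type_dummy)  # [0, 1, 1, 0, 1]
```

Passing `coded_as_one="conventional"` instead would produce the reversed coding, with Islamic as the reference group.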
That interpretation hinges on the reference category. When the coding is reversed—conventional as 1 and Islamic as 0—the coefficient sign flips to negative. In that setup, conventional banks are interpreted as having lower collaborative culture than Islamic banks. The underlying comparison remains the same (Islamic versus conventional), but the sign changes because the reference group changes.
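The sign flip can be reproduced on a toy dataset. The sketch below uses an ordinary least-squares slope on a single dummy as a stand-in for the path coefficient. This is an assumption for illustration only: SmartPLS estimates standardized path coefficients with PLS-SEM, so magnitudes will differ, but the sign logic is the same. The scores are hypothetical.

```python
from statistics import mean


def slope_on_dummy(dummy, outcome):
    """Least-squares slope of `outcome` regressed on a single 0/1 dummy."""
    x_bar, y_bar = mean(dummy), mean(outcome)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dummy, outcome))
    sxx = sum((x - x_bar) ** 2 for x in dummy)
    return sxy / sxx


# Hypothetical collaborative-culture scores, not data from the video.
islamic_is_one = [0, 0, 0, 1, 1, 1]
culture = [3.0, 3.5, 4.0, 4.0, 4.5, 5.0]

# Reverse the coding: conventional = 1, Islamic = 0.
conventional_is_one = [1 - d for d in islamic_is_one]

b_islamic = slope_on_dummy(islamic_is_one, culture)
b_conventional = slope_on_dummy(conventional_is_one, culture)

print(b_islamic)       # positive: Islamic sits above the conventional reference
print(b_conventional)  # same size, negative: conventional sits below Islamic
```

With one dummy predictor, the slope equals the mean of the 1-coded group minus the mean of the reference group, which is why swapping the codes only changes the sign.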
The session then moves to a categorical predictor with three levels: “job rank,” split into junior, middle, and senior. With three categories, a single dummy variable is not enough: k categories need k - 1 dummies, so two are required here, each compared against a chosen reference category. Junior is selected as the reference, so the model includes separate indicators for middle and senior. The results show that the coefficient for “middle” is negative, indicating middle-level employees perceive collaborative culture to be lower than junior-level employees. However, the difference is not significant, so the negative direction is not statistically supported.
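The job-rank coding can be sketched the same way. The helper below is a hypothetical illustration in plain Python (the name `dummy_code` and the sample rows are assumptions, not part of the video); it builds one 0/1 column for every category except the chosen reference.

```python
def dummy_code(labels, reference):
    """Return {category: 0/1 list} for every category except `reference`.

    k categories yield k - 1 dummy columns; rows that belong to the
    reference category are 0 in all of them.
    """
    categories = sorted(set(labels))
    if reference not in categories:
        raise ValueError("%r does not appear in the data" % (reference,))
    return {
        cat: [1 if label == cat else 0 for label in labels]
        for cat in categories
        if cat != reference
    }


# Hypothetical sample rows, not data from the video.
job_rank = ["junior", "middle", "senior", "middle", "junior", "senior"]

dummies = dummy_code(job_rank, reference="junior")
for name, column in dummies.items():
    print(name, column)
# middle [0, 1, 0, 1, 0, 0]
# senior [0, 0, 1, 0, 0, 1]
```

Junior rows are 0 in both columns, which is what makes junior the baseline for both comparisons.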
For “senior,” the coefficient is positive, so senior-level employees perceive collaborative culture as higher than junior-level employees. Again, the difference is not significant. Throughout, the model assesses categorical predictors by comparing each non-reference category against the reference category: the coefficient sign gives the direction of the difference, and the significance test indicates whether that difference is statistically reliable.
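The reading rule (sign for direction, significance test for reliability) can be captured in a small helper. This is a sketch under stated assumptions: the function name `interpret`, the 0.05 cutoff, and every coefficient and p-value below are hypothetical, chosen only to mirror the pattern of signs and significance described above. They are not the values reported in the video.

```python
def interpret(category, reference, coefficient, p_value, alpha=0.05):
    """Describe one dummy-variable result in words.

    `alpha` is the significance cutoff; 0.05 is a common convention and
    an assumption here, not a value stated in the source.
    """
    if coefficient > 0:
        direction = "higher than"
    elif coefficient < 0:
        direction = "lower than"
    else:
        direction = "equal to"
    verdict = "significant" if p_value < alpha else "not significant"
    return f"{category} is {direction} {reference} ({verdict})"


# Hypothetical numbers for illustration only.
print(interpret("middle", "junior", -0.05, 0.40))
print(interpret("senior", "junior", 0.08, 0.25))
print(interpret("Islamic", "conventional", 0.20, 0.01))
```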
Finally, the “type of bank” effect is reiterated alongside the job-rank comparisons: Islamic banks (coded as 1) are associated with higher collaborative culture than conventional banks, and that difference is significant. Taken together, the examples show how to structure categorical predictors in SmartPLS 4—one dummy for two categories, two dummies for three categories—and how reference-category selection controls both interpretation and coefficient direction.
Cornell Notes
SmartPLS 4 handles categorical predictor variables by dummy coding and comparing categories to a reference group. For a two-category predictor like bank type (conventional vs Islamic), only one dummy variable is needed: code one category as 1 and the other as 0, with the 0 category acting as the reference. A positive coefficient means the 1-coded category has higher outcomes than the reference; reversing the coding flips the sign and the interpretation. For a three-category predictor like job rank (junior, middle, senior), two dummy variables are required when junior is the reference—middle and senior are each compared to junior. Coefficient signs indicate direction (higher or lower collaborative culture), while significance determines whether differences are statistically reliable.
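For the three-category case, the comparison against the reference can be written out explicitly. This is a sketch assuming a simple linear model; the symbols $b_0$, $b_M$, $b_S$, $D_M$, and $D_S$ are notation introduced here, and because SmartPLS reports standardized path coefficients, the magnitudes will not equal raw mean differences even though the signs are read the same way.

```latex
\begin{aligned}
\text{CC} &= b_0 + b_M D_M + b_S D_S + e,
  \qquad D_M = \begin{cases} 1 & \text{middle} \\ 0 & \text{otherwise} \end{cases},
  \quad D_S = \begin{cases} 1 & \text{senior} \\ 0 & \text{otherwise} \end{cases} \\
E[\text{CC} \mid \text{junior}] &= b_0 \\
E[\text{CC} \mid \text{middle}] &= b_0 + b_M
  \;\Rightarrow\; b_M = E[\text{CC} \mid \text{middle}] - E[\text{CC} \mid \text{junior}] \\
E[\text{CC} \mid \text{senior}] &= b_0 + b_S
  \;\Rightarrow\; b_S = E[\text{CC} \mid \text{senior}] - E[\text{CC} \mid \text{junior}]
\end{aligned}
```

Here CC stands for collaborative culture. Junior has no dummy of its own: it is the case where both $D_M$ and $D_S$ are 0, so each coefficient measures a gap relative to junior.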
- Why does SmartPLS 4 require only one dummy variable for “type of bank” but two for “job rank”?
- How does the reference category affect the sign of the path coefficient for a two-category predictor?
- What does a positive vs negative coefficient mean for “job rank” when junior is the reference category?
- How should significance be interpreted in these categorical comparisons?
- What is the practical meaning of “category present” and “category absent” in dummy coding?
Review Questions
- If bank type is coded with conventional=1 and Islamic=0, what would a negative coefficient for “type” imply about collaborative culture?
- For a three-category predictor, what determines how many dummy variables are needed and which category becomes the reference?
- In the job-rank example, how do you interpret a negative coefficient for middle when junior is the reference, and what does non-significance change about that interpretation?
Key Points
1. Dummy code categorical predictors in SmartPLS 4 so that each non-reference category is compared against a reference group.
2. For a two-category predictor, include only one dummy variable; the 0/1 coding automatically sets the reference category.
3. A positive coefficient means the 1-coded category has higher outcomes than the reference category; reversing coding flips the sign.
4. For a three-category predictor, include two dummy variables when one category (e.g., junior) is chosen as the reference.
5. Coefficient sign shows direction of perceived differences (higher/lower collaborative culture), while significance determines whether the difference is statistically supported.
6. In the example, Islamic banks (coded as 1) show higher collaborative culture than conventional banks, and that difference is significant.
7. For job rank, middle and senior both differ directionally from junior (negative for middle, positive for senior) but neither difference is significant.