#SmartPLS4 Series - 40 - How to use Categorical Predictor variables?

Research With Fawad · 4 min read

Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Dummy code categorical predictors in SmartPLS 4 so that each non-reference category is compared against a reference group.

Briefing

Categorical predictors in SmartPLS 4 work through dummy coding, and the choice of reference category determines the sign of the coefficient and how the result is read. In the example, “type of bank” has two categories: conventional and Islamic. Conventional is coded as 0 and Islamic as 1, so only one dummy variable is needed; the category coded 0 serves as the reference. With Islamic coded as 1, the path coefficient for “type” comes out positive, meaning Islamic banks show a higher collaborative culture than conventional banks. The difference is also reported as significant, so the effect is statistically supported and not merely directional.

That interpretation hinges on the reference category. When the coding is reversed—conventional as 1 and Islamic as 0—the coefficient sign flips to negative. In that setup, conventional banks are interpreted as having lower collaborative culture than Islamic banks. The underlying comparison remains the same (Islamic versus conventional), but the sign changes because the reference group changes.
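The sign flip can be reproduced outside SmartPLS with a plain least-squares slope. The sketch below is only an illustration under stated assumptions: the scores are made up, the `dummy` and `slope` helpers are hypothetical names, and an ordinary regression slope stands in for the PLS path coefficient.

```python
from statistics import mean

# Hypothetical collaborative-culture scores; not data from the video.
scores = [3.1, 3.4, 2.9, 3.0, 4.2, 4.0, 4.5, 4.1]
bank_type = ["conventional"] * 4 + ["islamic"] * 4


def dummy(values, coded_as_one):
    """Return 1 where the value equals `coded_as_one`, else 0."""
    return [1 if v == coded_as_one else 0 for v in values]


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


# Conventional is the reference (0), Islamic is coded 1.
b_islamic = slope(dummy(bank_type, "islamic"), scores)
# Reversed coding: Islamic is the reference (0), conventional is coded 1.
b_conventional = slope(dummy(bank_type, "conventional"), scores)

print(round(b_islamic, 3), round(b_conventional, 3))
```

With a single 0/1 predictor the slope equals the difference between the two group means, so the two codings give slopes of equal size and opposite sign.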

The session then moves to a categorical predictor with three levels: “job rank,” split into junior, middle, and senior. With three categories, a single dummy variable is not enough; two dummy variables are needed, each compared against a chosen reference category. Junior is selected as the reference, so the model includes separate indicators for middle and senior. The results show that the coefficient for “middle” is negative, indicating middle-level employees perceive collaborative culture to be lower than junior-level employees. However, the difference is not significant, so the negative sign cannot be taken as evidence of a real difference between the two groups.

For “senior,” the coefficient is positive, so senior-level employees perceive collaborative culture as higher than junior-level employees. Again, the difference is not significant. Throughout, the model assesses the impact of categorical predictors by comparing each non-reference category against the reference category, using coefficient signs to indicate direction and significance tests to determine whether the observed differences are reliable.
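The two job-rank dummies can be prepared before the data is imported. The following is a minimal sketch, assuming the raw data holds rank labels as text; `dummy_code` is a hypothetical helper, not a SmartPLS function, and the respondents are invented.

```python
def dummy_code(values, categories, reference):
    """Build one 0/1 column per non-reference category.

    Rows belonging to the reference category get 0 in every column.
    """
    if reference not in categories:
        raise ValueError("reference must be one of the categories")
    columns = {}
    for category in categories:
        if category == reference:
            continue  # the reference is left out on purpose
        columns[category] = [1 if v == category else 0 for v in values]
    return columns


# Hypothetical respondents; the labels follow the job-rank example.
job_rank = ["junior", "middle", "senior", "middle", "junior"]
dummies = dummy_code(job_rank, ["junior", "middle", "senior"], reference="junior")

for rank, mid, sen in zip(job_rank, dummies["middle"], dummies["senior"]):
    print(rank, mid, sen)
```

Junior respondents appear as 0 on both columns, which is what makes junior the baseline for the middle and senior coefficients.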

Finally, the “type of bank” effect is reiterated alongside the job-rank comparisons: Islamic banks (coded as 1) are associated with higher collaborative culture than conventional banks, and that difference is significant. Taken together, the examples show how to structure categorical predictors in SmartPLS 4—one dummy for two categories, two dummies for three categories—and how reference-category selection controls both interpretation and coefficient direction.

Cornell Notes

SmartPLS 4 handles categorical predictor variables by dummy coding and comparing categories to a reference group. For a two-category predictor like bank type (conventional vs Islamic), only one dummy variable is needed: code one category as 1 and the other as 0, with the 0 category acting as the reference. A positive coefficient means the 1-coded category scores higher on the outcome than the reference; reversing the coding flips the sign and the interpretation. For a three-category predictor like job rank (junior, middle, senior), two dummy variables are required; with junior as the reference, middle and senior are each compared to junior. Coefficient signs indicate direction (higher or lower collaborative culture), while significance determines whether differences are statistically reliable.

Why does SmartPLS 4 require only one dummy variable for “type of bank” but two for “job rank”?

“Type of bank” has two categories (conventional, Islamic). With two categories, one dummy variable is enough because the other category becomes the reference automatically. “Job rank” has three categories (junior, middle, senior). With three categories, you need two dummy variables so that each non-reference category (middle and senior) can be compared against the chosen reference (junior).
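One way to see why the reference category gets no dummy of its own: if all three indicators were created, the third would carry no new information. The sketch below uses invented respondents, and the redundancy argument is standard regression reasoning, not something stated in the video.

```python
ranks = ["junior", "middle", "senior", "senior", "middle", "junior"]
levels = ["junior", "middle", "senior"]

# One indicator column per level, including the would-be reference.
full = {lvl: [1 if r == lvl else 0 for r in ranks] for lvl in levels}

# Every respondent belongs to exactly one level, so the row sums are all 1.
row_sums = [sum(full[lvl][i] for lvl in levels) for i in range(len(ranks))]

# The junior column is therefore recoverable from the other two.
junior_rebuilt = [1 - m - s for m, s in zip(full["middle"], full["senior"])]

print(row_sums)
print(junior_rebuilt == full["junior"])
```

Because the junior column is fully determined by the middle and senior columns, leaving it out loses nothing and gives the remaining coefficients a clear baseline.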

How does the reference category affect the sign of the path coefficient for a two-category predictor?

When conventional is coded as 0 (reference) and Islamic as 1, the coefficient for “type” is positive, indicating Islamic banks have higher collaborative culture than conventional banks. If the coding is reversed—conventional as 1 and Islamic as 0—the coefficient becomes negative, meaning conventional banks have lower collaborative culture than Islamic banks. The sign flips because the reference group changes, even though the comparison is still between the two categories.

What does a positive vs negative coefficient mean for “job rank” when junior is the reference category?

With junior as the reference, the model compares middle and senior directly to junior. A negative coefficient for “middle” means middle-level employees perceive collaborative culture to be lower than junior-level employees. A positive coefficient for “senior” means senior-level employees perceive collaborative culture to be higher than junior-level employees. In both cases, the direction comes from the sign relative to the reference category.
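The sign reading can be written out as a small worked equation. This is a sketch under an assumption: it uses an ordinary regression with an intercept and unstandardized coefficients, where $D_{\text{middle}}$ and $D_{\text{senior}}$ are the two dummies and $b_0, b_1, b_2$ are illustrative symbols. SmartPLS reports standardized path coefficients, so magnitudes will differ, but the signs are read the same way.

```latex
\text{culture} = b_0 + b_1 D_{\text{middle}} + b_2 D_{\text{senior}} + e

\begin{aligned}
E[\text{culture} \mid \text{junior}] &= b_0 \\
E[\text{culture} \mid \text{middle}] &= b_0 + b_1 \\
E[\text{culture} \mid \text{senior}] &= b_0 + b_2
\end{aligned}
```

So $b_1 < 0$ places middle below junior and $b_2 > 0$ places senior above junior, each measured from the junior baseline $b_0$.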

How should significance be interpreted in these categorical comparisons?

Significance determines whether the observed difference between categories is statistically reliable. In the example, the Islamic vs conventional bank-type difference is significant. For job rank, the middle vs junior difference is not significant (despite a negative sign), and the senior vs junior difference is also not significant (despite a positive sign).
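SmartPLS assesses significance through bootstrapping. The sketch below is a simplified stand-in, not the SmartPLS procedure: it bootstraps a plain difference in group means on invented scores, and `bootstrap_ci` is a hypothetical helper name.

```python
import random
from statistics import mean


def bootstrap_ci(group0, group1, n_resamples=2000, seed=0):
    """Percentile 95% interval for mean(group1) - mean(group0).

    Each group is resampled separately with replacement.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        s0 = rng.choices(group0, k=len(group0))
        s1 = rng.choices(group1, k=len(group1))
        diffs.append(mean(s1) - mean(s0))
    diffs.sort()
    lower = diffs[int(0.025 * n_resamples)]
    upper = diffs[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical scores; group0 plays the role of the reference category.
conventional = [3.1, 3.4, 2.9, 3.0, 3.3]
islamic = [4.2, 4.0, 4.5, 4.1, 3.9]

lower, upper = bootstrap_ci(conventional, islamic)
# An interval that excludes 0 is read as a significant difference.
print(lower > 0 or upper < 0)
```

When the interval straddles 0, as would be expected for the job-rank dummies in the example, the sign of the coefficient alone should not be interpreted as a real difference.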

What is the practical meaning of “category present” and “category absent” in dummy coding?

Dummy coding uses 1 to mark the presence of a category and 0 to mark its absence. For “type of bank,” a respondent coded 1 belongs to an Islamic bank, and a respondent coded 0 belongs to a conventional bank (the reference). For “job rank,” a respondent coded 1 on the middle dummy is middle-level, while 0 means they are not; the reference category (junior) gets no dummy of its own, so junior respondents are coded 0 on both the middle and senior dummies.

Review Questions

  1. If bank type is coded with conventional=1 and Islamic=0, what would a negative coefficient for “type” imply about collaborative culture?
  2. For a three-category predictor, what determines how many dummy variables are needed and which category becomes the reference?
  3. In the job-rank example, how do you interpret a negative coefficient for middle when junior is the reference, and what does non-significance change about that interpretation?

Key Points

  1. Dummy code categorical predictors in SmartPLS 4 so that each non-reference category is compared against a reference group.
  2. For a two-category predictor, include only one dummy variable; the 0/1 coding automatically sets the reference category.
  3. A positive coefficient means the 1-coded category has higher outcomes than the reference category; reversing coding flips the sign.
  4. For a three-category predictor, include two dummy variables when one category (e.g., junior) is chosen as the reference.
  5. Coefficient sign shows direction of perceived differences (higher/lower collaborative culture), while significance determines whether the difference is statistically supported.
  6. In the example, Islamic banks (coded as 1) show higher collaborative culture than conventional banks, and that difference is significant.
  7. For job rank, middle and senior both differ directionally from junior (negative for middle, positive for senior) but neither difference is significant.

Highlights

With two categories, one dummy variable is enough in SmartPLS 4; the reference category is the one coded as 0.
Changing which category is coded as 1 flips coefficient signs without changing the underlying comparison.
With three categories, two dummy variables are required, each compared to the chosen reference category (junior).
Islamic banks are associated with higher collaborative culture than conventional banks, and the difference is significant.

Mentioned

  • SmartPLS