Chi-Square Test

TL;DR

Chi-square test of association (independence) is designed for two nominal categorical variables to test whether their category distributions are related.


Briefing

Chi-square test of association (also called chi-square test of independence or Pearson’s chi-square test) checks whether two categorical variables measured on a nominal scale are related. The key idea is simple: when categories like “introvert/extrovert” or “red/yellow/green/blue” have no inherent order, chi-square can test whether the distribution of one variable differs across the categories of the other. The null hypothesis is that the two variables are independent; the alternative hypothesis (H1) is that they are associated.

The transcript lays out when this test fits real research questions. Examples include whether gender is associated with preferred learning method (textbook reading vs class discussion), whether personality type (introvert/extrovert) is associated with color preference, whether car make is associated with gender, and whether a watch brand is associated with gender. In each case, both variables are categorical and nominal—respondents are simply classified into groups.

A worked example tests association between personality and color preference. Personality has two categories: introvert (coded as 1) and extrovert (coded as 2). Color preference has four categories: red, yellow, green, and blue. The analysis is performed using cross-tabs: personality is placed in rows and preference in columns, then chi-square is selected under statistics. The output reports 150 respondents with no missing values.
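The cross-tab step can be sketched in plain Python as a tally of coded records. This is only an illustration: the raw data file is not available, so the `records` list below is rebuilt from the cell counts reported in the example, and the helper name `crosstab` is hypothetical rather than anything used in the transcript.

```python
from collections import Counter

# Hypothetical records rebuilt from the reported cell counts (the raw data
# are not available). Personality: 1 = introvert, 2 = extrovert.
reported = {
    (1, "red"): 13, (1, "yellow"): 15, (1, "green"): 29, (1, "blue"): 13,
    (2, "red"): 9, (2, "yellow"): 29, (2, "green"): 29, (2, "blue"): 13,
}
records = [pair for pair, n in reported.items() for _ in range(n)]


def crosstab(rows):
    """Tally (row_category, column_category) pairs into a nested dict."""
    cells = Counter(rows)
    row_levels = sorted({r for r, _ in cells})
    col_levels = sorted({c for _, c in cells})
    return {r: {c: cells[(r, c)] for c in col_levels} for r in row_levels}


table = crosstab(records)
n_valid = sum(sum(row.values()) for row in table.values())
for level, row in table.items():
    print(level, row, "row total:", sum(row.values()))
print("valid cases:", n_valid)
```

Personality ends up in the rows and color preference in the columns, mirroring the layout described above.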

The cross-tab counts show how preferences distribute within each personality group. Among introverts, 13 preferred red, 15 yellow, 29 green, and 13 blue (70 introverts total). Among extroverts, 9 preferred red, 29 yellow, 29 green, and 13 blue (80 extroverts total). To determine whether these differences reflect a real association rather than random variation, the chi-square test uses the chi-square statistic and its p-value.
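For reference, the standard Pearson chi-square quantities can be written as follows. The notation is an assumption introduced here, not taken from the transcript: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, N is the total number of respondents, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
```

As a check against the example, the introvert/red cell has an expected count of 70 × 22 / 150 ≈ 10.27, where 22 is the number of respondents who preferred red (13 + 9).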

The results show a chi-square value of 4.53 with 3 degrees of freedom, which is (2 − 1) × (4 − 1) for a table with two rows and four columns. The p-value is 0.29, which is greater than the 0.05 significance threshold. That leads to the conclusion that there is no significant association between personality and color preference at the 5% level. The transcript also checks an important assumption: expected cell counts should not be too small. Here, 0% of cells have expected counts less than five, and the minimum expected count is 10.27, comfortably above the usual cutoff of 5, so the chi-square approximation is considered acceptable.
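The test itself can be recomputed from the reported counts with a short standard-library sketch. The function names `chi2_sf` and `chi_square_test` are hypothetical, and the p-value uses the closed-form upper-tail expressions that hold for integer degrees of freedom. Recomputing this way gives a statistic of about 4.535 and a p-value near 0.21 rather than the 0.29 quoted, which may be a transcription slip (0.209 read as 0.29); either value is above 0.05, so the conclusion is unchanged.

```python
import math

# Observed counts from the example cross-tab.
observed = [
    [13, 15, 29, 13],  # introverts: red, yellow, green, blue
    [9, 29, 29, 13],   # extroverts: red, yellow, green, blue
]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    extra = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * extra


def chi_square_test(table):
    """Return the Pearson chi-square statistic, degrees of freedom, and p-value."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = chi_square_test(observed)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")
```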

For reporting, the transcript provides a template-style sentence: chi-square statistics were used to examine the association between the categorical variables, and because the relationship is not significant, the sentence is worded to say so directly, with χ², df, and p quoted alongside. The final conclusion is that H1 (the hypothesis that the two variables are associated) is not supported: personality and color preference are not statistically associated in this dataset at the 5% significance level.

Cornell Notes

Chi-square test of association (chi-square test of independence) checks whether two nominal categorical variables are related. It’s appropriate when categories have no order—such as introvert/extrovert versus red/yellow/green/blue. In the example, cross-tabs are built with personality in rows and color preference in columns, then chi-square is computed. The output gives χ² = 4.53, df = 3, and p = 0.29, which is above 0.05, so the association is not significant. The analysis also verifies assumptions: 0% of cells have expected counts below 5, with a minimum expected count of 10.27, supporting the validity of the chi-square test.

When should a chi-square test of association be used instead of other tests?

Use it when both variables are categorical and measured on a nominal scale (no natural ordering). The goal is to test whether the category distribution of one variable differs across the categories of the other—often framed as testing independence. Examples given include gender vs preferred learning method, personality vs color preference, car make vs gender, and watch brand vs gender.

How does the example set up the chi-square test for personality and color preference?

Personality is coded into two categories (introvert vs extrovert) and placed in rows, while color preference has four categories (red, yellow, green, blue) and is placed in columns. The analysis is run via cross-tabs, selecting chi-square under statistics. The dataset includes 150 respondents and reports no missing values.

What do the cross-tab counts reveal, and why aren’t they enough on their own?

The cross-tab shows how preferences split within each personality group—for instance, among introverts: 13 red, 15 yellow, 29 green, 13 blue; among extroverts: 9 red, 29 yellow, 29 green, 13 blue. But counts alone can reflect random sampling variation, so the chi-square statistic and p-value are needed to judge whether the observed differences are statistically meaningful.

How are the decision criteria applied in the example?

The chi-square output reports χ² = 4.53 with df = 3 and p = 0.29. Since p = 0.29 is greater than the 0.05 significance level, the result is not significant at the 5% level. The conclusion is that there is no significant association between personality and color preference, so H1 is not supported.
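The decision rule can be sketched in two equivalent ways: comparing the p-value with the significance level, or comparing the statistic with a tabulated critical value. The critical value 7.815 (df = 3, alpha = 0.05) comes from standard chi-square tables rather than from the transcript, and the function name `decide` is hypothetical.

```python
ALPHA = 0.05
# Tabulated chi-square critical value for df = 3 at alpha = 0.05.
CRITICAL_VALUE = 7.815


def decide(stat, p_value, alpha=ALPHA, critical=CRITICAL_VALUE):
    """Apply the p-value rule and the critical-value rule; they should agree."""
    reject_by_p = p_value < alpha
    reject_by_critical = stat > critical
    if reject_by_p != reject_by_critical:
        raise ValueError("p-value and critical value disagree; check the inputs")
    return "reject H0" if reject_by_p else "fail to reject H0"


# Values as reported in the example output.
print(decide(stat=4.53, p_value=0.29))
```

Failing to reject H0 (independence) is the same outcome the notes describe as H1 not being supported.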

What assumption about expected counts is checked, and what were the results here?

A common chi-square condition is that expected cell counts should not be too small. A widely used rule of thumb is that no more than 20% of cells should have expected counts below 5 and none should fall below 1. In this example, 0% of cells have expected counts less than five, and the minimum expected count is 10.27, so the expected-count assumption is satisfied.
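The expected-count check can be reproduced from the reported counts with a small sketch. The thresholds used here (at most 20% of cells below 5, none below 1) are the common rule of thumb and are an assumption, since the transcript does not state an exact cutoff; the function names are hypothetical.

```python
# Observed counts from the example cross-tab.
observed = [
    [13, 15, 29, 13],  # introverts: red, yellow, green, blue
    [9, 29, 29, 13],   # extroverts: red, yellow, green, blue
]


def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_expected(table, cutoff=5.0, max_share=0.20):
    """Summarise the expected-count condition using a rule-of-thumb threshold."""
    cells = [e for row in expected_counts(table) for e in row]
    share_below = sum(e < cutoff for e in cells) / len(cells)
    smallest = min(cells)
    return {
        "share_below_cutoff": share_below,
        "min_expected": smallest,
        "ok": share_below <= max_share and smallest >= 1.0,
    }


result = check_expected(observed)
print(f"cells below 5: {result['share_below_cutoff']:.0%}, "
      f"minimum expected count: {result['min_expected']:.2f}, ok: {result['ok']}")
```

If many cells fell below the cutoff, common remedies include merging sparse categories or using an exact test instead of the chi-square approximation.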

What is a clear way to report the findings from the chi-square test?

A reporting sentence can state that chi-square statistics were used to examine the association between the categorical variables and that the relationship was not significant at the 5% significance level. The transcript’s specific conclusion is that H1 was not supported, using χ² = 4.53, df = 3, and p = 0.29.
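A reporting sentence of this kind can be generated from the test values. The wording and the "χ²(df, N = n)" layout below are assumptions modeled on common reporting conventions, not the transcript's exact template, and the function name `report` is hypothetical.

```python
def report(var_a, var_b, stat, df, p, n, alpha=0.05):
    """Build a one-sentence summary of a chi-square test of association."""
    verdict = "a significant" if p < alpha else "no significant"
    return (
        f"A chi-square test of association found {verdict} association "
        f"between {var_a} and {var_b} at the {alpha:.0%} level, "
        f"χ²({df}, N = {n}) = {stat:.2f}, p = {p:.2f}."
    )


# Values as reported in the example output.
sentence = report("personality", "color preference", stat=4.53, df=3, p=0.29, n=150)
print(sentence)
```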

Review Questions

What makes a variable “nominal” and why does that matter for choosing the chi-square test of association?
In the example, why does a p-value of 0.29 lead to concluding no significant association at the 5% level?
What expected-count check is performed for chi-square validity, and how would you interpret a case where many cells have expected counts below 5?

Key Points

1. Chi-square test of association (independence) is designed for two nominal categorical variables to test whether their category distributions are related.
2. It’s appropriate for questions like gender vs learning method, personality vs color preference, and brand vs gender when both variables are categorical.
3. Run the test using cross-tabs: place one categorical variable in rows and the other in columns, then select chi-square under statistics.
4. Use the chi-square statistic with degrees of freedom and the p-value to decide significance against a chosen threshold (commonly 0.05).
5. Check expected cell counts: ensure expected counts are not too small (the example reports 0% below 5 and a minimum expected count of 10.27).
6. Report results clearly with χ², df, and p, and state whether H1 is supported based on whether p is below the significance level.

Highlights

Chi-square test of association is the go-to method for nominal categorical variables when the question is whether two variables are related or independent.

In the worked example, χ² = 4.53 (df = 3) with p = 0.29, so personality and color preference show no significant association at the 5% level.

Expected counts matter: the example confirms validity with 0% of cells having expected counts below five and a minimum expected count of 10.27.

Topics

Chi-Square Test
Association
Independence
Cross-Tabulation
Expected Counts