06. SPSS Classroom | Chi Square test of Independence - Analyze, Interpret, and Report Chi Square

TL;DR

Chi-square test of independence is designed for nominal or ordinal categorical variables using contingency tables and frequency-based comparisons.


Briefing

The chi-square test of independence is the standard method for checking whether two categorical variables are associated or whether an apparent relationship is consistent with chance. It is especially useful when the data are nominal or ordinal, where means and similar numeric summaries are not meaningful. Instead, researchers rely on contingency tables and compare the observed count in each cell against the expected count calculated under the assumption of independence.

The core logic is straightforward: observed cell frequencies are the actual numbers collected for each category combination, while expected cell frequencies represent what the counts would look like if there were no association between the variables. The test statistic (chi-square) sums, over all cells, the squared difference between observed and expected frequency divided by the expected frequency. A small chi-square value is consistent with the null hypothesis of independence; a larger chi-square value is evidence that the variables are associated. Degrees of freedom are determined from the contingency table dimensions using (rows − 1) × (columns − 1), and the chi-square value together with the degrees of freedom yields a p-value that is compared with the chosen significance level to decide whether to reject the null.
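The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not what SPSS runs internally; the function name `chi_square_independence` and the 2 × 2 counts are hypothetical and do not come from the transcript.

```python
def chi_square_independence(observed):
    """Return (statistic, df, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    # Degrees of freedom: (rows - 1) * (columns - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# Hypothetical counts, for illustration only
table = [[10, 20], [20, 10]]
statistic, df, expected = chi_square_independence(table)
print(statistic, df, expected)
```

The function assumes every row and column total is positive; a table with an empty row or column would cause a division by zero.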

The transcript walks through practical examples of where this approach fits: whether business performance categories (loss, break-even, profit) depend on a country’s income group; whether employee satisfaction levels (e.g., 1 to 3) depend on job placement (local vs international); and whether personality type (introvert vs extrovert) is associated with color preference (red, yellow, green, blue). In each case, the variables are categorical, and the analysis hinges on contingency tables rather than means.

A worked example uses a table of introverts and extroverts against four color preferences. For instance, 13 introverts chose red, 15 chose yellow, 29 chose green, and 13 chose blue, with totals of 70 introverts, 80 extroverts, and 150 respondents overall. Expected counts are computed as (row total × column total) / grand total. For the red–introvert cell, the row total is 70 (introverts) and the column total is 22 (respondents who chose red), so the expected count is 70 × 22 / 150 ≈ 10.3, against an observed count of 13. This expected-vs-observed comparison is repeated across all cells to produce the chi-square statistic.
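As a quick check of the arithmetic, the red–introvert cell can be reproduced with the figures quoted in the example. The variable names are illustrative; the last line computes that single cell's contribution to the statistic, which the transcript summary does not report separately.

```python
# Figures quoted in the worked example
introvert_total = 70         # row total
red_total = 22               # column total for red
grand_total = 150
observed_red_introvert = 13

# Expected count under independence
expected_red_introvert = introvert_total * red_total / grand_total
print(round(expected_red_introvert, 1))  # 10.3

# This cell's contribution to the chi-square statistic: (O - E)^2 / E
contribution = (observed_red_introvert - expected_red_introvert) ** 2 / expected_red_introvert
print(contribution)
```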

The workflow in SPSS is then laid out: use Analyze → Descriptive Statistics → Crosstabs, place the categorical variables into rows and columns (with optional layering for multi-group comparisons), and select Statistics → Chi-square. The output includes the chi-square statistic, degrees of freedom, and a p-value. In the example, the p-value is greater than 0.05, leading to the conclusion that there is no significant association between personality and color preference at the 5% level.
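What Crosstabs does with the two variables can be pictured as counting how often each row/column combination occurs. The sketch below does this in plain Python, assuming one (personality, color) pair per respondent; the `crosstab` helper and the six records are hypothetical, not the transcript's data.

```python
from collections import Counter

# Hypothetical respondent records, for illustration only
records = [
    ("introvert", "red"), ("introvert", "green"), ("extrovert", "red"),
    ("extrovert", "blue"), ("introvert", "green"), ("extrovert", "yellow"),
]


def crosstab(pairs):
    """Count each (row category, column category) pair into a table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Counter returns 0 for combinations that never occur
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


rows, cols, table = crosstab(records)
print(cols)
for label, row in zip(rows, table):
    print(label, row)
```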

Finally, the transcript emphasizes an important assumption check: chi-square isn’t suitable when any cell has fewer than five cases. (The rule is conventionally applied to expected counts rather than observed ones.) If that happens, an alternative like Fisher’s exact test is recommended. In the example, Fisher’s exact test also yields a p-value above 0.05, reinforcing the same conclusion. Reporting guidance follows: state the variables, the null and alternative (H1) hypotheses, and the chi-square (or Fisher’s exact) results including degrees of freedom and p-value, concluding whether H1 is supported.
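A small helper can make the cell-count check explicit before choosing between chi-square and Fisher’s exact test. This is a sketch under stated assumptions: the transcript phrases the rule in terms of cases per cell, while the rule is usually applied to expected counts, so the hypothetical `low_count_cells` function flags a cell if either number falls below the threshold. The example table is made up.

```python
def low_count_cells(observed, threshold=5):
    """List (row, col, observed, expected) for cells below the threshold."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand_total
            # Flag the cell if its observed or expected count is too small
            if observed[i][j] < threshold or expected < threshold:
                flagged.append((i, j, observed[i][j], expected))
    return flagged


# Hypothetical counts, for illustration only
table = [[3, 12], [9, 16]]
flagged = low_count_cells(table)
if flagged:
    print("Low-count cells found; consider Fisher's exact test:", flagged)
else:
    print("All cell counts are at least 5.")
```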

Cornell Notes

The chi-square test of independence checks whether two categorical variables are independent or associated. It compares observed cell frequencies from the data with expected cell frequencies computed as (row total × column total) / grand total, using a chi-square statistic and degrees of freedom (rows − 1) × (columns − 1). A p-value above the chosen significance level (commonly 0.05) means there is insufficient evidence of an association. The method works for nominal or ordinal categorical variables, but it requires adequate cell counts: if any cell count is below 5 (conventionally judged on expected counts), Fisher’s exact test is preferred. In the SPSS example, both chi-square and Fisher’s exact tests produce p-values above 0.05, so independence is not rejected.

Why can’t researchers rely on means for nominal or ordinal categorical data?

Nominal and ordinal categories don’t have a meaningful numeric scale where averaging makes sense. For example, religion is nominal (no inherent order), and satisfaction categories like 1–3 are ordinal (order matters, but differences aren’t necessarily equal). The transcript stresses that the meaningful summaries are frequencies and percentages, organized in contingency tables.

What exactly distinguishes observed cell frequencies from expected cell frequencies in a chi-square test?

Observed cell frequencies are the actual counts collected for each category combination in the contingency table. Expected cell frequencies are the counts that would occur if the two variables were independent. The expected count for a cell is computed as (row total × column total) / grand total. In the example, the red–introvert expected count is 70 × 22 / 150 ≈ 10.3, compared with the observed 13.

How does the chi-square statistic connect to the null hypothesis of independence?

The chi-square statistic quantifies the overall discrepancy between observed and expected counts. When the observed frequencies closely match expected frequencies, the chi-square value is small and supports the null hypothesis that the variables are independent. When observed frequencies deviate substantially from expected frequencies, the chi-square value grows, indicating evidence of association.

How are degrees of freedom determined for a contingency table?

Degrees of freedom depend on the table’s size: (number of rows − 1) × (number of columns − 1). The transcript notes that SPSS provides the chi-square statistic and degrees of freedom, together with the p-value that is compared against the significance level.

What cell-count rule determines whether chi-square is appropriate or whether Fisher’s exact test is needed?

Chi-square is not suitable if any cell has fewer than five cases. When that condition occurs, Fisher’s exact test is recommended instead. The transcript describes running SPSS Crosstabs with the Exact option to obtain Fisher’s exact test results.

What does it mean to report “no significant association” in this context?

It means the p-value exceeds the chosen significance threshold (here, 0.05). In the example, the chi-square output shows a p-value greater than 0.05, and Fisher’s exact test also yields a p-value above 0.05. The conclusion is that H1 (significant association) is not supported, so independence is not rejected.
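SPSS reports the p-value directly, so the following is only meant to show what the comparison amounts to. The hypothetical helper `chi2_sf` computes the upper-tail probability of a chi-square distribution for integer degrees of freedom using a standard closed-form recurrence; it is not SPSS's routine, and the statistic of 4.2 is an illustrative value, not the transcript's result.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-half)              # Q(1, h) = exp(-h)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(half))   # Q(1/2, h) = erfc(sqrt(h))
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


alpha = 0.05
statistic, df = 4.2, 3                   # hypothetical values for a 2 x 4 table
p_value = chi2_sf(statistic, df)
if p_value > alpha:
    print(f"p = {p_value:.3f} > {alpha}: independence is not rejected")
else:
    print(f"p = {p_value:.3f} <= {alpha}: evidence of an association")
```

With 2 rows and 4 columns, as in the personality-by-color table, the degrees of freedom are (2 − 1) × (4 − 1) = 3.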

Review Questions

In a contingency table with r rows and c columns, what is the formula for degrees of freedom used in the chi-square test of independence?
How do you compute an expected cell frequency from row and column totals, and how does that expected value relate to the observed count?
What decision rule changes when any cell count falls below 5, and which SPSS test should be used instead?

Key Points

1. Chi-square test of independence is designed for nominal or ordinal categorical variables using contingency tables and frequency-based comparisons.
2. Observed cell frequencies come directly from the collected data; expected cell frequencies are computed as (row total × column total) / grand total.
3. A small chi-square statistic supports the null hypothesis of independence; a large chi-square statistic suggests association between the variables.
4. Degrees of freedom are calculated as (rows − 1) × (columns − 1), and SPSS provides chi-square, degrees of freedom, and p-values for significance testing.
5. Chi-square assumptions require adequate cell counts; if any cell has fewer than five cases, Fisher’s exact test should be used.
6. SPSS reporting should include the chi-square statistic, degrees of freedom, and p-value (or Fisher’s exact p-value), followed by a clear conclusion about whether H1 is supported.

Highlights

Chi-square test of independence measures how much observed counts differ from expected counts under independence, using expected frequencies computed from row and column totals.

For the red–introvert cell in the example, the expected count is 70 × 22 / 150 ≈ 10.3, compared with an observed count of 13. This gap between observed and expected counts is what the test is built on.

When any contingency-table cell has fewer than five cases, Fisher’s exact test replaces chi-square to avoid unreliable results.

In the SPSS example, both chi-square and Fisher’s exact tests return p-values above 0.05, so personality and color preference are treated as not significantly associated at the 5% level.

Topics

Chi-Square Independence
Contingency Tables
Expected Frequencies
SPSS Crosstabs
Fisher Exact Test

Mentioned

SPSS