Statistics for Research - L18 - #Correlation Analysis using #SPSS
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
Correlation analysis turns a scatter plot’s “looks related” impression into a numeric measure of how strongly two quantitative variables move together—and whether that relationship is statistically reliable. The key statistic is the correlation coefficient, r (often written as R in output). It captures both direction and strength of a linear relationship: a positive value means higher scores on one variable tend to align with higher scores on the other, while a negative value means increases in one tend to accompany decreases in the other. Because r is designed for linear patterns, it should be interpreted alongside a scatter plot to confirm that the relationship is actually linear.
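To make the definition concrete: r is the covariance of the two variables divided by the product of their standard deviations, which standardizes the relationship to the range [-1, 1]. A minimal sketch in Python (the data below are made up for illustration and are not from the session):

```python
import numpy as np

# Hypothetical scores for two quantitative variables (illustrative only).
x = np.array([2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.5, 3.2, 3.0, 4.8, 5.9])

# r = cov(x, y) / (sd_x * sd_y): a standardized covariance in [-1, 1].
r = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))
print(round(r, 3))  # positive: higher x tends to go with higher y
```

Because both the covariance and the standard deviations scale together, r stays within [-1, 1] regardless of the units of x and y.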
Interpreting r comes with commonly used—but not universal—guidelines. Values near 0 indicate very weak or weak linear association, while values closer to 1 (or -1) indicate stronger linear association. The session lists thresholds such as: |r| ≤ 0.1 as very weak, 0.1 to 0.3 as weak, 0.3 to 0.5 as moderate, 0.5 to 0.7 as strong, and above 0.7 as very strong. It also flags a practical warning: correlations above about 0.85 can signal multicollinearity, meaning two constructs may be measuring nearly the same underlying concept.
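The session's cutoffs can be written as a small helper. This is a sketch of the guidelines as stated (the function name and return format are my own, and other textbooks use different thresholds):

```python
def classify_r(r):
    """Classify a correlation using the session's rough guidelines.

    These cutoffs are conventions, not universal rules; context matters.
    """
    a = abs(r)
    if a <= 0.1:
        label = "very weak"
    elif a <= 0.3:
        label = "weak"
    elif a <= 0.5:
        label = "moderate"
    elif a <= 0.7:
        label = "strong"
    else:
        label = "very strong"
    direction = "positive" if r > 0 else ("negative" if r < 0 else "none")
    multicollinearity_flag = a > 0.85  # two constructs may overlap heavily
    return label, direction, multicollinearity_flag

print(classify_r(0.622))  # the session's example value
```

For the session's example of r = 0.622, this returns a "strong", positive correlation with no multicollinearity warning.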
Beyond strength, significance matters. In SPSS correlation output, the p-value is used to judge whether the observed association is likely to reflect a real relationship in the population rather than sampling noise. The workflow follows a typical social-science rule: if p < 0.05, the relationship is treated as statistically significant (rejecting the null hypothesis in favor of an alternate hypothesis). The session emphasizes that significance and strength are different questions—one tells whether the relationship is detectable, the other tells how strong it is.
An example is run in SPSS using two variables: “Vision” and “Organizational Performance.” After creating composite variables by averaging multiple items, the correlation output shows a positive correlation of about r = 0.622. The p-value is reported as less than 0.01, supporting statistical significance. The magnitude falls in the “strong” range based on the provided guidelines, and the scatter plot aligns with this interpretation: points cluster upward, indicating that as Vision increases, Organizational Performance tends to increase as well.
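The composite-then-correlate workflow can be sketched as follows. The item scores here are hypothetical stand-ins for the session's questionnaire data; only the procedure (average the items per respondent, then correlate the composites) comes from the session:

```python
import numpy as np

# Hypothetical item scores: rows = respondents, columns = questionnaire items.
vision_items = np.array([[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]])
perf_items   = np.array([[4, 4, 5], [3, 4, 3], [5, 4, 5], [2, 2, 3], [5, 4, 4]])

# Composite variables: mean of each respondent's items (as done in SPSS).
vision = vision_items.mean(axis=1)
performance = perf_items.mean(axis=1)

# Pearson correlation between the two composites.
r = np.corrcoef(vision, performance)[0, 1]
print(round(r, 3))
```

With real data this is where a value like the session's r = 0.622 would appear; the fabricated rows above just demonstrate the mechanics.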
The session also highlights several properties of correlation coefficients. r does not change when the measurement units of variables change, because it standardizes the relationship. It also measures only linear association and ignores non-linear patterns, so checking the scatter plot and adding a fitted line (e.g., “fit line total”) helps verify linearity. When reporting results, the session recommends including the correlation coefficient, sample size (N), and p-value, and stating the direction (positive or negative) and strength category.
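The unit-invariance property is easy to verify numerically: rescaling either variable by a linear unit change leaves r untouched. A sketch with invented height/weight data:

```python
import numpy as np

# Illustrative measurements (not from the session).
height_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weight_kg = np.array([55.0, 62.0, 70.0, 78.0, 88.0])

r_metric = np.corrcoef(height_cm, weight_kg)[0, 1]

# Convert units: cm -> inches, kg -> pounds (both are linear rescalings).
r_imperial = np.corrcoef(height_cm / 2.54, weight_kg * 2.20462)[0, 1]

print(np.isclose(r_metric, r_imperial))  # r is unchanged by unit changes
```

The same invariance is why r cannot distinguish a steep linear trend from a shallow one; it measures tightness of the linear pattern, not slope, which is another reason to inspect the scatter plot.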
For more than two variables, the approach shifts to a correlation matrix. In SPSS, adding a third variable produces a table of all pairwise correlations. The session advises using one-tailed tests when prior literature justifies an expected direction (positive or negative), and two-tailed tests when the direction is uncertain. For write-ups, the matrix should be formatted for clarity (e.g., removing redundant cells and nonessential entries), and only statistically significant or substantively important correlations should be described, while noting when no other significant relationships appear.
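A pairwise correlation matrix analogous to SPSS's output can be sketched with pandas. The three variables and their values below are hypothetical; the third column name ("Commitment") is my own placeholder:

```python
import pandas as pd

# Hypothetical composite scores for three variables (illustrative only).
df = pd.DataFrame({
    "Vision":      [4.3, 3.3, 5.0, 2.3, 4.3, 3.7],
    "Performance": [4.3, 3.3, 4.7, 2.3, 4.3, 3.3],
    "Commitment":  [4.0, 3.0, 4.5, 2.5, 4.0, 3.5],
})

# Pairwise Pearson correlations, like SPSS's correlation matrix.
matrix = df.corr(method="pearson")
print(matrix.round(3))
```

When a one-tailed test is pre-justified and the observed correlation falls in the predicted direction, its one-tailed p-value is half the two-tailed value, which is why the directional choice should be made before looking at the data.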
Cornell Notes
Correlation coefficient r quantifies the direction and strength of a linear relationship between two quantitative variables. Positive r indicates that as one variable increases, the other tends to increase; negative r indicates the opposite. Common interpretation cutoffs classify r as very weak/weak/moderate/strong/very strong, but the session stresses these are guidelines and that context matters. Statistical significance is assessed using the p-value (typically p < 0.05, with the example using p < 0.01). In SPSS, scatter plots and fitted lines help confirm linearity, and correlation matrices extend the method to multiple variables, with one-tailed or two-tailed testing chosen based on prior expectations.
- How does the sign of the correlation coefficient (r) change what the relationship means?
- Why isn’t the correlation coefficient alone enough to confirm the relationship is linear?
- What role does the p-value play in correlation results, and how is it interpreted?
- What does it mean when r is very high (e.g., above 0.85)?
- How should correlation results be reported in a two-variable case versus a multi-variable case?
- When should one-tailed versus two-tailed testing be used in SPSS correlation matrices?
Review Questions
- What information does r provide that a scatter plot alone cannot, and what information does it miss that the scatter plot can reveal?
- Using the provided guidelines, how would you classify a correlation of r = -0.45, and what would that imply about direction and strength?
- In SPSS correlation output, what three values should be included when writing up a correlation between two variables, and why?
Key Points
1. Correlation coefficient r measures the direction and strength of a linear relationship between two quantitative variables, with the sign indicating direction (positive vs negative).
2. Common strength guidelines classify |r| near 0 as very weak/weak, 0.3–0.5 as moderate, 0.5–0.7 as strong, and above 0.7 as very strong, but context can change interpretation.
3. Statistical significance is assessed with the p-value; p < 0.05 is treated as significant, and the example uses p < 0.01.
4. Correlation is unit-invariant and captures only linear association, so scatter plots (with fitted lines) are needed to verify linearity.
5. Correlations above about 0.85 may indicate multicollinearity, suggesting two constructs overlap heavily.
6. For two variables, report r, N, and the p-value; for multiple variables, use a correlation matrix and describe only significant or important pairwise relationships.
7. Choose one-tailed tests when the direction is justified by prior research; use two-tailed tests when direction is uncertain.