The Problem With IQ Tests

TL;DR

IQ tests are designed to estimate general intelligence (“g”) by combining multiple mental tasks and normalizing scores to a population mean of 100 and SD of 15.

Briefing

IQ tests are widely treated as a clean, objective measure of “intelligence,” but the underlying science is messier: IQ is strongly linked to real-world outcomes, yet it is also shaped by culture, motivation, test-taking strategy, and historical test design choices. That combination helps explain why IQ scores can predict school performance, job success, and even longevity—while also fueling misuse, controversy, and claims that don’t hold up.

The modern IQ concept traces to early 1900s work on correlations between school subjects. In 1904, psychologist Charles Spearman found that students who did well in one subject tended to do well in others, with a correlation of 0.64 between math and English. He proposed a general intelligence factor, “g,” plus smaller subject-specific influences. Around the same time, Alfred Binet and Theodore Simon built the Binet-Simon test to identify children who needed extra help. Their approach used “mental age” relative to actual age, producing the original idea of an intelligence quotient. In the U.S., Lewis Terman standardized and modified the test into the Stanford-Binet, and later IQ batteries expanded to multiple abilities—memory, verbal, spatial, and numerical—then normalized scores so the population mean sits at 100 with a standard deviation of 15.
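The mental-age idea can be made concrete with a small worked example. This is a minimal sketch that assumes the conventional ratio formulation (mental age divided by actual age, scaled by 100); the scaling is not spelled out in the summary above, and the function name `ratio_iq` is hypothetical.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio-style IQ: mental age relative to actual age, scaled by 100.

    Assumption: the x100 scaling is the conventional formulation of the
    quotient, not something stated in the text above.
    """
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


# A 10-year-old performing at the level of a typical 12-year-old.
print(ratio_iq(12, 10))  # 120.0
# A child performing exactly at age level.
print(ratio_iq(8, 8))    # 100.0
```

Modern batteries do not use this ratio; as the paragraph above notes, they normalize scores against a population with mean 100 and standard deviation 15.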

Those scores do relate to life outcomes. Higher IQ correlates with larger brain size (a 2005 meta-analysis reported 0.33) and with school achievement. A major study of 13,000 Scottish children measured IQ at age 11 and found correlations around 0.8 with later GCSE marks, which implies that roughly 64% of the variation in exam performance (0.8 squared) can be predicted from earlier IQ. IQ also tracks educational attainment, and IQ scores correlate at around 0.8 with standardized tests like the SAT, ACT, and GRE. Outside school, IQ shows moderate links to occupational success (correlations often between 0.2 and 0.6), with the strongest effects in complex roles; the U.S. military historically used IQ thresholds and found that lowering them increased failure rates and remedial training needs.

Yet IQ is not a pure readout of fixed ability. The transcript highlights the “Flynn Effect,” where average IQ scores rise over decades even though genetics don’t change quickly. James Flynn’s work suggests that, without periodic re-normalization, average scores would show a roughly 30-point increase over the last century, likely driven by improved nutrition and health, better education, and shifts toward more abstract work. Even within IQ testing, performance can move with incentives: paying test-takers can raise scores, sometimes by up to 20 points, and coaching can boost results by several points. Time pressure, anxiety, and motivation all matter.

The controversy is also historical and political. The transcript links IQ’s U.S. adoption to eugenics: Henry Goddard’s interpretation of intelligence as inherited and unchangeable helped justify forced sterilization laws, which the Supreme Court upheld in 1927. Tens of thousands of people were sterilized, and the transcript claims the American program later influenced Nazi Germany. Modern researchers emphasize that IQ is partly heritable but also environment-dependent, with twin studies often estimating a split of around 50/50.

In the end, IQ is best treated as a useful predictor—not a verdict on worth. It can help identify strengths and support decisions (including in education and clinical settings), but it also measures more than “g,” and it can be distorted by culture and incentives. The transcript’s central message is that IQ scores can be informative while still being incomplete—and that how society uses them matters as much as what they measure.

Cornell Notes

IQ tests correlate with meaningful outcomes—school achievement, job performance, and even longevity—but they do not function as a perfectly objective, fixed measure of intelligence. The concept of IQ grew from Spearman’s “g” factor and Binet’s mental-age quotient approach, later standardized into modern scoring (mean 100, SD 15). Research highlights the Flynn Effect (average IQ rising over decades) and shows that motivation, coaching, and test-taking strategy can shift scores, meaning IQ reflects more than raw ability. Given IQ’s predictive power alongside its susceptibility to cultural and situational influences—and its misuse in eugenics—scores should be treated as a probabilistic tool, not a measure of human worth.

How did IQ testing emerge from early research on correlations between school subjects?

Spearman analyzed students’ grades across subjects and found positive correlations: students who did well in math tended to do well in English, with a correlation coefficient of 0.64. He proposed a general intelligence factor (“g”) that helps explain why performance across different subjects moves together, plus subject-specific factors (“s-factors”) that can raise or lower performance in particular areas. This framework supported the idea that IQ batteries could estimate g by averaging across multiple mental abilities while subject-specific noise partially cancels out.
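The claim that averaging across subtests lets subject-specific noise partially cancel can be illustrated with a toy simulation. This is only a sketch under stated assumptions: every subtest score is modeled as a shared g plus independent s-factor noise of equal variance, and the names `simulate` and `pearson` are hypothetical helpers, not part of any real test battery.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_people=2000, n_subtests=8, seed=0):
    """Toy model: each subtest score = shared g + independent s-factor noise."""
    rng = random.Random(seed)
    g = [rng.gauss(0, 1) for _ in range(n_people)]
    # Assumption: g and every s-factor have equal variance (illustrative only).
    subtests = [[gi + rng.gauss(0, 1) for gi in g] for _ in range(n_subtests)]
    # Composite score per person: the mean of that person's subtest scores.
    composite = [sum(scores) / n_subtests for scores in zip(*subtests)]
    return g, subtests, composite


g, subtests, composite = simulate()
single_r = pearson(g, subtests[0])
composite_r = pearson(g, composite)
print(f"one subtest vs g:    r = {single_r:.2f}")
print(f"8-test average vs g: r = {composite_r:.2f}")
```

Under these toy assumptions, theory puts a single subtest's correlation with g near 0.71 and the eight-test average near 0.94, so the composite tracks g noticeably better than any one subtest. Real batteries weight and scale their sections differently.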

What does an IQ score actually represent in modern testing?

Modern IQ tests normalize raw scores against a large population, typically setting the mean at 100 and the standard deviation at 15. The transcript notes that about 68% of people fall between 85 and 115, while roughly 2% score above 130 and roughly 2% score below 70. The intent is to compare an individual’s estimated g relative to others, using multiple sections (often 7–10) to reduce subject-specific distortions.
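The percentages quoted for a mean of 100 and a standard deviation of 15 can be checked with Python's standard library. This sketch assumes scores follow an exact normal distribution; the `deviation_iq` helper and its sample raw-score values are hypothetical, included only to show what normalizing a raw score might look like.

```python
from statistics import NormalDist

# Assumption: IQ scores are exactly normal with mean 100 and SD 15.
iq = NormalDist(mu=100, sigma=15)

within_one_sd = iq.cdf(115) - iq.cdf(85)
above_130 = 1 - iq.cdf(130)
below_70 = iq.cdf(70)

print(f"between 85 and 115: {within_one_sd:.1%}")  # 68.3%
print(f"above 130:          {above_130:.1%}")      # 2.3%
print(f"below 70:           {below_70:.1%}")       # 2.3%


def deviation_iq(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Rescale a raw score to the IQ scale using the norming sample's mean and SD."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z


# Hypothetical raw score one SD above the norming sample's mean.
print(deviation_iq(raw=60, norm_mean=50, norm_sd=10))  # 115.0
```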

Why do IQ scores predict outcomes like school performance and job success?

The transcript cites strong predictive relationships. In Scotland, IQ measured at age 11 correlated at about 0.8 with later GCSE exam marks, implying that roughly 64% of the variation in exam performance (0.8 squared) can be predicted from earlier IQ. IQ also correlates with occupational success, often in the 0.2 to 0.6 range, with stronger effects for complex jobs. It even links to longevity in long-term follow-ups, with a reported 27% higher likelihood of being alive at age 76 per 15-point increase in IQ.
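To see what these correlation sizes mean in terms of predictable variation, the coefficient can be squared. This is a minimal sketch using the standard statistical reading of r squared as the share of variance accounted for by a linear relationship; the function name `variance_explained` is hypothetical, and the values are the ones quoted above.

```python
def variance_explained(r: float) -> float:
    """Share of variance accounted for by a linear relationship with correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


quoted = [
    ("IQ at 11 vs later exam marks", 0.8),
    ("occupational success, low end", 0.2),
    ("occupational success, high end", 0.6),
]
for label, r in quoted:
    print(f"{label}: r = {r:.1f}, r^2 = {variance_explained(r):.0%}")
```

A correlation of 0.8 corresponds to about 64% of the variance, while 0.2 corresponds to only about 4%, which is why the schooling link counts as strong and the occupational link as moderate.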

What evidence suggests IQ is not fixed and not purely genetic?

Two major points appear. First, the Flynn Effect: average IQ scores rise over time, and Flynn’s analysis suggests that without re-normalization, the population would show an increase of about 30 points over roughly a century. Likely drivers include improved nutrition and health, better education, and shifts toward more abstract work. Second, incentives and coaching can change scores: paying participants to complete IQ tests can raise IQ (up to around 20 points in some findings), and training/coaching can boost scores by up to about eight points. These effects indicate that motivation and practice influence results.
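The re-normalization point can be sketched as simple arithmetic. The example assumes the rise is linear, at the rate implied by "about 30 points over roughly a century"; that linearity is an illustrative simplification rather than a claim from the transcript, and the function name `score_on_older_norms` is hypothetical.

```python
POINTS_PER_CENTURY = 30  # from "about 30 points over roughly a century"


def score_on_older_norms(score_today: float, years_back: float) -> float:
    """Estimate how a score on today's norms would read against older norms.

    Assumption: average scores drift upward linearly over time.
    """
    return score_today + POINTS_PER_CENTURY * years_back / 100


# An average scorer today, measured against norms from a century ago.
print(score_on_older_norms(100, 100))  # 130.0
# The same scorer against norms from 50 years ago.
print(score_on_older_norms(100, 50))   # 115.0
```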

How did IQ testing become entangled with eugenics and why does that still shape public attitudes?

The transcript ties IQ’s U.S. adoption to Henry Goddard’s interpretation of intelligence as inherited and unchangeable. That framing fed the American eugenics movement, under which states passed laws permitting the forced sterilization of people who scored below IQ thresholds. The Supreme Court upheld the constitutionality of these laws in 1927, and the transcript notes that over 60,000 people were forcibly sterilized. It also claims American eugenics served as a model for Nazi Germany, with Hitler citing inspiration from American eugenicists, a history that fuels ongoing skepticism toward IQ testing.

What does “culture fair” mean in IQ testing, and why is it difficult to achieve?

Some tests market themselves as culture fair, but the transcript argues a truly culture-free test is impossible. Even when items focus on visual relations, geometric shapes, and patterns, cultural differences affect how people categorize and reason about those stimuli. Additionally, culture fair tests often omit forms of intelligence tied to local knowledge or survival skills (e.g., ethnobotanical knowledge or hunting/training dogs), which may matter more for survival than the puzzle-like tasks IQ tests emphasize.

Review Questions

What are Spearman’s “g” and “s-factors,” and how do they justify the structure of IQ batteries?
How does the Flynn Effect challenge the idea that IQ is fixed, and what explanations are offered for the rise in average scores?
List at least three non-ability factors mentioned that can change IQ test performance (e.g., motivation, coaching, time pressure) and describe how each affects scores.

Key Points

1. IQ tests are designed to estimate general intelligence (“g”) by combining multiple mental tasks and normalizing scores to a population mean of 100 and SD of 15.
2. IQ correlates with real-world outcomes such as school achievement, job performance, and longevity, with some studies reporting very strong predictive relationships for education.
3. Average IQ scores have risen over time in ways captured by the Flynn Effect, suggesting environment and culture can shift test results even if genetics change slowly.
4. Motivation, incentives, coaching, and test-taking strategy can measurably raise IQ scores, meaning performance is not purely a fixed trait.
5. Historical misuse, especially eugenics-era interpretations and forced sterilization laws, has contributed to lasting public distrust of IQ testing.
6. IQ testing is not fully culture-free; cultural differences influence how people interpret categories and what kinds of knowledge matter for success.
7. The most defensible use of IQ is as a probabilistic tool for identifying strengths and risks, not as a measure of personal worth or destiny.

Highlights

Spearman’s early work found that students’ performance across subjects moves together, motivating the idea of a general intelligence factor (“g”).

Longitudinal data from age 11 to later exams reported correlations around 0.8 between IQ and GCSE performance, indicating strong predictive power for schooling.

The Flynn Effect suggests average IQ scores rise over decades, undermining the idea that IQ is purely fixed or purely genetic.

Incentives and coaching can shift scores substantially: paying test-takers can raise scores, and training can boost performance by several points.

The transcript links IQ’s U.S. history to eugenics and forced sterilization laws upheld by the Supreme Court in 1927, shaping modern skepticism.

Topics

IQ Origins
Spearman’s g
Binet-Simon Test
Flynn Effect
Motivation and Coaching

Mentioned

Charles Spearman
Alfred Binet
Theodore Simon
Lewis Terman
James Flynn
Henry Goddard
Ian Deary
Oliver Wendell Holmes
Stephen Hawking
Hitler
IQ
g
s-factors
GCSE
SAT
ACT
GRE
SD