LESSON 47 - DESCRIPTIVE STATISTICS: THE THREE METHODS OF ANALYSING DATA DESCRIPTIVELY

TL;DR

Descriptive statistics summarize sample data using tables, graphs, and single-number measures, so they should not be treated as population-wide conclusions.

Briefing

Descriptive statistics turn messy field data into manageable summaries—using a small set of numbers, tables, and graphs—so research findings can be presented and interpreted for an audience. Because descriptive statistics summarize a sample rather than the whole population, they limit generalization: results should be treated as describing the group actually observed, not claiming population-wide conclusions.

In social science research projects, data analysis often appears in Chapter 4 of the report, and it can be done manually or with software such as SPSS. The lesson frames data analysis as the process of reducing collected data to meaningful summaries. It also distinguishes descriptive statistics from inferential statistics: descriptive work summarizes what the sample looks like, while inferential methods are reserved for drawing broader conclusions about the population.

Three methods organize descriptive statistics. First are tabular summaries, which rely on tables to present frequencies and relationships. Two common table types are frequency distribution tables and cross tabulation tables (crosstabs). Frequency tables show how often categories occur, while crosstabs compare categories across variables. The lesson also notes formatting conventions: tables are numbered by chapter (e.g., Table 4.1 for Chapter 4), and the table title appears at the top.
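As a rough illustration of a frequency distribution table, the sketch below counts categories and converts the counts to percentages. The response data, the `frequency_table` helper, and the table title are hypothetical, chosen only to show the idea; the lesson itself works with manual methods or SPSS.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from the lesson).
responses = ["agree", "disagree", "agree", "neutral", "agree", "disagree"]

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]

table = frequency_table(responses)

# Title at the top, numbered by chapter, as the lesson's convention suggests.
print("Table 4.1: Frequency distribution of responses")
for category, count, percent in table:
    print(f"{category:<10}{count:>6}{percent:>8.1f}%")
```

Each row pairs a category with its count and its share of the total, which is the usual layout of a frequency table.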

Second are graphical representations, using graphs selected according to the type of data and its scale of measurement. For categorical data, the lesson highlights bar graphs and pie charts; for continuous data, it points to histograms, frequency polygons, and scatter diagrams (the last of which plot two continuous variables against each other). A key practical link is that graph choice is not arbitrary: the data type determines the appropriate figure. The lesson further notes that a histogram, drawn from continuous data, can be presented together with distribution information such as the mean and standard deviation.
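To make the idea of a histogram concrete, the following minimal sketch groups continuous values into equal-width class intervals and prints a text version of the bars. The `ages` data, the bin settings, and the `histogram_counts` helper are assumptions made for illustration, not part of the lesson.

```python
# Hypothetical continuous measurements (ages in years); illustrative only.
ages = [21, 23, 24, 27, 29, 31, 32, 35, 38, 44]

def histogram_counts(values, start, width, n_bins):
    """Count values in n_bins equal-width bins [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_bins:  # values outside the binned range are ignored
            counts[i] += 1
    return counts

counts = histogram_counts(ages, start=20, width=5, n_bins=5)

# Text rendering: one bar per class interval (lower bound included, upper excluded).
for i, n in enumerate(counts):
    lower = 20 + i * 5
    print(f"{lower}-{lower + 5}: " + "#" * n)
```

A plotting library or SPSS would draw the same counts as adjacent bars; the binning step is what distinguishes a histogram from a bar graph of categories.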

Third are numerical representations, which reduce many observations to single summary values. These include measures of central tendency and measures of variability. Central tendency uses the mean (for continuous data), the median (for continuous data), and the mode (for categorical data). The mean is the arithmetic average (the sum of the values divided by how many there are) and serves as the typical value, the median is the middle value after ordering the data, and the mode is the most frequent category or value. Because central tendency measures, the mean in particular, can be distorted by extreme values, the lesson stresses pairing them with variability.
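The three measures of central tendency can be sketched with Python's standard `statistics` module. The scores and transport categories below are made-up values used only to show which measure fits which kind of variable.

```python
import statistics

# Hypothetical data; illustrative only.
test_scores = [55, 60, 60, 70, 85]                # continuous variable
transport = ["bus", "car", "bus", "walk", "bus"]  # categorical variable

mean_score = statistics.mean(test_scores)      # sum of values / number of values
median_score = statistics.median(test_scores)  # middle value after sorting
mode_transport = statistics.mode(transport)    # most frequent category

print(mean_score, median_score, mode_transport)
```

The mode is the only one of the three that makes sense for the categorical `transport` variable, since categories cannot be averaged or ordered.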

Variability describes how spread out the data are around the mean. The lesson names range-related concepts but focuses on variance and standard deviation: variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations divided by n - 1), while standard deviation is the positive square root of variance. A high standard deviation signals inconsistency and wide spread, while a low standard deviation indicates clustering near the mean. The lesson also references skewness and kurtosis, which describe the shape of the distribution.
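A short worked computation may help here. The sketch below calculates the sample variance and standard deviation step by step and checks them against the standard library. The `scores` list is hypothetical, and the n - 1 divisor is the usual sample convention, assumed here rather than taken from the lesson.

```python
import math
import statistics

# Hypothetical scores; illustrative only.
scores = [4, 8, 6, 5, 7]

mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]

# Sample variance: sum of squared deviations divided by n - 1.
sample_variance = sum(squared_deviations) / (len(scores) - 1)
# Standard deviation: positive square root of the variance.
sample_sd = math.sqrt(sample_variance)

# The standard library functions use the same n - 1 convention.
assert math.isclose(sample_variance, statistics.variance(scores))
assert math.isclose(sample_sd, statistics.stdev(scores))

print(sample_variance, round(sample_sd, 3))
```

Dividing by n instead of n - 1 gives the population variance, available as `statistics.pvariance`.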

The takeaway is straightforward: descriptive statistics describe the sample and use three complementary approaches—tabular, graphical, and numerical—to summarize data accurately. The next step, deferred to a later lesson, is inferential statistics, which moves from description toward generalization.

Cornell Notes

Descriptive statistics summarize sample data so it can be presented and interpreted clearly, but they do not justify population-wide generalizations. The lesson organizes descriptive work into three methods: tabular summaries, graphical representations, and numerical summaries. Tables typically include frequency distributions and crosstabs for categorical data. Graphs are chosen based on data type and scale—bar charts and pie charts for categorical variables, histograms and scatter diagrams for continuous variables. Numerical summaries use measures of central tendency (mean, median, mode) alongside measures of variability (variance and standard deviation) to show both “typical values” and how widely observations are spread.

Why do descriptive statistics limit generalization beyond the sample?

Descriptive statistics summarize the particular group of individuals observed. Because the summaries (tables, graphs, and single-number statistics) are computed from the sample, they describe that sample’s characteristics rather than proving what the entire population looks like. That’s why findings based only on descriptive statistics should be treated as applying to the observed group, not automatically to a wider population.

What are the two main types of tabular summaries used in social science descriptive analysis?

The lesson highlights frequency distribution tables and cross tabulation tables (crosstabs). Frequency tables show how often categories occur, often using counts and percentages. Crosstabs compare two categorical variables by showing the frequency distribution across combinations of categories.
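A crosstab can be sketched as a count of every combination of two categorical variables. The (gender, response) records and the `crosstab` helper below are hypothetical and serve only to illustrate the layout.

```python
from collections import Counter

# Hypothetical (gender, response) pairs; illustrative data only.
records = [
    ("female", "agree"), ("male", "agree"), ("female", "disagree"),
    ("male", "agree"), ("female", "agree"), ("male", "disagree"),
]

def crosstab(pairs):
    """Count each (row category, column category) combination."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return rows, cols, cells

rows, cols, cells = crosstab(records)

# One row per gender, one column per response.
print(f"{'':<10}" + "".join(f"{c:>10}" for c in cols))
for r in rows:
    print(f"{r:<10}" + "".join(f"{cells[(r, c)]:>10}" for c in cols))
```

Each cell holds the number of cases that fall into both the row category and the column category, so the cells sum to the sample size.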

How should a researcher choose between graphs like bar charts, pie charts, histograms, and scatter diagrams?

Graph choice depends on data type and scale of measurement. Categorical data are matched with bar graphs and pie charts. Continuous data are matched with histograms, frequency polygons, and scatter diagrams. The lesson also notes that even within graphical methods, the scale of measurement determines which figure is appropriate.

What do mean, median, and mode represent, and which data types do they fit?

Mean is used for continuous data and is the arithmetic average, taken as the typical value. Median is also for continuous data and is the middle value after ordering observations from lowest to highest. Mode is for categorical data and identifies the value or category with the highest frequency.

Why pair measures of central tendency with measures of variability?

Central tendency alone can be misleading because extreme values can pull the mean or otherwise distort the “typical” value. Measures of variability show how spread out the data are around the mean, helping interpret whether observations cluster tightly or scatter widely. The lesson emphasizes combining central tendency with standard deviation (and variance) to avoid overinterpreting a single summary.
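The effect of an extreme value can be seen in a small comparison. The income figures below are invented for illustration; the point is only how the summaries move when one extreme observation is added.

```python
import statistics

# Hypothetical monthly incomes; illustrative only.
incomes = [200, 220, 240, 260, 280]
incomes_with_outlier = incomes + [5000]  # one extreme observation added

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(incomes_with_outlier)
median_after = statistics.median(incomes_with_outlier)
sd_before = statistics.stdev(incomes)
sd_after = statistics.stdev(incomes_with_outlier)

print("mean:", mean_before, "->", round(mean_after, 1))
print("median with outlier:", median_after)
print("standard deviation:", round(sd_before, 1), "->", round(sd_after, 1))
```

The mean shifts sharply while the median barely moves, and the much larger standard deviation flags that the new mean no longer describes a tightly clustered group.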

How are variance and standard deviation related, and how do they affect interpretation?

Variance is the average of the squared deviations from the mean: the sum of squared deviations divided by the number of observations (or by n - 1 for a sample). Standard deviation is the positive square root of variance, which expresses the spread in the original units of the data. Interpretation follows: a high standard deviation indicates values are widely spread from the mean (greater inconsistency), while a low standard deviation indicates most values cluster near the mean. The lesson also connects standard deviation to the shape of the normal distribution curve, which becomes flatter as spread increases.
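In standard notation, for a sample of n observations x_1, ..., x_n, the relationship can be written as follows. The symbols are the conventional textbook ones and are an assumption here, since the lesson summary does not give its own notation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{s^2}
```

Here \bar{x} is the sample mean, s^2 the sample variance, and s the sample standard deviation.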

Review Questions

What are the three methods of analyzing data descriptively, and what does each method primarily summarize?
For a dataset of categorical variables, which tabular and graphical tools are most appropriate, and why?
How do variance and standard deviation help interpret the meaning of the mean or median?

Key Points

1. Descriptive statistics summarize sample data using tables, graphs, and single-number measures, so they should not be treated as population-wide conclusions.
2. Data analysis often appears in Chapter 4 of social science projects and can be done manually or with software such as SPSS.
3. Tabular descriptive statistics commonly use frequency distribution tables and cross tabulation tables (crosstabs) for categorical data.
4. Graphical descriptive statistics require matching the graph type to the data type: categorical variables use bar charts/pie charts, while continuous variables use histograms/frequency polygons/scatter diagrams.
5. Numerical descriptive statistics include measures of central tendency (mean, median, mode) and measures of variability (variance and standard deviation).
6. Central tendency measures can be distorted by extreme values, so variability measures, especially standard deviation, are needed for proper interpretation.
7. Standard deviation reflects spread around the mean: higher values indicate wider dispersion, while lower values indicate clustering near the mean.

Highlights

Descriptive statistics describe the sample and therefore restrict generalization beyond the observed group.

Graph selection is driven by data type: categorical variables align with bar/pie charts, while continuous variables align with histograms and scatter diagrams.

Standard deviation is the positive square root of variance and is used to judge how tightly data cluster around the mean.

Mean, median, and mode serve different roles and map to different data types: mean/median for continuous data and mode for categorical data.

Pairing central tendency with variability helps prevent misleading conclusions caused by extreme values.

Topics

Descriptive Statistics
Data Analysis
Tabular Summaries
Graphical Summaries
Measures of Central Tendency
Measures of Variability

Mentioned

SPSS