Statistics for #Research - L2 - The Concept of Descriptive Statistics

TL;DR

Central tendency summarizes what’s typical in a dataset, while dispersion summarizes how spread out the values are.

Briefing

Descriptive statistics boil down to two jobs: summarizing what counts as “typical” in a dataset and quantifying how widely the values spread. Central tendency answers the typicality question (for example, what height is typical for a person in a group), while dispersion (variability) answers whether the values cluster tightly or scatter across the range. Together, they turn a list of measurements into interpretable research information.

Central tendency is introduced through three measures. The mean is the arithmetic average: add all height measurements and divide by the number of observations. It’s presented as the go-to measure for interval and ratio scale variables. When values are ordered but not evenly spaced, as with ranked categories, the median is the appropriate measure: the middle value once the data are sorted from lowest to highest. It’s tied to ordinal scale variables. For nominal scale variables, where categories have no inherent order, the mode is used: the value that appears most frequently. An example with five people (male coded as 1 and female coded as 2) shows how the most repeated code becomes the mode.
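The three measures can be sketched with Python's standard `statistics` module. The data values and variable names below are hypothetical, chosen only for illustration; the session does not list its raw numbers beyond the 1 = male, 2 = female coding.

```python
from statistics import mean, median, mode

# Hypothetical data, one list per measurement scale.
heights_cm = [160, 165, 170, 175, 180]   # ratio scale: use the mean
satisfaction_rank = [1, 2, 2, 3, 5]      # ordinal scale: use the median
gender_codes = [1, 1, 1, 2, 2]           # nominal scale (1 = male, 2 = female): use the mode

typical_height = mean(heights_cm)        # sum of values / number of observations
middle_rank = median(satisfaction_rank)  # middle value after sorting
most_common_code = mode(gender_codes)    # most frequently occurring value

print(typical_height, middle_rank, most_common_code)
```

With these made-up values the mean height is 170, the median rank is 2, and the mode of the gender codes is 1.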

Dispersion then shifts from “typical” to “spread.” The range is defined as the difference between the minimum and maximum values, giving a quick sense of how far apart the extremes are. But the transcript emphasizes that dispersion is broader than just extremes: variability describes how individual responses are distributed across the entire range. For a more reliable picture of spread, standard deviation is highlighted as the quickest way to gauge how much observations generally differ from the mean. Low standard deviation means most values sit close to the mean, while high standard deviation signals greater scattering.
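As a rough illustration of range and standard deviation, the sketch below uses a hypothetical list of heights. It assumes the sample standard deviation (dividing by n - 1), which `statistics.stdev` computes; the session does not say whether the sample or population formula is intended.

```python
from statistics import stdev

# Hypothetical height measurements in centimetres.
heights_cm = [160, 165, 170, 175, 180]

# Range: difference between the maximum and minimum values.
value_range = max(heights_cm) - min(heights_cm)

# Sample standard deviation (n - 1 in the denominator; an assumption here).
sd = stdev(heights_cm)

print(f"range = {value_range} cm, standard deviation = {sd:.2f} cm")
```

Here the range is 20 cm, while the standard deviation (about 7.91 cm) describes how far a typical observation sits from the mean of 170 cm.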

Finally, the accuracy of the mean is linked to standard error. Standard error is described as standard deviation divided by the square root of the total number of responses. This connects sample size to precision: as the number of observations grows, the denominator increases, standard error falls, and the sample mean becomes a more precise estimate of the true population mean. The session frames these descriptive statistics (mean/median/mode for central tendency and range/standard deviation/standard error for dispersion) as the core summary information typically reported in research theses and papers.
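Written as a formula, the verbal definition above looks like this. The symbols are assumptions introduced here: s for the standard deviation, n for the number of responses, and SE for the standard error of the mean. The numeric line uses a hypothetical sample of five observations with s ≈ 7.91.

```latex
\[
  \mathrm{SE} = \frac{s}{\sqrt{n}}
  \qquad\text{e.g.}\qquad
  \mathrm{SE} \approx \frac{7.91}{\sqrt{5}} \approx 3.54
\]
```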

Cornell Notes

Descriptive statistics focus on two essentials: central tendency (what’s typical) and dispersion (how spread out values are). Central tendency is measured with the mean for interval/ratio data, the median for ordinal data, and the mode for nominal data. Dispersion is assessed using range for a quick min–max spread, while standard deviation measures how far observations generally vary from the mean. Standard error refines this by estimating how accurately a sample mean reflects the true population mean, calculated as standard deviation divided by the square root of the number of responses. These tools help researchers summarize datasets in a way that supports interpretation and reporting.

How do mean, median, and mode differ, and when should each be used?

Mean is the arithmetic average: sum all values and divide by the number of observations. It’s used for interval and ratio scale variables. Median is the middle value after sorting data from lowest to highest; it’s used for ordinal scale variables. Mode is the most frequently occurring value; it’s used for nominal scale variables where categories have no natural order. An example given codes male as 1 and female as 2; the value that appears most often (1 in that example) is the mode.

What does dispersion measure, and how is it related to variability?

Dispersion (variability) describes how individual responses are distributed across the dataset’s range—whether values cluster near each other or spread widely apart. The range provides a simple min-to-max measure, but variability is broader, capturing the overall scattering of observations across the entire set of values.

Why is standard deviation more informative than range for understanding spread?

Range only compares the extremes (minimum vs. maximum). Standard deviation instead measures how much observations generally differ from the mean. Low standard deviation indicates most values are close to the mean, while high standard deviation indicates values are more widely spread around the mean.
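A small sketch can make the contrast concrete: two hypothetical datasets with the same minimum, maximum, and mean, but different clustering. The values are invented for illustration, and the sample standard deviation from `statistics.stdev` is assumed.

```python
from statistics import stdev

# Two hypothetical samples with identical extremes (150 and 190) and the same mean (170).
clustered = [150, 170, 170, 170, 190]   # most values sit at the mean
scattered = [150, 150, 170, 190, 190]   # most values sit at the extremes

range_clustered = max(clustered) - min(clustered)
range_scattered = max(scattered) - min(scattered)

sd_clustered = stdev(clustered)
sd_scattered = stdev(scattered)

print(f"ranges: {range_clustered} vs {range_scattered}")
print(f"standard deviations: {sd_clustered:.2f} vs {sd_scattered:.2f}")
```

Both ranges are 40, so the range cannot tell the samples apart, whereas the standard deviation is larger for the scattered sample (20.00 versus about 14.14).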

What is standard error, and how does sample size affect it?

Standard error is calculated as standard deviation divided by the square root of the total number of responses. Because the denominator grows as sample size increases, standard error decreases with larger samples for a given standard deviation. Lower standard error means the sample mean is likely to be closer to the true population mean.
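The effect of sample size can be sketched by holding the standard deviation fixed and varying n. The helper name `standard_error` and the numbers are hypothetical, used only to show the pattern.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

fixed_sd = 10.0  # hypothetical standard deviation, held constant

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(n, standard_error(fixed_sd, n))
```

With a standard deviation of 10, the standard error is 2.0 at n = 25, 1.0 at n = 100, and 0.5 at n = 400. Even a high standard deviation can therefore give a small standard error when the sample is large.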

How do central tendency and dispersion work together in research reporting?

Central tendency summarizes the “typical” value (mean/median/mode depending on measurement scale), while dispersion quantifies reliability and spread (range/standard deviation/standard error). Reporting both helps readers understand not only what the dataset’s typical value is, but also whether that typical value is based on tightly clustered data or widely scattered observations.

Review Questions

If a dataset is ordinal, which central tendency measure is appropriate and why?
A sample has a high standard deviation but a large sample size; how would you expect standard error to behave?
How would you interpret a small range versus a small standard deviation in terms of data spread?

Key Points

1. Central tendency summarizes what’s typical in a dataset, while dispersion summarizes how spread out the values are.
2. Mean is the arithmetic average and is used for interval and ratio scale variables.
3. Median is the middle value after sorting and is used for ordinal scale variables.
4. Mode is the most frequent value and is used for nominal scale variables.
5. Range measures spread using the difference between the minimum and maximum values.
6. Standard deviation measures how much observations generally vary from the mean; lower values indicate tighter clustering.
7. Standard error equals standard deviation divided by the square root of the number of responses, linking sample size to the accuracy of the sample mean.

Highlights

Central tendency depends on measurement scale: mean (interval/ratio), median (ordinal), mode (nominal).

Standard deviation distinguishes tight clustering from wide scattering by measuring typical variation around the mean.

Standard error quantifies how accurately a sample mean estimates the true population mean and shrinks as sample size grows.

Topics

Descriptive Statistics
Central Tendency
Dispersion
Mean Median Mode
Standard Deviation