LESSON 47 - DESCRIPTIVE STATISTICS: THE THREE METHODS OF ANALYSING DATA DESCRIPTIVELY
Based on RESEARCH METHODS CLASS WITH PROF. LYDIAH WAMBUGU's video on YouTube.
Descriptive statistics summarize sample data using tables, graphs, and single-number measures, so they should not be treated as population-wide conclusions.
Briefing
Descriptive statistics turn messy field data into manageable summaries—using a small set of numbers, tables, and graphs—so research findings can be presented and interpreted for an audience. Because descriptive statistics summarize a sample rather than the whole population, they limit generalization: results should be treated as describing the group actually observed, not claiming population-wide conclusions.
In social science research, data analysis often appears in Chapter 4, and it can be done manually or with software such as SPSS. The lesson frames data analysis as the process of reducing collected data to meaningful summaries. It also distinguishes descriptive statistics from inferential statistics: descriptive work focuses on summarizing what the sample looks like, while inferential methods are reserved for drawing broader conclusions.
Three methods organize descriptive statistics. First are tabular summaries, which rely on tables to present frequencies and relationships. Two common table types are frequency distribution tables and cross tabulation tables (crosstabs). Frequency tables show how often categories occur, while crosstabs compare categories across variables. The lesson also notes formatting conventions: tables are numbered by chapter (e.g., Table 4.1 for Chapter 4), and the table title appears at the top.
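A minimal sketch of the two table types, using only the Python standard library. The survey responses, variable names, and table titles here are hypothetical and chosen for illustration; they are not taken from the lesson.

```python
from collections import Counter

# Hypothetical survey responses: (gender, employment status) per respondent.
responses = [
    ("Female", "Employed"),
    ("Male", "Employed"),
    ("Female", "Unemployed"),
    ("Female", "Employed"),
    ("Male", "Unemployed"),
    ("Male", "Employed"),
]


def frequency_table(values):
    """Return {category: (count, percent)} for one categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, 100 * n / total) for category, n in counts.items()}


def crosstab(pairs):
    """Return {row_category: {column_category: count}} for two variables."""
    table = {}
    for row, col in pairs:
        table.setdefault(row, Counter())[col] += 1
    return table


gender_freq = frequency_table(gender for gender, _ in responses)
gender_by_status = crosstab(responses)

# Title goes above the table and is numbered by chapter, as the lesson notes.
print("Table 4.1: Respondents by gender")
for category, (count, percent) in gender_freq.items():
    print(f"{category:<10}{count:>5}{percent:>8.1f}%")

print("Table 4.2: Gender by employment status")
for row, cols in gender_by_status.items():
    print(f"{row:<10}{dict(cols)}")
```

The frequency table answers "how often does each category occur?", while the crosstab answers "how do the categories of one variable split across the categories of another?".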
Second are graphical representations, with the graph chosen according to the type of data and its scale of measurement. For categorical data, the lesson highlights bar graphs and pie charts; for continuous data, it points to histograms, frequency polygons, and scatter diagrams (the last of which plots two continuous variables against each other). The practical point is that graph choice is not arbitrary: the data type determines the appropriate figure. The lesson further notes that a histogram, drawn from continuous data, shows the shape of the distribution and can be reported alongside summary values such as the mean and standard deviation.
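The rule that data type drives graph choice can be sketched as a simple lookup, followed by the binning step that sits behind a histogram. The function names, the ages, and the bin width of 10 are assumptions made for illustration; the lesson does not prescribe them.

```python
# Chart families the lesson pairs with each data type.
CHARTS_BY_DATA_TYPE = {
    "categorical": ["bar graph", "pie chart"],
    "continuous": ["histogram", "frequency polygon", "scatter diagram"],
}


def suggest_charts(data_type):
    """Return the chart types the lesson associates with a data type."""
    return CHARTS_BY_DATA_TYPE[data_type]


def histogram_counts(values, bin_width, start):
    """Count values into equal-width bins; keys are each bin's lower bound."""
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        lower = start + index * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


# Hypothetical respondent ages in whole years (a continuous variable).
ages = [21, 24, 25, 29, 31, 33, 34, 38, 42, 47]
bins = histogram_counts(ages, bin_width=10, start=20)

print(suggest_charts("continuous"))
for lower, n in bins.items():
    # A text-only stand-in for the bars of a histogram.
    print(f"{lower}-{lower + 9}: {'#' * n}")
```

A bar graph would count named categories directly; a histogram first has to group a continuous variable into intervals, which is why the two are not interchangeable.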
Third are numerical representations, which reduce many observations to single summary values. These include measures of central tendency and measures of variability. Central tendency uses the mean (for continuous data), the median (for continuous data, and also usable with ordered categories), and the mode (usable with categorical data). The mean is the arithmetic average, the median is the middle value after ordering the data, and the mode is the most frequent category or value. Because the mean in particular can be pulled toward extreme values, and because no single central value shows how spread out the observations are, the lesson stresses pairing central tendency with variability.
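A short sketch of the three measures using Python's standard `statistics` module. The income figures and county names are hypothetical; the single large value is added only to show how an extreme observation moves the mean while leaving the median unchanged.

```python
import statistics

# Hypothetical monthly incomes (in thousands) for five respondents.
incomes = [30, 32, 35, 35, 38]

mean_before = statistics.mean(incomes)      # arithmetic average
median_before = statistics.median(incomes)  # middle value after ordering
mode_income = statistics.mode(incomes)      # most frequent value

# Add one extreme value and recompute.
with_outlier = incomes + [400]
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mode also works for categorical data, where a mean is undefined.
counties = ["Nairobi", "Kisumu", "Nairobi", "Mombasa"]
mode_county = statistics.mode(counties)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
print(f"mode (income): {mode_income}, mode (county): {mode_county}")
```

In this made-up sample the mean rises from 34 to 95 once the extreme value is included, while the median stays at 35, which is why a single central value should not be read on its own.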
Variability describes how spread out the data are around the mean. The lesson names range-related concepts and focuses on standard deviation and variance: variance is the average of the squared deviations from the mean (the sum of squared deviations divided by the number of observations, or by one less than that number when estimating from a sample), while standard deviation is the positive square root of variance. A high standard deviation signals inconsistency and wide spread, while a low standard deviation indicates clustering near the mean. The lesson also references kurtosis and skewness, which describe the shape of the distribution.
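The relationship between variance and standard deviation can be worked through by hand and checked against Python's `statistics` module. The scores are hypothetical. Whether to divide by n or by n - 1 is not settled in these notes, so both versions are shown and labeled as an assumption.

```python
import math
import statistics

# Hypothetical test scores for eight respondents.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(scores)
mean = sum(scores) / n

# Step 1: squared deviation of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Step 2: variance is the sum of squared deviations divided by a count.
# Dividing by n treats the data as the whole group of interest;
# dividing by n - 1 is the usual choice when estimating from a sample.
population_variance = sum(squared_deviations) / n
sample_variance = sum(squared_deviations) / (n - 1)

# Step 3: standard deviation is the positive square root of variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}")
print(f"population variance = {population_variance}, SD = {population_sd}")
print(f"sample variance = {sample_variance:.3f}, SD = {sample_sd:.3f}")

# Cross-check against the standard library.
print(statistics.pvariance(scores), statistics.variance(scores))
```

Because the standard deviation is in the same units as the original scores, it is usually the easier of the two to interpret next to the mean.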
The takeaway is straightforward: descriptive statistics describe the sample and use three complementary approaches—tabular, graphical, and numerical—to summarize data accurately. The next step, deferred to a later lesson, is inferential statistics, which moves from description toward generalization.
Cornell Notes
Descriptive statistics summarize sample data so it can be presented and interpreted clearly, but they do not justify population-wide generalizations. The lesson organizes descriptive work into three methods: tabular summaries, graphical representations, and numerical summaries. Tables typically include frequency distributions and crosstabs for categorical data. Graphs are chosen based on data type and scale—bar charts and pie charts for categorical variables, histograms and scatter diagrams for continuous variables. Numerical summaries use measures of central tendency (mean, median, mode) alongside measures of variability (variance and standard deviation) to show both “typical values” and how widely observations are spread.
Why do descriptive statistics limit generalization beyond the sample?
What are the two main types of tabular summaries used in social science descriptive analysis?
How should a researcher choose between graphs like bar charts, pie charts, histograms, and scatter diagrams?
What do mean, median, and mode represent, and which data types do they fit?
Why pair measures of central tendency with measures of variability?
How are variance and standard deviation related, and how do they affect interpretation?
Review Questions
- What are the three methods of analyzing data descriptively, and what does each method primarily summarize?
- For a dataset of categorical variables, which tabular and graphical tools are most appropriate, and why?
- How do variance and standard deviation help interpret the meaning of the mean or median?
Key Points
1. Descriptive statistics summarize sample data using tables, graphs, and single-number measures, so they should not be treated as population-wide conclusions.
2. Data analysis often appears in Chapter 4 of social science projects and can be done manually or with software such as SPSS.
3. Tabular descriptive statistics commonly use frequency distribution tables and cross tabulation tables (crosstabs) for categorical data.
4. Graphical descriptive statistics require matching the graph type to the data type: categorical variables use bar charts or pie charts, while continuous variables use histograms, frequency polygons, or scatter diagrams.
5. Numerical descriptive statistics include measures of central tendency (mean, median, mode) and measures of variability (variance and standard deviation).
6. The mean in particular can be distorted by extreme values, so variability measures, especially standard deviation, are needed for proper interpretation.
7. Standard deviation reflects spread around the mean: higher values indicate wider dispersion, while lower values indicate clustering near the mean.