Statistics for #Research - L1 - Introduction to Statistical Variables

TL;DR

Statistical data is numerical information organized by cases (records) tied to entities such as people, organizations, regions, or time periods.

Briefing

The session lays the groundwork for research statistics by defining what “statistical data” and “statistical variables” actually mean, then sorting variables into practical types researchers use when building analyses. The core takeaway is that statistical data is numerical information organized by cases (records) and that statistical variables are the measurable properties attached to those cases—properties that can vary across individuals, times, or contexts. Without that clarity, later choices about analysis methods (and how to code questionnaire responses) become guesswork.

Statistical data is described as a collection of numbers located within cases or records tied to entities such as people. A case might be a respondent’s answer, a country-year observation such as GDP for a given year, or a record for a specific unit such as a hospital or a region. Data can also be organized at multiple levels, for example records for individual patients, for medical emergency teams, and for hospitals, which allows comparisons across groups or even across cities.
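
To make the idea of cases at multiple levels concrete, here is a minimal sketch in Python. It assumes each level (patient, team, hospital) is stored as its own list of records linked by ID fields; all class names, IDs, and values are hypothetical illustrations, not data from the session.

```python
from dataclasses import dataclass

# Hypothetical records: each object is one case at one level of the data.
@dataclass
class Hospital:
    hospital_id: str
    city: str

@dataclass
class Team:
    team_id: str
    hospital_id: str  # links the team to the hospital it belongs to

@dataclass
class Patient:
    patient_id: str
    team_id: str      # links the patient to the team that treated them
    age: int

hospitals = [Hospital("H1", "Peshawar"), Hospital("H2", "Lahore")]
teams = [Team("T1", "H1"), Team("T2", "H1"), Team("T3", "H2")]
patients = [
    Patient("P1", "T1", 34),
    Patient("P2", "T1", 51),
    Patient("P3", "T2", 27),
    Patient("P4", "T3", 62),
]

def patients_per_city(patients, teams, hospitals):
    """Roll patient-level cases up to the city level via team and hospital links."""
    team_to_hospital = {t.team_id: t.hospital_id for t in teams}
    hospital_to_city = {h.hospital_id: h.city for h in hospitals}
    counts = {}
    for p in patients:
        city = hospital_to_city[team_to_hospital[p.team_id]]
        counts[city] = counts.get(city, 0) + 1
    return counts

print(patients_per_city(patients, teams, hospitals))  # {'Peshawar': 3, 'Lahore': 1}
```

Keeping each level in its own list, rather than copying hospital details into every patient record, is what lets the same data be compared at the patient, team, hospital, or city level.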

A statistical variable is then defined as a special kind of mathematical variable that represents a conceptual space within a broader set of concepts. That conceptual space can be abstract (such as personality traits like introversion/extroversion) or physical (such as height or weight). Two properties distinguish a variable: (1) it holds a measurement value for each individual case, and (2) across cases, it can take more than one value. If a measurement never changes, staying fixed for every respondent, event, or unit, it behaves like a constant rather than a variable.
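
The distinction between a variable and a constant can be checked mechanically. A minimal sketch in Python, assuming each measurement is stored as a list holding one value per case; the function name `is_variable` and the sample values are hypothetical.

```python
def is_variable(values):
    """Return True if the measurement takes more than one distinct value across cases.

    A measurement that is identical for every case behaves like a constant.
    """
    return len(set(values)) > 1

# Hypothetical measurements, one value per respondent.
height_cm = [162, 175, 158, 181]
survey_year = [2024, 2024, 2024, 2024]

print(is_variable(height_cm))    # True: differs across respondents
print(is_variable(survey_year))  # False: fixed for every respondent, so a constant
```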

The session also connects variables to how researchers organize observations about real-world concepts. Variables can capture straightforward demographics such as gender, but they can also represent complex constructs like attitudes, job commitment, turnover intention, or perceptions of an organization’s social responsibility.

Finally, the session categorizes variables into types that drive analysis choices. For categorical variables, each possible value is a distinct category; examples include gender, job rank, education level, driver’s license status (provisional/open/no license), and city of residence (e.g., Peshawar, London, Lahore, Manchester, New York). Continuous variables, by contrast, take values along a spectrum; age, height, and weight are given as examples.
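
The practical consequence of the two types shows up as soon as the data is summarized. A minimal sketch in Python using only the standard library (`collections.Counter`, `statistics.mean`); the cases below are invented for illustration. Categorical values are counted per category, while continuous values support arithmetic such as a mean.

```python
from collections import Counter
from statistics import mean

# Hypothetical cases: city and license status are categorical, age is continuous.
cities = ["Peshawar", "London", "Lahore", "London", "New York", "London"]
license_status = ["provisional", "open", "no license", "open", "open", "provisional"]
ages = [24, 31, 45, 29, 38, 55]

# Categorical: the meaningful summary is how many cases fall into each category.
print(Counter(cities).most_common(1))  # [('London', 3)]
print(Counter(license_status))

# Continuous: values lie on a spectrum, so averaging them is meaningful.
print(mean(ages))  # 37
```

Averaging city names would be meaningless, which is why the variable type has to be settled before choosing a summary or a test.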

A key nuance is that some responses are technically categorical but behave like continuous measures in practice. Likert-scale responses to a question such as “How much do you enjoy statistics?” range from “greatly enjoy” to “greatly dislike.” Even though these answers are categories, researchers often treat them as metric (numerical) responses and analyze them as continuous variables, which enables common statistical techniques for explaining variance in dependent variables and assessing how predictors influence outcomes. The session presents this variable-type decision as essential preparation for the R and SPSS implementation later in the series.
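
Treating Likert responses as metric usually means coding each ordered label as a number first. A minimal sketch in Python, assuming a 5-point scale: the session names only the end points (“greatly enjoy”, “greatly dislike”), so the three middle labels and the 1 to 5 codes are assumptions, as are the responses.

```python
from statistics import mean

# Assumed 5-point coding; only the two end labels come from the session.
LIKERT_CODES = {
    "greatly dislike": 1,
    "dislike": 2,
    "neutral": 3,
    "enjoy": 4,
    "greatly enjoy": 5,
}

responses = ["greatly enjoy", "enjoy", "neutral", "greatly dislike", "enjoy"]

# Step 1: the answers are ordered categories; map each label to its numeric code.
scores = [LIKERT_CODES[r] for r in responses]

# Step 2: treat the codes as metric, which permits arithmetic such as a mean.
print(scores)        # [5, 4, 3, 1, 4]
print(mean(scores))  # 3.4
```

The mean of 3.4 only makes sense under the assumption that the gaps between adjacent labels are equal; that assumption is exactly what “treating Likert responses as continuous” adopts.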

Cornell Notes

The session defines statistical data as numerical information organized by cases (records) tied to entities like people, organizations, regions, or time periods. A statistical variable is a measurable property attached to each case that can vary across cases; if it never changes, it functions as a constant. Variables can represent abstract constructs (e.g., introversion/extroversion, attitudes) or physical measurements (e.g., height, weight). The transcript distinguishes categorical variables (distinct categories like gender or city) from continuous variables (values along a spectrum like age or weight). It also notes that Likert-style responses are categorical in form but are commonly treated as continuous/metric variables to support standard statistical techniques.

What makes something “statistical data,” and how do cases fit into that definition?

Statistical data is defined as a collection of numerical information. Those numbers are located within cases or records tied to separate entities. A case can be a respondent’s answer, a time-based observation such as GDP per year, or a unit like a hospital. Cases can also exist at multiple levels in one dataset—individual people, groups such as medical emergency teams, larger entities like organizations, or regions—so researchers can compare across groups or even across cities.

How is a statistical variable different from a constant?

A statistical variable is described as a mathematical variable representing a conceptual space, with two key properties: it holds a measurement value for each individual case, and across cases it can take more than one value. If a measurement is limited to one value and does not change for a respondent or event, it behaves like a constant rather than a variable. It is called a variable precisely because its values change from person to person, time to time, or place to place.

What are categorical variables, and what are the examples given?

Categorical variables are those where each possible value corresponds to a distinct category. Examples include gender, job rank, and education level. The transcript also lists driver’s license status as categorical (provisional license, open license, or no license) and city of residence as categorical (e.g., Peshawar, London, Lahore, Manchester, New York).

What counts as a continuous variable in this framework?

Continuous variables are described as values that fall along a spectrum; examples include age, height, and weight. The defining idea is that the measurement can vary smoothly across a range rather than jumping between distinct categories.

Why are Likert-style responses treated like continuous variables in practice?

The transcript notes that Likert-style responses are categorical in that answers fall into ordered categories (e.g., “greatly enjoy” to “greatly dislike”). However, researchers often treat these responses as continuous/metric because they can be mathematically manipulated as numerical values. That practical choice supports statistical techniques for explaining variance in a dependent variable and assessing how predictors affect outcomes.

How do variable types connect to later statistical analysis choices?

Variable type determines how measurements are coded and which statistical methods are appropriate. Categorical variables call for category-based approaches, such as counting how many cases fall into each category, while continuous variables support analyses that assume numeric values, such as means. For Likert-style responses, treating them as continuous enables common modeling techniques that quantify how predictors influence dependent variables.
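
One common modeling technique of the kind described here is simple linear regression. The session does not name a specific method, so this choice is ours. A minimal sketch in Python, fitting an outcome on a single predictor by ordinary least squares and reporting R², the share of the outcome’s variance the predictor explains. The data are invented: a Likert-coded predictor (enjoyment of statistics, 1 to 5) and a hypothetical continuous outcome (exam score). The series itself uses R and SPSS; this Python version is only an illustration.

```python
def simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b, r_squared)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx    # slope: change in outcome per one-point change in the predictor
    a = my - b * mx  # intercept: predicted outcome when the predictor is 0
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / ss_tot  # share of outcome variance explained
    return a, b, r_squared

# Hypothetical data: Likert-coded enjoyment (predictor) and exam score (outcome).
enjoyment = [1, 2, 3, 4, 5, 3, 4, 2]
exam_score = [52, 58, 61, 70, 78, 65, 72, 55]

a, b, r2 = simple_ols(enjoyment, exam_score)
print(f"intercept={a:.2f}, slope={b:.2f}, R^2={r2:.2f}")
```

The slope only has a “per one-point change” reading because the Likert codes are being treated as equally spaced numbers; with a purely categorical predictor, each category would instead need its own comparison.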

Review Questions

How would you decide whether a measurement should be treated as categorical or continuous based on the definitions in the session?
Give two examples of variables from the transcript and classify each as categorical or continuous, explaining why.
What is the practical reason researchers treat Likert-style responses as metric/continuous, and what does that enable statistically?

Key Points

1. Statistical data is numerical information organized by cases (records) tied to entities such as people, organizations, regions, or time periods.
2. A statistical variable is a measurable property that varies across cases; if it never changes, it functions as a constant.
3. Variables can represent both abstract constructs (e.g., personality traits, attitudes) and physical measurements (e.g., height, weight).
4. Categorical variables take distinct categories (e.g., gender, education level, city of residence, driver’s license status).
5. Continuous variables take values along a spectrum (e.g., age, height, weight).
6. Likert-style responses are categorical in form but are commonly coded as numbers and treated as continuous/metric to support standard statistical techniques.

Highlights

Statistical data is defined as numbers organized by cases, which can exist at multiple levels such as individuals, teams, hospitals, and regions.

A statistical variable must vary across cases; fixed measurements belong to the category of constants.

Categorical variables include distinct statuses like driver’s license type and city of residence, while continuous variables include age, height, and weight.

Likert-style responses are categorical but are often treated as continuous/metric to enable common modeling approaches.

The variable-type decision is presented as a prerequisite for choosing appropriate statistical analysis methods later in the series.

Topics

Statistical Data
Statistical Variables
Categorical Variables
Continuous Variables
Likert Responses