Statistics for Research - L4 - Introduction to SPSS and Data Entry

TL;DR

Define all variables in “Variable View” before entering any responses in “Data View.”

Briefing

SPSS data analysis starts with a disciplined setup: define every variable first, then enter responses using numeric coding, and finally verify that the dataset matches the questionnaire. In social science research, anything measured—age, gender, education, university, job details, and survey items—counts as a variable, and SPSS requires those variables to be specified in “Variable View” before any data go into “Data View.” The workflow matters because SPSS treats analysis as number-based, so categories and Likert-scale answers must be converted into consistent numeric representations.

A typical study setup begins with demographic variables (e.g., age, gender, education, program, university, job rank, tenure) and then moves to constructs measured indirectly through indicators. For instance, an unobserved “latent” construct (like internal stakeholder responsibilities) is operationalized through multiple observed items (six indicators in the example). Each indicator is measured on a five-point Likert scale where 1 corresponds to “strongly disagree” and 5 to “strongly agree.” Since respondents select only one option per item, the dataset should contain single numeric values per indicator per respondent—no multi-select entries.

In “Variable View,” the type and measurement level for each variable must be set. Age is interval/ratio, which SPSS lists as “Scale” (numeric, with meaningful distances between values); gender is nominal (no inherent order); education is ordinal (ordered categories such as degrees); and university is nominal. For non-numeric categories, SPSS uses value labels, which attach a text label to each numeric code. The example maps gender categories to numbers (1 = male, 2 = female), encodes education levels in order (1 = Masters, 2 = MPhil/MS, 3 = PhD, and so on), and assigns each university a numeric code. Likert items can all reuse the same numeric scale (1–5), with a label for each response option.
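Outside SPSS, the same variable definitions can be written down as a small codebook. The following is only an illustrative Python sketch, not SPSS syntax: the dictionary names and the `label_for` helper are hypothetical, the wording of the three middle Likert labels is assumed (the example only names the endpoints), and university labels are left out because the notes do not list them.

```python
# Illustrative codebook mirroring the Variable View settings described above.
# All names here (MEASURE, VALUE_LABELS, label_for) are hypothetical.

# Measurement level per demographic variable ("scale" is SPSS's term for
# interval/ratio variables).
MEASURE = {
    "age": "scale",
    "gender": "nominal",
    "education": "ordinal",
    "university": "nominal",
}

# Five-point Likert labels. Only the endpoints (1 and 5) come from the notes;
# the middle labels are assumed for illustration.
LIKERT_LABELS = {
    1: "strongly disagree",
    2: "disagree",
    3: "neutral",
    4: "agree",
    5: "strongly agree",
}

# Value labels: numeric code -> category label.
VALUE_LABELS = {
    "gender": {1: "male", 2: "female"},
    "education": {1: "Masters", 2: "MPhil/MS", 3: "PhD"},
}

# The six indicators ISR1..ISR6 all reuse the same 1-5 scale.
for i in range(1, 7):
    VALUE_LABELS[f"ISR{i}"] = LIKERT_LABELS


def label_for(variable, code):
    """Return the text label attached to a numeric code."""
    return VALUE_LABELS[variable][code]


print(label_for("gender", 2))
print(label_for("ISR3", 5))
```

Keeping one shared label dictionary for all six indicators reflects the point that Likert items reuse a single scale, so a coding decision is made once and applied everywhere.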

Once variables are defined, the next step is entering data in “Data View.” Each row represents a respondent, and each column corresponds to a variable or indicator. When questionnaires exist in print, the dataset should include a questionnaire number so each row can be traced back to the correct paper. The example demonstrates entering respondent #1 with values for demographics and six responsibility indicators, then repeating for respondent #2 and onward.
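The row-per-respondent layout can be pictured with a short Python sketch. This is an illustration of the data structure only, not how SPSS stores data: the column name `q_no`, the `make_row` helper, and every value shown are made up for the example.

```python
# Illustrative sketch of the Data View layout: one row per respondent,
# one column per variable. Column names and values are hypothetical.

COLUMNS = ["q_no", "age", "gender", "education", "university"] + [
    f"ISR{i}" for i in range(1, 7)
]


def make_row(q_no, age, gender, education, university, isr):
    """Build one respondent's row; `isr` holds the six indicator answers."""
    if len(isr) != 6:
        raise ValueError("expected exactly six indicator responses")
    values = [q_no, age, gender, education, university, *isr]
    return dict(zip(COLUMNS, values))


# Two made-up respondents, keyed by the number written on the paper form.
rows = [
    make_row(1, 27, 1, 2, 1, [4, 5, 3, 4, 2, 5]),
    make_row(2, 31, 2, 3, 2, [2, 3, 4, 4, 5, 1]),
]

for row in rows:
    print(row)
```

Storing the questionnaire number as its own column, rather than relying on row position, keeps the link to the paper form intact even if rows are later sorted or deleted.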

After data entry, the dataset needs a quick correctness check. A practical first validation uses “Analyze” → “Descriptive Statistics” → “Frequencies,” with the minimum and maximum requested for each variable. If a value falls outside the expected range, such as a Likert item showing a 6 when the scale runs from 1 to 5, the minimum or maximum in the output reveals which variable contains the error. The fix is to locate the exact cell (using “Edit” → “Find” to search for the invalid value), read off the questionnaire number in that row, and check the matching paper questionnaire to correct the entry. This cycle of define, code, enter, and validate sets up the dataset for later analysis steps.
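The logic of the minimum/maximum check can be sketched in a few lines of Python. This is not what SPSS runs internally, only a hedged illustration of the same idea: the function names, the `q_no` column, and the sample rows (including the deliberate 6) are all hypothetical.

```python
# Illustrative range check: report min/max per variable and list every
# out-of-range cell together with its questionnaire number.

VALID_RANGE = {f"ISR{i}": (1, 5) for i in range(1, 7)}

# Made-up data; respondent 2 has a deliberate entry error on ISR4.
rows = [
    {"q_no": 1, "ISR1": 4, "ISR2": 5, "ISR3": 3, "ISR4": 4, "ISR5": 2, "ISR6": 5},
    {"q_no": 2, "ISR1": 2, "ISR2": 3, "ISR3": 4, "ISR4": 6, "ISR5": 5, "ISR6": 1},
]


def min_max(rows, variable):
    """Return (minimum, maximum) of one variable across all respondents."""
    values = [row[variable] for row in rows]
    return min(values), max(values)


def find_out_of_range(rows, valid_range):
    """Return (questionnaire number, variable, value) for each invalid cell."""
    problems = []
    for row in rows:
        for variable, (low, high) in valid_range.items():
            value = row[variable]
            if not low <= value <= high:
                problems.append((row["q_no"], variable, value))
    return problems


print(min_max(rows, "ISR4"))
print(find_out_of_range(rows, VALID_RANGE))
```

The first call plays the role of the frequencies output (a maximum above 5 signals a problem), and the second plays the role of “Edit” → “Find”, pointing to the questionnaire that needs to be re-checked.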

Cornell Notes

SPSS workflow for research data begins by defining variables in “Variable View” and only then entering responses in “Data View.” Every measured concept becomes a variable, including demographics and survey indicators used to measure latent constructs. Because SPSS analysis is number-based, categories like gender, education, and university must be coded with numeric values and value labels (e.g., 1=male, 2=female; ordered education levels as 1, 2, 3). Likert-scale items are coded consistently (1=strongly disagree through 5=strongly agree). After entry, minimum/maximum checks via frequencies help catch mistakes (like an invalid 6 on a 1–5 scale), which can be traced back to the specific questionnaire row for correction.

Why must variables be defined before entering data in SPSS?

SPSS requires variable definitions in “Variable View” first, because analysis depends on knowing each variable’s type and measurement level. Demographics and survey items must be set up with correct numeric coding and labels (e.g., gender as nominal, education as ordinal). Once variables exist, “Data View” can store each respondent’s values in the right columns.

How should gender, education, and university be handled if they aren’t inherently numeric?

They should be converted into numeric codes with value labels. For example, gender can be coded as 1=male and 2=female. Education can be coded as ordered categories (e.g., 1=Masters, 2=MPhil/MS, 3=PhD). Universities can also be assigned numeric identifiers. This keeps the dataset analyzable while preserving the meaning of each category through labels.

What’s the difference between a latent construct and its indicators in SPSS data entry?

A latent construct is unobserved, so there’s no direct data to enter for it. Instead, the construct is measured through observed indicators (e.g., six Likert items). Data entry should include the indicator variables (ISR1–ISR6 in the example), not the latent variable itself.

How are Likert-scale survey items coded, and what constraint should the dataset follow?

Each Likert item is coded numerically on a fixed scale, such as 1=strongly disagree through 5=strongly agree. Since respondents choose one option per item, each indicator cell for a respondent should contain exactly one value from the allowed range—no extra selections and no out-of-range numbers.

What’s an effective way to detect data-entry errors after entering responses?

Run “Analyze” → “Descriptive Statistics” → “Frequencies,” and check minimum and maximum values for variables. If a value falls outside the expected range (e.g., a 6 appears on a 1–5 Likert item), use “Edit” → “Find” to locate the invalid cell, then cross-check the corresponding questionnaire row and correct the entry.

Review Questions

What variable properties (type/measurement level) should be set for age, gender, education, and university, and why do they differ?
How would you code a five-point Likert item and ensure respondents’ choices remain within the valid range during data entry?
After running frequencies, what steps would you take to trace and fix an out-of-range value back to the original questionnaire?

Key Points

1. Define all variables in “Variable View” before entering any responses in “Data View.”
2. Convert categorical demographics (gender, education, university) into numeric codes and attach value labels for interpretability.
3. Treat latent constructs as unobserved: enter only their measured indicators (e.g., multiple Likert items) rather than the latent variable itself.
4. Use consistent numeric coding for Likert items (e.g., 1–5) and ensure each respondent has exactly one value per item.
5. Set appropriate measurement levels: age as interval/ratio, gender as nominal, education as ordinal, and university as nominal.
6. Assign questionnaire numbers to rows so data-entry mistakes can be traced back to the correct paper questionnaire.
7. Validate data entry using minimum/maximum checks (frequencies) and correct any out-of-range values by locating the offending cell and re-checking the source questionnaire.

Highlights

SPSS requires variable definitions first: “Variable View” sets types, labels, and measurement levels before any data can be entered.

Latent constructs aren’t directly entered; their indicator items (e.g., six Likert questions) are what get stored and analyzed.

Numeric coding with value labels turns categories like gender and education into analysis-ready variables.

A simple frequencies check with min/max can quickly reveal entry errors such as a Likert value of 6 when only 1–5 are valid.

Tracing errors works best when each questionnaire is numbered to match the dataset row.

Topics

SPSS Setup
Variable View
Data Entry
Value Labels
Likert Coding
Data Validation