Statistics for Research - L4 - Introduction to SPSS and Data Entry
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
SPSS data analysis starts with a disciplined setup: define every variable first, then enter responses using numeric coding, and finally verify that the dataset matches the questionnaire. In social science research, anything measured—age, gender, education, university, job details, and survey items—counts as a variable, and SPSS requires those variables to be specified in “Variable View” before any data go into “Data View.” The workflow matters because SPSS treats analysis as number-based, so categories and Likert-scale answers must be converted into consistent numeric representations.
A typical study setup begins with demographic variables (e.g., age, gender, education, program, university, job rank, tenure) and then moves to constructs measured indirectly through indicators. For instance, an unobserved “latent” construct (like internal stakeholder responsibilities) is operationalized through multiple observed items (six indicators in the example). Each indicator is measured on a five-point Likert scale where 1 corresponds to “strongly disagree” and 5 to “strongly agree.” Since respondents select only one option per item, the dataset should contain single numeric values per indicator per respondent—no multi-select entries.
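The Likert coding described above can be sketched in a few lines of Python. This is only an illustration of the idea, not part of the SPSS workflow itself: the names `LIKERT_CODES`, `encode_item`, and `encode_indicators` are hypothetical, and the three middle labels (disagree, neutral, agree) are assumed, since the notes give only the two endpoints.

```python
# Five-point Likert coding. Only the endpoints (1 and 5) come from the notes;
# the three middle labels are assumed for illustration.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode_item(answer: str) -> int:
    """Convert one ticked option into its numeric code."""
    key = answer.strip().lower()
    if key not in LIKERT_CODES:
        # Anything that is not exactly one valid option (blank, multi-select,
        # misspelling) is rejected instead of being guessed at.
        raise ValueError(f"not a single valid Likert option: {answer!r}")
    return LIKERT_CODES[key]


def encode_indicators(answers: list[str]) -> list[int]:
    """Encode all indicators of one construct for one respondent."""
    return [encode_item(a) for a in answers]


# Six hypothetical answers, one per indicator of the example construct.
codes = encode_indicators(
    ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree", "Strongly disagree"]
)
print(codes)  # [4, 5, 3, 4, 2, 1]
```

Rejecting anything that is not a single recognized option mirrors the rule that each respondent contributes exactly one numeric value per indicator.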
In “Variable View,” the type and measurement level must be set for each variable. Age is treated as interval/ratio (a true numeric quantity, which SPSS labels “Scale”), gender as nominal (no inherent order), education as ordinal (ordered categories such as degrees), and university as nominal. Categories that are not inherently numeric are stored as numeric codes with value labels attached. The example maps gender to numbers (e.g., 1 = male, 2 = female), encodes education levels the same way (e.g., 1 = Masters, 2 = MPhil/MS, 3 = PhD, etc.), and assigns each university a numeric code. Likert items can all reuse the same 1–5 scale, with a label for each response option.
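One way to picture what “Variable View” holds is a small codebook: for each variable, a measurement level and, where needed, a mapping from numeric codes to labels. The Python sketch below is a hypothetical stand-in for that idea, not an SPSS file format. The structure `CODEBOOK`, the helper `label_for`, and the university names are assumptions; the gender and education codes follow the example in the notes, which lists further education levels that are not reproduced here.

```python
# A hypothetical codebook mirroring the kind of information set in Variable View.
CODEBOOK = {
    "age": {"measure": "scale", "labels": None},  # true numeric, no labels
    "gender": {"measure": "nominal", "labels": {1: "Male", 2: "Female"}},
    "education": {
        "measure": "ordinal",
        # Only the three levels named in the notes; the original list continues.
        "labels": {1: "Masters", 2: "MPhil/MS", 3: "PhD"},
    },
    "university": {
        "measure": "nominal",
        # Placeholder names: the notes do not list the actual universities.
        "labels": {1: "University A", 2: "University B"},
    },
}


def label_for(variable: str, code: int) -> str:
    """Return the value label for a stored code, or the number itself as text."""
    labels = CODEBOOK[variable]["labels"]
    if labels is None:
        return str(code)
    if code not in labels:
        raise KeyError(f"{variable} has no label for code {code}")
    return labels[code]


print(label_for("gender", 2))     # Female
print(label_for("education", 3))  # PhD
print(label_for("age", 29))       # 29
```

Keeping the stored value numeric and the label separate is the same design SPSS uses: analysis runs on the codes, while output stays readable through the labels.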
Once variables are defined, the next step is entering data in “Data View.” Each row represents a respondent, and each column corresponds to a variable or indicator. When questionnaires exist in print, the dataset should include a questionnaire number so each row can be traced back to the correct paper. The example demonstrates entering respondent #1 with values for demographics and six responsibility indicators, then repeating for respondent #2 and onward.
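The row-and-column layout described above can be illustrated as a plain table, built here with Python's standard `csv` module. Everything specific in this sketch is assumed for illustration: the column names (`q_no`, `resp1` to `resp6`), the helper `to_csv`, and the two rows of values, which are made up and not taken from the video's dataset.

```python
import csv
import io

# One column per variable: questionnaire number, demographics, six indicators.
COLUMNS = ["q_no", "age", "gender", "education", "university"] + [
    f"resp{i}" for i in range(1, 7)
]

# One row per respondent. All values are hypothetical, already numerically coded.
rows = [
    [1, 29, 1, 2, 1, 4, 5, 3, 4, 4, 5],
    [2, 34, 2, 3, 2, 2, 3, 3, 4, 2, 1],
]


def to_csv(columns: list[str], rows: list[list[int]]) -> str:
    """Render the table as CSV text, refusing rows with missing or extra cells."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"row {row[0]} has {len(row)} cells, expected {len(columns)}")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)
    return buffer.getvalue()


text = to_csv(COLUMNS, rows)
print(text)
```

The first column carries the questionnaire number, so any row can later be matched to its paper questionnaire.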
After data entry, the dataset needs a quick correctness check. A practical first validation is “Analyze” → “Descriptive Statistics” → “Frequencies,” with minimum and maximum requested among the statistics. If a value falls outside the expected range, such as a Likert item showing a 6 when the scale runs from 1 to 5, the output for that variable exposes it as an out-of-range minimum or maximum. The fix is to locate the exact cell (using “Edit” → “Find” to search that variable's column for the invalid value), read off the questionnaire number in that row, and re-check the paper questionnaire to correct the entry. This cycle of define, code, enter, and validate sets up the dataset for later analysis steps.
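The logic of the minimum/maximum check can be sketched outside SPSS as well. The Python below is a minimal illustration under stated assumptions: the variable names, the `VALID_RANGES` table, the helpers `min_max` and `out_of_range`, and the two data rows are all hypothetical, with one deliberate entry error (a 6 on a 1–5 item) planted in the second row.

```python
# Expected range for each coded variable (hypothetical names).
VALID_RANGES = {f"resp{i}": (1, 5) for i in range(1, 7)}
VALID_RANGES["gender"] = (1, 2)

# Two made-up respondents; row 2 contains a deliberate typo in resp2.
data = [
    {"q_no": 1, "gender": 1, "resp1": 4, "resp2": 5, "resp3": 3,
     "resp4": 4, "resp5": 4, "resp6": 5},
    {"q_no": 2, "gender": 2, "resp1": 2, "resp2": 6, "resp3": 3,
     "resp4": 4, "resp5": 2, "resp6": 1},
]


def min_max(data: list[dict], variable: str) -> tuple[int, int]:
    """Smallest and largest entered value for one variable."""
    values = [row[variable] for row in data]
    return min(values), max(values)


def out_of_range(data: list[dict], ranges: dict) -> list[tuple[int, str, int]]:
    """List (questionnaire number, variable, value) for every invalid cell."""
    problems = []
    for row in data:
        for variable, (low, high) in ranges.items():
            value = row[variable]
            if not low <= value <= high:
                problems.append((row["q_no"], variable, value))
    return problems


print(min_max(data, "resp2"))            # (5, 6): the maximum exceeds 5
print(out_of_range(data, VALID_RANGES))  # [(2, 'resp2', 6)]
```

The first call plays the role of the minimum/maximum summary, and the second plays the role of searching for the offending cell: it returns the questionnaire number needed to pull the right paper form.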
Cornell Notes
SPSS workflow for research data begins by defining variables in “Variable View” and only then entering responses in “Data View.” Every measured concept becomes a variable, including demographics and survey indicators used to measure latent constructs. Because SPSS analysis is number-based, categories like gender, education, and university must be coded with numeric values and value labels (e.g., 1=male, 2=female; ordered education levels as 1, 2, 3). Likert-scale items are coded consistently (1=strongly disagree through 5=strongly agree). After entry, minimum/maximum checks via frequencies help catch mistakes (like an invalid 6 on a 1–5 scale), which can be traced back to the specific questionnaire row for correction.
- Why must variables be defined before entering data in SPSS?
- How should gender, education, and university be handled if they aren’t inherently numeric?
- What’s the difference between a latent construct and its indicators in SPSS data entry?
- How are Likert-scale survey items coded, and what constraint should the dataset follow?
- What’s an effective way to detect data-entry errors after entering responses?
Review Questions
- What variable properties (type/measurement level) should be set for age, gender, education, and university, and why do they differ?
- How would you code a five-point Likert item and ensure respondents’ choices remain within the valid range during data entry?
- After running frequencies, what steps would you take to trace and fix an out-of-range value back to the original questionnaire?
Key Points
1. Define all variables in “Variable View” before entering any responses in “Data View.”
2. Convert categorical demographics (gender, education, university) into numeric codes and attach value labels for interpretability.
3. Treat latent constructs as unobserved: enter only their measured indicators (e.g., multiple Likert items) rather than the latent variable itself.
4. Use consistent numeric coding for Likert items (e.g., 1–5) and ensure each respondent has exactly one value per item.
5. Set appropriate measurement levels: age as interval/ratio, gender as nominal, education as ordinal, and university as nominal.
6. Assign questionnaire numbers to rows so data-entry mistakes can be traced back to the correct paper questionnaire.
7. Validate data entry using minimum/maximum checks (frequencies) and correct any out-of-range values by locating the offending cell and re-checking the source questionnaire.