13. SPSS Classroom - Assess Respondent Misconduct in Survey Research
Based on Research With Fawad's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
A practical way to clean survey data starts with spotting respondents who either quit early or answered in a way that suggests they never read the questions. One quick check is to sort the last few columns of the dataset in ascending order to identify incomplete rows—cases where a respondent stopped answering partway through. If the missing data is limited (for example, the respondent skipped only the last one or two items), the response can often be retained because the rest of the answers may still be usable. But if the respondent left a large share of the questionnaire unanswered—on the order of 40–50%—the record should be deleted, on the judgment that this level of missingness makes the remaining answers unreliable.
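The same incomplete-row screen can be sketched outside SPSS. Here is a minimal pandas version, in which the column names, the toy data, and the exact 40% cutoff are illustrative assumptions rather than rules from the video:

```python
import pandas as pd

# Toy survey data: respondent ID plus five Likert items (None = unanswered).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "q1": [5, 4, 3],
    "q2": [5, 4, None],
    "q3": [6, 4, None],
    "q4": [5, 4, None],
    "q5": [None, 4, None],
})

likert = ["q1", "q2", "q3", "q4", "q5"]

# Share of unanswered Likert items per respondent.
df["missing_share"] = df[likert].isna().mean(axis=1)

# Keep respondents with limited missingness (e.g. one skipped item);
# drop those with large gaps (>= 40% unanswered in this sketch).
kept = df[df["missing_share"] < 0.40]
dropped = df[df["missing_share"] >= 0.40]

print(kept["id"].tolist())     # [1, 2]
print(dropped["id"].tolist())  # [3]
```

Respondent 1 skipped only the final item and is retained; respondent 3 left most of the questionnaire blank and is removed, mirroring the keep-or-delete decision described above.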
Next comes “respondent misconduct,” where answers show suspicious uniformity. On a 1–7 Likert scale, it’s normal to see variation across items, because people rarely feel exactly the same way for every question. When a respondent selects nearly the same option for every item, it raises the likelihood that the person is not reading and is instead clicking through mechanically. To catch this, the transcript recommends adding attention checks to the survey—such as items that ask respondents to select a specific number on the 1–7 scale, or using reverse-coded questions that should produce different patterns if the respondent is actually processing the items.
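Such checks could be scored after data collection along these lines. The item names `ac1` and `q2r`, the sample data, and the reverse-coding gap threshold are hypothetical, not from the video:

```python
import pandas as pd

# Toy 1-7 Likert data. "ac1" is a hypothetical attention-check item whose
# instruction was "please select 4", and "q2r" is a reverse-coded twin of q2
# (an attentive answer to q2r should sit near 8 - q2 on a 1-7 scale).
df = pd.DataFrame({
    "id":  [1, 2, 3],
    "q1":  [6, 4, 7],
    "q2":  [5, 4, 7],
    "q2r": [3, 4, 7],
    "ac1": [4, 4, 7],
})

# Missed the explicit instruction?
df["failed_ac"] = df["ac1"] != 4

# Inconsistency on the reverse-coded pair: |q2 + q2r - 8| should be small
# for a respondent who is actually processing the items.
df["rc_gap"] = (df["q2"] + df["q2r"] - 8).abs()

# Flag respondents who failed the check or show a large reverse-coding gap
# (the gap threshold of 3 is an arbitrary illustration).
flagged = df[df["failed_ac"] | (df["rc_gap"] >= 3)]
print(flagged["id"].tolist())  # [3]
```

Note that a mid-scale straight-liner (respondent 2, all 4s) slips past the reverse-coded pair here, which is why the transcript pairs these checks with the standard-deviation screen described next.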
For a more quantitative screen, the transcript highlights using each respondent’s standard deviation across their Likert responses. Low variability is a red flag for straight-lining (answering the same way repeatedly). While SPSS can compute standard deviation, the workflow described uses Excel as a faster alternative: enter the standard deviation function in a new column, apply it across only the Likert items (excluding the respondent ID column), and then fill the formula down for all respondents. After calculating the standard deviation per row, sort the results from smallest to largest to find cases with extremely low variation.
The rule of thumb given in the transcript is to strongly consider deleting any respondent record with a standard deviation below 0.25, since such a value indicates little to no variation across the survey items. However, the transcript also stresses that there is no universal “golden rule”: the acceptable threshold depends on the survey’s size and context, and researchers should judge whether the respondent’s pattern is plausible. The key takeaway is not automatic deletion, but a structured decision process: remove clear dropouts, flag likely straight-liners, and use standard deviation plus attention checks to determine which records are valid enough to keep.
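The Excel workflow (row-wise standard deviation over the Likert columns, sorted ascending, inspected against the 0.25 rule of thumb) can be mirrored in pandas. This is a minimal sketch with made-up data; pandas' default sample standard deviation (`ddof=1`) matches Excel's STDEV.S:

```python
import pandas as pd

# Toy 1-7 Likert responses; column names are illustrative.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "q1": [4, 2, 7],
    "q2": [4, 5, 7],
    "q3": [4, 3, 7],
    "q4": [4, 6, 6],
    "q5": [4, 1, 7],
})

likert = ["q1", "q2", "q3", "q4", "q5"]

# Per-row sample standard deviation over the Likert items only
# (the "id" column is excluded, as in the Excel workflow).
df["sd"] = df[likert].std(axis=1)

# Sort smallest-first so near-zero-variation cases surface at the top,
# then flag rows below the 0.25 rule of thumb.
df = df.sort_values("sd")
df["straight_liner"] = df["sd"] < 0.25

print(df[["id", "sd", "straight_liner"]])
```

Respondent 1 (all 4s, sd = 0) is flagged; respondent 3 (mostly 7s with one 6, sd ≈ 0.45) sits above the cutoff, illustrating why the threshold is a warning sign to inspect rather than an automatic deletion rule.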
Cornell Notes
Survey data cleaning can target two main problems: incomplete responses and respondent misconduct. Incomplete cases can be found by sorting the last survey columns to identify rows where respondents stopped answering; keep records with only a small amount of missing data, but delete those with large gaps (e.g., 40–50% unanswered). Misconduct often appears as “straight-lining” on Likert scales, where a respondent picks nearly the same number for every item. Attention checks (specific-number prompts and reverse-coded items) help detect this behavior. For a quantitative screen, compute each respondent’s standard deviation across Likert items in Excel (excluding the ID column); values below 0.25 are a strong warning sign, though the threshold should be judged based on survey context.
- How can researchers quickly identify respondents who abandoned a questionnaire partway through?
- What pattern on a 1–7 Likert scale suggests respondent misconduct?
- What are attention checks, and how do they help detect misconduct?
- How can standard deviation be used to flag straight-lining in survey responses?
- What threshold is recommended for standard deviation, and why isn’t it automatic?
Review Questions
- What decision rule should be applied when a respondent skips only the last one or two survey items versus skipping 40–50% of the questionnaire?
- Why does low standard deviation across Likert items often indicate misconduct, and how would you calculate it in Excel?
- What kinds of attention checks (e.g., specific-number prompts or reverse-coded items) would you add to a 1–7 Likert survey to detect straight-lining?
Key Points
1. Sort the last survey columns to identify incomplete rows that signal early dropout.
2. Keep responses when missingness is limited (such as skipping only the last one or two items), but delete records with large missing portions (around 40–50%).
3. Treat near-identical Likert responses across all items as a misconduct red flag because real attitudes typically vary across questions.
4. Add attention checks, including specific-number selection prompts and reverse-coded questions, to verify respondents are reading.
5. Compute each respondent’s standard deviation across Likert items (excluding the ID column) to quantify straight-lining.
6. Use a standard deviation threshold around 0.25 as a strong warning sign, while still making context-dependent judgments rather than applying a universal rule.