Qualitative Coding for beginners - aggregating codes and cleaning up "dirty" codes (NVIVO)
Based on Qualitative Researcher Dr Kriukow's video on YouTube.
Aggregating codes ensures a parent node’s frequency equals the sum of its child nodes, which supports credible frequency reporting.
Briefing
Aggregating codes is essential for presenting clean, defensible counts in qualitative analysis, especially when a main code (or theme) contains child codes (subcodes). In NVivo, “aggregated” coding means the main code’s frequency should equal the sum of its child codes. That matters because results tables and frequency reports are read as complete and internally consistent; if the parent total does not match the sum of its children, the reported counts cannot be defended.
The walkthrough uses an “emotions” example. The main code “emotions” has eight child codes (stress, happiness, anxiety, love, envy, fear, etc.). When coding is properly aggregated, opening the main code shows eight references that correspond exactly to the extracts coded at the child codes. A key rule follows: nothing should be coded at the parent “emotions” level unless it is also coded at one of the child codes. Any extract coded only at the parent adds an “unaccounted” reference to the parent total and breaks the expected arithmetic.
NVivo supports this with a specific operation: right-click the main code and choose “Aggregate coding from children.” With the option selected, the main code’s count rolls up everything coded at its child codes together with anything coded directly at the parent, so in a clean structure it equals the sum of the child codes. The transcript then contrasts this with a “bad coding” scenario where the parent code “emotions” still lists eight child codes, yet the total number of references is higher than expected (e.g., 14 instead of 8). That mismatch signals “dirty codes”: extracts coded at the parent level but not assigned to any of the intended child codes, or extracts coded in overlapping ways that inflate totals.
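The arithmetic behind that red flag can be sketched in a few lines of Python. This is a toy model, not an NVivo API: it assumes the aggregated parent count is simply the direct parent coding plus the child coding, that each child holds one reference, and that no references overlap. The names `other_1` and `other_2` are placeholders for the two child codes the source does not name, and the six direct parent references are chosen because they reproduce the 14 in the example.

```python
# Toy model only: plain Python numbers, not an NVivo API.
# Six child names come from the example; "other_1" and "other_2" are
# placeholders for the two child codes the source does not name.
CHILD_REFS = {
    "stress": 1, "happiness": 1, "anxiety": 1, "love": 1,
    "envy": 1, "fear": 1, "other_1": 1, "other_2": 1,
}


def aggregated_total(direct_parent_refs: int, child_refs: dict) -> int:
    """Parent count with aggregation on: direct parent coding plus child coding."""
    return direct_parent_refs + sum(child_refs.values())


def unaccounted(parent_total: int, child_refs: dict) -> int:
    """References in the parent total that no child code explains."""
    return parent_total - sum(child_refs.values())


clean_total = aggregated_total(0, CHILD_REFS)  # nothing coded only at the parent
dirty_total = aggregated_total(6, CHILD_REFS)  # assumed: six parent-only references

print(clean_total, unaccounted(clean_total, CHILD_REFS))  # 8 0
print(dirty_total, unaccounted(dirty_total, CHILD_REFS))  # 14 6
```

A non-zero `unaccounted` value is the signal to go looking for parent-level leftovers; zero means the parent total is fully explained by its children.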
To diagnose and clean these issues, temporarily disable aggregation (uncheck “Aggregate coding from children”) to reveal how many extracts are coded directly at the parent level. In the example, unchecking aggregation shows six “dirty” items, which accounts for the gap between 14 and 8: these are extracts coded as “emotions” but not as any of the listed child emotions. Cleaning means uncoding those extracts from the parent “emotions” node and recoding them at the matching child code, creating the missing child code where none fits (for example, adding a “shy” child code when an extract was coded only at the parent level). Once the dirty codes are removed and any missing child codes are created, aggregating from children restores the expected totals.
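The diagnosis step amounts to a set difference: whatever is coded directly at the parent, minus whatever is coded at any child. A minimal sketch follows, using hypothetical extract IDs (`e1`, `e9`, and so on) and a hypothetical helper `find_parent_only`; NVivo itself exposes this through the interface, not through code like this.

```python
# Toy data with hypothetical extract IDs; this is not an NVivo export format.
def find_parent_only(parent_direct: set, children: dict) -> set:
    """Return extracts coded directly at the parent but at none of its children."""
    coded_at_children = set().union(*children.values())
    return parent_direct - coded_at_children


# Direct coding at "emotions" (what stays visible with aggregation unchecked).
emotions_direct = {"e1", "e9", "e10"}
# Child codes and the extracts coded at each of them.
emotions_children = {
    "stress": {"e1", "e2"},
    "happiness": {"e3"},
}

dirty = find_parent_only(emotions_direct, emotions_children)
print(sorted(dirty))  # ['e10', 'e9']
```

Here `e1` is coded at both the parent and a child, so it is not reported as a leftover; only `e9` and `e10` lack a child code and need a decision.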
Finally, the transcript explains how dirty codes commonly happen during early coding. Analysts often start by coding an early extract as a broad parent concept (e.g., “emotions”), then later decide to create child codes (stress, happiness, etc.) as patterns become clearer. If the analyst doesn’t go back and re-code earlier parent-level assignments into the appropriate child nodes, the dataset ends up with parent-level leftovers. The fix is straightforward: identify parent-level extras by turning off aggregation, re-code them into the correct child nodes (or create missing children), then re-aggregate so the counts add up cleanly.
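The retroactive fix can be modeled as moving each parent-level leftover into a child code and creating the child when it does not exist yet. The sketch below is illustrative only: `recode_leftovers`, the extract IDs, and the `decisions` mapping are all hypothetical, and the analyst's judgment about which child each extract belongs to is represented as a plain dictionary.

```python
# Toy model of recoding; names and IDs are hypothetical, not an NVivo API.
def recode_leftovers(parent_direct: set, children: dict, assignments: dict) -> None:
    """Move parent-level extracts into their chosen child codes, in place."""
    for extract_id, child_name in assignments.items():
        if extract_id not in parent_direct:
            continue  # not coded at the parent, so nothing to clean
        # Create the child code if it is missing, then code the extract there.
        children.setdefault(child_name, set()).add(extract_id)
        # Uncode the extract from the parent.
        parent_direct.discard(extract_id)


emotions_direct = {"e9", "e10"}  # leftovers from early, broad coding
emotions_children = {"stress": {"e1"}, "happiness": {"e3"}}
decisions = {"e9": "stress", "e10": "shy"}  # "shy" does not exist yet

recode_leftovers(emotions_direct, emotions_children, decisions)

child_sum = sum(len(ids) for ids in emotions_children.values())
aggregated = len(emotions_direct) + child_sum
print(aggregated == child_sum)  # True
```

After the call, nothing remains coded only at the parent, so the aggregated total and the sum of the children agree again.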
Cornell Notes
Aggregating codes makes a parent node’s frequency equal the sum of its child nodes, which is crucial when reporting counts in qualitative results. In NVivo, “Aggregate coding from children” should produce parent totals that match the combined frequencies of the child codes. When totals don’t add up, the dataset contains “dirty codes”: extracts coded at the parent level but not assigned to any child code, or overlapping parent/child coding that inflates counts. The cleaning workflow is to uncheck aggregation to expose the parent-level extras, uncode them from the parent, and create missing child codes when needed. After cleanup, re-aggregate so the parent count becomes internally consistent.
What does “aggregating codes” mean in NVivo, and why does it affect reported frequencies?
How can someone tell whether a parent code’s count is trustworthy?
What are “dirty codes,” and what do they look like in the emotions example?
What is the step-by-step method to clean dirty codes?
Why do dirty codes commonly appear during early-stage coding?
Review Questions
- In NVivo, what does “Aggregate coding from children” change about a parent node’s frequency, and how would you verify it’s working correctly?
- If a parent node’s total references don’t equal the sum of its child nodes, what specific NVivo action helps reveal the cause?
- Describe a realistic workflow that prevents dirty codes from accumulating during iterative coding.
Key Points
1. Aggregating codes ensures a parent node’s frequency equals the sum of its child nodes, which supports credible frequency reporting.
2. In NVivo, use right-click → “Aggregate coding from children” to make parent counts reflect child coding.
3. A mismatch between expected totals (sum of child counts) and the parent total is a red flag for dirty codes.
4. Dirty codes are extracts coded at the parent level without being assigned to any child node, inflating parent frequencies.
5. To clean dirty codes, uncheck aggregation to expose parent-level leftovers, then uncode them from the parent and recode them into the correct child nodes.
6. If an extract represents a concept that lacks a child node, create the missing child code before re-aggregating.
7. Dirty codes often appear when early coding uses broad parent labels and later decisions introduce child codes without retroactive re-coding.