Qualitative Coding for beginners - aggregating codes and cleaning up "dirty" codes (NVIVO)

Based on Qualitative Researcher Dr Kriukow's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

With aggregation enabled and clean coding, a parent node’s frequency equals the sum of its child nodes, which supports credible frequency reporting. A higher parent total signals stray parent-level coding.

Briefing

Aggregating codes is essential for presenting clean, defensible counts in qualitative analysis—especially when a main code (or theme) contains child codes (subcodes). In NVivo, “aggregated” coding means the main code’s frequency should equal the sum of its child codes. That matters because results tables and frequency reporting typically assume the totals are complete and internally consistent; if they don’t add up, the coding structure is unreliable.

The walkthrough uses an “emotions” example. The main code “emotions” has eight child codes (stress, happiness, anxiety, love, envy, fear, etc.), each holding one extract. When coding is properly aggregated, opening the main code shows eight references that correspond exactly to the extracts assigned to each child code. A key rule follows: there should be no additional extracts coded as “emotions” at the parent level unless they also belong to one of the child codes. If extra parent-level coding exists without a matching child code, the parent total will include “unaccounted” references and break the expected arithmetic.
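
The arithmetic above can be sketched outside NVivo. This is a minimal Python model, not NVivo's API: the `Node` class, its fields, and the extract IDs are hypothetical, and the last two child names are placeholders because the source does not list them. It assumes each extract has a unique ID and that an aggregated count covers direct coding plus everything coded at the children.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a code (node); not an NVivo object."""
    name: str
    direct: set = field(default_factory=set)      # extracts coded directly at this node
    children: list = field(default_factory=list)  # child nodes (subcodes)

    def count(self, aggregate: bool = True) -> int:
        """Number of distinct extracts, optionally rolled up from children."""
        refs = set(self.direct)
        if aggregate:
            for child in self.children:
                refs |= child.direct
        return len(refs)


# Six names come from the example; "emotion_7" and "emotion_8" are placeholders.
child_names = ["stress", "happiness", "anxiety", "love", "envy", "fear",
               "emotion_7", "emotion_8"]
children = [Node(name, direct={f"extract_{i}"})
            for i, name in enumerate(child_names, start=1)]
emotions = Node("emotions", children=children)

sum_of_children = sum(child.count() for child in emotions.children)
print(emotions.count(aggregate=True), sum_of_children)  # prints "8 8"
```

With nothing coded directly at the parent, the aggregated total and the sum of the child counts agree, which is the condition the rule above describes.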

NVivo supports this with a specific operation: right-click the main code and choose “Aggregate coding from children.” When selected, the main code’s count includes everything coded at its child codes, in addition to anything coded directly at the parent. With clean coding, that total equals the sum of the child codes. The transcript then contrasts this with a “bad coding” scenario where the parent code “emotions” still lists eight child codes, yet the total number of references is higher than expected (e.g., 14 instead of 8). That mismatch signals “dirty codes”: coding artifacts where something is coded at the parent level but not assigned to any of the intended child codes, or where extracts are coded in overlapping ways that inflate totals.

To diagnose and clean these issues, the method is to temporarily disable aggregation (by unchecking “Aggregate coding from children”) to reveal how many extracts are truly coded directly at the parent level. In the example, unchecking aggregation shows six “dirty” items—extracts coded as “emotions” but not as any of the listed child emotions. Cleaning involves un-coding those extracts from the parent “emotions” node (and, where needed, creating the missing child code—such as adding a “shy” child code when an extract was coded only at the parent level). Once the dirty codes are removed and the missing child codes are properly created, aggregating from children restores the expected totals.
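
The diagnostic can be illustrated with plain Python sets. This is a sketch under stated assumptions rather than anything NVivo exposes: the extract IDs (`e1` to `e14`) and the two placeholder child names are invented for illustration, and "aggregation off" is modeled simply as looking at the parent's direct coding alone.

```python
# Hypothetical coding data: extract IDs coded at each child of "emotions".
child_refs = {
    "stress": {"e1"}, "happiness": {"e2"}, "anxiety": {"e3"}, "love": {"e4"},
    "envy": {"e5"}, "fear": {"e6"},
    "emotion_7": {"e7"}, "emotion_8": {"e8"},  # placeholder names
}
# Extracts coded directly at the parent "emotions" node.
parent_direct = {"e9", "e10", "e11", "e12", "e13", "e14"}

in_children = set().union(*child_refs.values())

# Aggregation on: parent shows its own coding plus the children's.
aggregated_total = len(parent_direct | in_children)

# Aggregation off: only direct parent coding is visible; anything
# not also held by a child is a "dirty" leftover.
dirty = sorted(parent_direct - in_children, key=lambda e: int(e[1:]))

print(aggregated_total)   # 14
print(len(in_children))   # 8
print(dirty)              # ['e9', 'e10', 'e11', 'e12', 'e13', 'e14']
```

The gap between 14 and 8 is exactly the six parent-only extracts, matching the numbers in the example.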

Finally, the transcript explains how dirty codes commonly happen during early coding. Analysts often start by coding an early extract as a broad parent concept (e.g., “emotions”), then later decide to create child codes (stress, happiness, etc.) as patterns become clearer. If the analyst doesn’t go back and re-code earlier parent-level assignments into the appropriate child nodes, the dataset ends up with parent-level leftovers. The fix is straightforward: identify parent-level extras by turning off aggregation, re-code them into the correct child nodes (or create missing children), then re-aggregate so the counts add up cleanly.
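
How such leftovers accumulate can be shown with a small simulation. The coding log, the `emotions/stress` path notation, and the `parent_leftovers` helper below are all hypothetical and exist only to illustrate the sequence of events; NVivo does not store or export coding in this form as far as this sketch is concerned.

```python
# Hypothetical coding log in the order the analyst worked: (extract_id, node_path).
coding_log = [
    ("e1", "emotions"),            # early: only the broad code exists
    ("e2", "emotions"),
    ("e3", "emotions/stress"),     # later: child codes are introduced
    ("e4", "emotions/happiness"),
    ("e1", "emotions/stress"),     # e1 is revisited and re-coded into a child
]


def parent_leftovers(log, parent):
    """Extracts coded at `parent` that never ended up in any of its children."""
    at_parent = {extract for extract, node in log if node == parent}
    in_children = {extract for extract, node in log
                   if node.startswith(parent + "/")}
    return sorted(at_parent - in_children)


print(parent_leftovers(coding_log, "emotions"))  # ['e2']
```

Here `e1` was revisited and assigned to a child, so it is accounted for, while `e2` was never revisited and remains a parent-level leftover.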

Cornell Notes

Aggregating codes makes a parent node’s frequency equal the sum of its child nodes, which is crucial when reporting counts in qualitative results. In NVivo, “Aggregate coding from children” should produce parent totals that match the combined frequencies of the child codes. When totals don’t add up, the dataset contains “dirty codes”: extracts coded at the parent level but not assigned to any child code, or overlapping parent/child coding that inflates counts. The cleaning workflow is to uncheck aggregation to expose the parent-level extras, uncode them from the parent, and create missing child codes when needed. After cleanup, re-aggregate so the parent count becomes internally consistent.

What does “aggregating codes” mean in NVivo, and why does it affect reported frequencies?

Aggregating codes means the parent node’s count rolls up everything coded at its child nodes. In NVivo, this is done by right-clicking a main code and selecting “Aggregate coding from children.” When the coding is clean, the parent code’s frequency equals the sum of all child code frequencies. This matters because results tables and frequency reporting typically assume totals are complete and consistent with the coding hierarchy.

How can someone tell whether a parent code’s count is trustworthy?

The transcript’s diagnostic is arithmetic consistency. If a parent node (like “emotions”) has eight child codes and each child appears once, the parent should show a total of eight when aggregation is enabled. If the parent shows a higher number (e.g., 14), something is wrong—either extracts are coded at the parent level without belonging to any child code, or coding overlaps in a way that inflates totals.

What are “dirty codes,” and what do they look like in the emotions example?

Dirty codes are extracts assigned to a parent node but not to any of the intended child nodes. In the example, unchecking aggregation reveals six parent-level items that don’t map to the listed child emotions. These leftovers inflate the parent total beyond the sum of child counts, producing a mismatch like 14 instead of 8.

What is the step-by-step method to clean dirty codes?

First, uncheck “Aggregate coding from children” to expose how many references are truly coded directly at the parent level. Then, open the parent node’s extracts and identify which ones are not represented by any child code. Uncode those extracts from the parent node. If an extract belongs to a concept that wasn’t created as a child node yet (e.g., “shy”), create the missing child code and code the extract there. Finally, re-aggregate from children so totals add up.
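
The steps above can be sketched as a small function. This is an illustrative model only: `clean_dirty_codes`, the `decisions` mapping, and the extract IDs are hypothetical, and the choice of which child each extract belongs to remains the analyst's judgment, which the sketch takes as input.

```python
def clean_dirty_codes(parent_direct, child_refs, decisions):
    """Move parent-only extracts into child codes chosen by the analyst.

    decisions maps extract ID -> child code name; a value of None means
    "just uncode from the parent". Missing child codes are created on demand.
    Returns new (parent_direct, child_refs) without mutating the inputs.
    """
    new_parent = set(parent_direct)
    new_children = {name: set(refs) for name, refs in child_refs.items()}
    for extract, child in decisions.items():
        if extract not in new_parent:
            continue  # nothing coded at the parent for this extract
        new_parent.discard(extract)  # uncode from the parent
        if child is not None:
            # setdefault creates the child code if it does not exist yet
            new_children.setdefault(child, set()).add(extract)
    return new_parent, new_children


child_refs = {"stress": {"e1"}, "happiness": {"e2"}}
parent_direct = {"e3", "e4"}               # two parent-level leftovers
decisions = {"e3": "stress", "e4": "shy"}  # "shy" does not exist yet

parent_direct, child_refs = clean_dirty_codes(parent_direct, child_refs, decisions)
aggregated = len(parent_direct | set().union(*child_refs.values()))
sum_children = sum(len(refs) for refs in child_refs.values())
print(len(parent_direct), aggregated, sum_children)  # prints "0 4 4"
```

After cleanup nothing is coded directly at the parent, and the aggregated total matches the sum of the child counts.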

Why do dirty codes commonly appear during early-stage coding?

Dirty codes often emerge when analysts start with a broad parent label (e.g., coding the first emotion extract as “emotions”), then later create child codes (stress, happiness, etc.) as more examples appear. If earlier parent-level coding isn’t revisited and reassigned into the new child nodes, some extracts remain coded only at the parent level, and these leftovers push the aggregated parent total above the sum of the child counts.

Review Questions

  1. In NVivo, what does “Aggregate coding from children” change about a parent node’s frequency, and how would you verify it’s working correctly?
  2. If a parent node’s total references don’t equal the sum of its child nodes, what specific NVivo action helps reveal the cause?
  3. Describe a realistic workflow that prevents dirty codes from accumulating during iterative coding.

Key Points

  1. With aggregation enabled and clean coding, a parent node’s frequency equals the sum of its child nodes, which supports credible frequency reporting.
  2. In NVivo, use right-click → “Aggregate coding from children” to make parent counts reflect child coding.
  3. A mismatch between expected totals (sum of child counts) and the parent total is a red flag for dirty codes.
  4. Dirty codes are extracts coded at the parent level without being assigned to any child node, inflating parent frequencies.
  5. To clean dirty codes, uncheck aggregation to expose parent-level leftovers, then uncode them from the parent and recode them into the correct child nodes.
  6. If an extract represents a concept that lacks a child node, create the missing child code before re-aggregating.
  7. Dirty codes often appear when early coding uses broad parent labels and later decisions introduce child codes without retroactive re-coding.

Highlights

Parent totals should behave like accounting: the “emotions” parent count must equal the sum of its child emotions once aggregation is enabled.
Turning off aggregation is the fastest way to reveal hidden parent-level leftovers—the core diagnostic for dirty codes.
Dirty codes usually come from iterative coding: broad parent labels get used first, then child nodes get created later without cleaning up earlier assignments.
Cleaning is practical: uncode parent-level extras, create missing child nodes when needed, then re-aggregate so counts add up.
