How Codes become Themes in NVivo 12
Based on a YouTube video by Qualitative Researcher Dr Kriukow.
Codes are early organizational labels for text extracts, while themes are research-relevant categories formed by grouping and merging codes.
Briefing
Turning a messy list of NVivo “codes” into a usable thematic framework hinges on one practical distinction: codes are labels for organizing text early on, while themes are the research-relevant categories that later become the reported findings. In NVivo terms, a code functions like a unit of analysis: a short descriptor applied to text extracts so the data can be sorted and retrieved. Themes emerge when those codes are grouped, merged, and reorganized into more inclusive categories that align with the study’s research questions. In reporting, readers see the themes, not the codes, as the study’s main topics.
The walkthrough uses an imaginary study about the advantages and disadvantages of online learning to show how that transformation happens. After coding across multiple transcripts, the code list can balloon into dozens of overlapping labels (the example starts with 58 codes). The first move is not deleting or merging right away; it’s creating a structure that makes sense of the clutter. The analyst creates folders in NVivo: one to preserve the “initial codes” and another for the “thematic framework,” then copies the codes into the working folder so the original coding remains intact in case some labels are needed later.
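The folder strategy above amounts to working on a copy. The sketch below is a plain Python model of that idea, not NVivo's own interface; the dictionary layout and the reference counts are invented for illustration.

```python
import copy

# Hypothetical stand-in for the "initial codes" folder:
# code name -> number of coded references (counts are made up).
initial_codes = {
    "more self-belief": 3,
    "engaging lessons": 5,
    "stress": 4,
    "slow internet": 2,
}

# The "thematic framework" folder starts as a copy, so later
# reorganization never touches the original coding.
thematic_framework = copy.deepcopy(initial_codes)

# Removing a code from the working copy leaves the original intact.
del thematic_framework["slow internet"]
```

Because the two structures are independent, a code dropped from the working copy can still be recovered from `initial_codes`.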
Next comes the initial categorization step. Because the research questions explicitly ask about advantages and disadvantages, the analyst begins with a straightforward split: “positives” and “negatives.” Codes are moved into these groups based on their meaning (for instance, “more self-belief” and “engaging lessons” land in positives, while “stress,” “distractions,” and “slow internet” land in negatives). This step reduces visual noise and makes patterns easier to spot.
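The positives/negatives split can be pictured as grouping codes by an analyst-assigned category. This is a minimal Python sketch under that assumption: the `assignments` mapping records manual judgments taken from the example, and `group_codes` is a hypothetical helper, not an NVivo function.

```python
from collections import defaultdict

# Manual judgments from the example: code -> top-level category.
assignments = {
    "more self-belief": "positives",
    "engaging lessons": "positives",
    "stress": "negatives",
    "distractions": "negatives",
    "slow internet": "negatives",
}

def group_codes(assignments):
    """Collect codes under the category the analyst assigned to each."""
    groups = defaultdict(list)
    for code, category in assignments.items():
        groups[category].append(code)
    return dict(groups)

framework = group_codes(assignments)
```

The sorting itself stays an interpretive decision; the code only records and groups it.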
As the categories take shape, the analyst also identifies emergent clusters that deserve their own thematic treatment. A key example is psychological effects: multiple codes related to stress, anxiety, self-confidence, and self-esteem are consolidated into a dedicated theme called “psychological effects of online learning.” The analyst notes that motivation can be broad; some motivation-related codes remain under advantages/disadvantages, while the psychological cluster becomes a separate theme because it appears substantial enough to report on independently.
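Pulling an emergent cluster out into its own theme can be sketched as moving selected codes from their current groups into a new one. The code names below are illustrative labels based on the topics mentioned (stress, anxiety, self-confidence), and `move_codes` is a hypothetical helper written for this example.

```python
# Illustrative framework after the positives/negatives split.
framework = {
    "positives": ["self-confidence", "engaging lessons"],
    "negatives": ["stress", "anxiety", "distractions"],
}

def move_codes(framework, codes_to_move, new_theme):
    """Remove the chosen codes from their current themes and place them under new_theme."""
    wanted = set(codes_to_move)
    moved = []
    for theme, codes in framework.items():
        moved += [c for c in codes if c in wanted]
        framework[theme] = [c for c in codes if c not in wanted]
    framework[new_theme] = moved
    return framework

move_codes(
    framework,
    ["stress", "anxiety", "self-confidence"],
    "psychological effects of online learning",
)
```

Codes the analyst does not select, such as motivation-related ones in the walkthrough, simply stay where they were.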
The final refinement is merging duplicates and tightening sub-themes. The analyst reviews codes that appear only once, then checks whether they are really distinct or just different wording for the same idea. For example, “requires good internet connection” is merged into a broader internet-connection code, and “sister plays loudly…” is merged into “distractions at home.” Over time, the framework becomes more specific and coherent: broad ideas like convenience and engagement break into clearer sub-themes such as “quick access to lesson materials,” “variety of materials,” and “more engaging than traditional classrooms.”
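The review-then-merge step can be modeled as follows. This is a sketch in plain Python, not NVivo's merge command: the reference IDs are invented, the target name "internet connection" is an assumed label for the broader code, and both helper functions are hypothetical.

```python
# Hypothetical working copy: code name -> set of reference IDs ("transcript:passage").
codes = {
    "internet connection": {"T1:12", "T3:4"},
    "requires good internet connection": {"T2:7"},
    "distractions at home": {"T1:20", "T4:2"},
}

def single_reference_codes(codes):
    """List codes with exactly one reference: candidates to review for merging."""
    return sorted(name for name, refs in codes.items() if len(refs) == 1)

def merge_code(codes, source, target):
    """Fold `source` into `target`, keeping every coded reference."""
    codes[target] = codes.get(target, set()) | codes.pop(source)

candidates = single_reference_codes(codes)
merge_code(codes, "requires good internet connection", "internet connection")
```

Listing single-reference codes only surfaces candidates; whether two labels describe the same idea is still decided by reading the coded extracts.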
To quantify how strong each theme is, the analyst uses NVivo’s “aggregate coding from children” option. Once codes are nested under themes, turning on aggregation updates the counts so that “advantages,” “disadvantages,” and “psychological effects” each reflect the total references from their child nodes. The result is a thematic framework that is both research-aligned and measurable, ready for analysis and reporting rather than just early-stage data organization.
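The effect of aggregation can be illustrated with a small tree model. This is a conceptual sketch, not NVivo's implementation: the `Node` class, the reference IDs, and the choice to count a passage coded at two children only once are all assumptions made for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    references: set = field(default_factory=set)   # passages coded directly at this node
    children: list = field(default_factory=list)   # nested child nodes

def aggregate(node):
    """Return the references coded at this node or at any descendant."""
    total = set(node.references)
    for child in node.children:
        total |= aggregate(child)
    return total

# Invented reference IDs in the form "transcript:passage".
theme = Node(
    "psychological effects of online learning",
    children=[
        Node("stress", {"T1:3", "T2:9"}),
        Node("anxiety", {"T2:9", "T4:1"}),
        Node("self-confidence", {"T3:5"}),
    ],
)

theme_count = len(aggregate(theme))
```

Here the parent theme has no references of its own, so its count comes entirely from the child codes, which mirrors how a theme's strength is read off after nesting.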
Cornell Notes
Codes in NVivo are early labels applied to text extracts to organize data. Themes are broader, research-relevant categories that form when codes are grouped, merged, and reorganized around the study’s research questions. In the online learning example, the analyst starts with 58 codes, creates folders to preserve initial coding, then builds a thematic framework by first splitting codes into positives (advantages) and negatives (disadvantages). As patterns emerge, a dedicated theme—“psychological effects of online learning”—is created and related codes are moved into it. Finally, duplicate or overlapping codes are merged, and NVivo “aggregate coding from children” updates theme counts so the framework can be reported with evidence.
What is the practical difference between a code and a theme in NVivo?
Why create an “initial codes” folder before building a thematic framework?
How does the analyst reduce a long list of codes (58 in the example) into something manageable?
When should a new theme be created instead of keeping everything under advantages/disadvantages?
What does “merging duplicates” look like during thematic framework development?
How are theme counts updated in NVivo after reorganizing codes under themes?
Review Questions
- How would you decide whether a cluster of codes should become a separate theme rather than staying within advantages/disadvantages?
- What are the risks of deleting codes too early during thematic framework development, and how does the folder strategy address them?
- After merging duplicate codes into broader nodes, what NVivo function ensures parent theme counts reflect the new structure?
Key Points
1. Codes are early organizational labels for text extracts, while themes are research-relevant categories formed by grouping and merging codes.
2. Preserve an “initial codes” set before reorganizing so deleted or merged nodes can be recovered for future work.
3. Start thematic framework building with top-level categories that match the research questions (e.g., positives/negatives for advantages/disadvantages).
4. Create additional themes when an emergent cluster is strong enough to report separately (e.g., “psychological effects of online learning”).
5. Merge overlapping codes by checking whether different labels describe the same underlying idea, not by relying on wording alone.
6. Use NVivo’s “aggregate coding from children” to update theme strength counts after codes are moved under parent nodes.