Qualitative data analysis - Coding Tutorial - Initial Codes | "From Codes to Themes" episode 1
Based on Qualitative Researcher Dr Kriukow's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Code initial transcripts line by line and capture as much detail as possible before judging relevance to the research questions.
Briefing
Coding in qualitative research starts with building a detailed, searchable set of “initial codes” that function like personal notes, capturing what participants actually say before deciding what matters for the study. Using NVivo as a demonstration tool, the process begins by reading interview transcripts line by line and attaching short, descriptive labels to specific segments of text. The goal on day one isn’t to be selective or perfectly aligned with the eventual research questions; it’s to capture as much detail as possible so later stages can sort, compare, and consolidate.
The tutorial frames coding around a hypothetical study of factors influencing job retention among chefs and actors—why people stay or leave. Because the data is secondary (interview transcripts collected for other purposes), the researcher can’t rely on the original questions to match the study’s aims. Instead, coding becomes the mechanism for extracting whatever factors the transcripts mention. At this early stage, research questions are treated as secondary to thoroughness: the priority is to code everything in detail, since something that seems irrelevant in one transcript may become important after reviewing multiple transcripts.
A key practical principle is naming codes in a way that preserves meaning. Generic labels like “the most fun” are discouraged because they don’t say what the participant is actually describing. Better codes mirror the content of the extract: for example, the code “the most fun was when there was danger” reflects the participant’s specific description of danger during filmmaking. The tutorial emphasizes that code names should be clear enough to work as a “table of contents” when browsing the code list later; in NVivo, double-clicking a code jumps back to the coded passage, so descriptive naming speeds up retrieval.
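The “table of contents” idea can be illustrated outside NVivo with a small lookup from code names to extracts. This is a minimal Python sketch under stated assumptions, not NVivo’s API: the `CodeBook` class, its method names, the transcript name, the line number, and the placeholder extract text are all hypothetical and invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Extract:
    """A coded segment of transcript text (all values here are illustrative)."""
    transcript: str
    line: int
    text: str


class CodeBook:
    """Hypothetical helper: maps descriptive code names to the extracts they label."""

    def __init__(self):
        self._codes = defaultdict(list)

    def code(self, name, extract):
        """Attach the code `name` to an extract."""
        self._codes[name].append(extract)

    def contents(self):
        """Return code names sorted alphabetically, like a table of contents."""
        return sorted(self._codes)

    def lookup(self, name):
        """Return the extracts labelled with `name` (empty list if the code is unknown)."""
        return list(self._codes.get(name, []))


book = CodeBook()
# Placeholder text stands in for the participant's actual words.
segment = Extract("actor_interview_1", 12, "(placeholder: participant describes danger on set)")
book.code("the most fun was when there was danger", segment)

# Browsing the code list and jumping back to each coded passage.
for name in book.contents():
    for ex in book.lookup(name):
        print(f"{name} -> {ex.transcript}, line {ex.line}")
```

A descriptive name makes the printed list readable on its own; a label like “the most fun” would force the reader to open the extract to learn what it covers.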
The coding approach also tolerates overlap. The same sentence can receive multiple codes when different aspects are present, and the tutorial treats having many codes as an advantage. The only real risk is under-coding—missing potentially meaningful patterns because there wasn’t enough coverage. This is illustrated with segments that combine themes like danger, stimulation, adventure, and pushing boundaries; multiple codes are applied so connections can be recognized later.
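One way to see why overlapping codes help is to count how often pairs of codes land on the same segment. The sketch below is an assumption-laden illustration in plain Python, not something shown in the tutorial: the segment ids and the assignment of codes to segments are invented, and only the code names (danger, stimulation, adventure, pushing boundaries) come from the text above.

```python
from collections import Counter
from itertools import combinations

# Hypothetical coded segments: segment id -> set of codes applied to it.
coded_segments = {
    "seg_1": {"danger", "stimulation", "adventure"},
    "seg_2": {"danger", "pushing boundaries"},
    "seg_3": {"adventure", "danger"},
}


def co_occurrences(segments):
    """Count how often each pair of codes is applied to the same segment."""
    pairs = Counter()
    for codes in segments.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(codes), 2):
            pairs[(a, b)] += 1
    return pairs


pairs = co_occurrences(coded_segments)
for (a, b), n in pairs.most_common():
    print(f"{a} + {b}: {n}")
```

If each segment had received only one code, every count here would be zero and the link between, say, adventure and danger would not be visible in the code list at all.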
As coding progresses, the tutorial distinguishes between codes that are too broad or vague (like “fascination,” which may require later review) and codes that are grounded in specific wording (like “autonomy and control” tied to smaller vs. bigger film productions). It also acknowledges that interpretation can creep in, but the strategy is to stay cautious: when uncertain, create an additional code now and “validate” it by revisiting the extract later. Before moving to themes, every coded extract is expected to be reviewed so decisions about relevance are evidence-based.
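The review step can be pictured as a simple checklist over coded extracts. The following Python sketch is only one possible way to track it, assuming a hypothetical `CodedExtract` record with `needs_review` and `reviewed` flags; none of these names come from the tutorial or from NVivo, and the extract text is a placeholder.

```python
from dataclasses import dataclass


@dataclass
class CodedExtract:
    code: str
    text: str
    needs_review: bool = False  # set when the code feels vague or interpretive
    reviewed: bool = False      # set once the extract has been revisited


def review_queue(extracts):
    """Return unreviewed extracts, with flagged (uncertain) ones first."""
    pending = [e for e in extracts if not e.reviewed]
    return sorted(pending, key=lambda e: not e.needs_review)


def ready_for_themes(extracts):
    """True only when every coded extract has been revisited."""
    return all(e.reviewed for e in extracts)


extracts = [
    CodedExtract("autonomy and control", "(placeholder extract)"),
    CodedExtract("fascination", "(placeholder extract)", needs_review=True),
]

queue = review_queue(extracts)
print([e.code for e in queue])     # flagged code comes first
print(ready_for_themes(extracts))  # nothing has been reviewed yet

for e in extracts:
    e.reviewed = True
print(ready_for_themes(extracts))
```

Flagging a vague code at the moment of coding keeps the first pass fast while making sure the doubt is not forgotten before theme-building.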
Finally, the tutorial positions codes as tools for understanding the data, not final outputs for a broad audience. Themes come later; initial codes are primarily for sense-making and organization. The next step—covered in later parts of the series—is reorganizing these initial codes into more focused structures and then building themes from them.
Cornell Notes
Initial coding is treated as a thorough, evidence-first pass: read each transcript carefully and attach descriptive “initial codes” to specific text segments. Code names should reflect what participants actually said so the code list works like a table of contents when revisiting passages later (e.g., coding “the most fun was when there was danger” rather than a vague label). The process encourages detailed coverage, including applying multiple codes to the same sentence when different ideas are present, because under-coding can hide patterns. Research questions matter more later when deciding which codes form themes; early on, the priority is capturing enough detail to make later decisions confidently. Codes are personal tools for sense-making, not the final public-facing product.
Why does the tutorial recommend coding everything in detail before aligning to research questions?
What makes a “good” code name in this approach?
Is it a problem to assign multiple codes to the same sentence or extract?
How does the tutorial handle uncertainty or interpretation while coding?
What’s the difference between initial codes and themes in this workflow?
Review Questions
- When would it be better to use a specific code name derived from a participant’s wording instead of a generic label?
- Why does the workflow treat under-coding as riskier than over-coding during initial passes?
- How should a coder respond when an extract seems ambiguous—create fewer codes or more, and why?
Key Points
1. Code initial transcripts line by line and capture as much detail as possible before judging relevance to the research questions.
2. Name codes descriptively so they function as a table of contents when reviewing the code list later (avoid vague labels).
3. Apply multiple codes to the same segment when it contains distinct ideas; overlap supports later pattern detection.
4. Treat research questions as more important during theme-building, not during the first pass of initial coding.
5. When uncertain about meaning, revisit extracts later and consider adding additional codes now to preserve possible connections.
6. Review every coded extract before deciding what to do next; don’t rely on assumptions made during the first pass.
7. Use codes as personal tools for sense-making; themes come later as broader syntheses.