Thematic analysis - How many codes do you need?
Based on Qualitative Researcher Dr Kriukow's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Briefing
The safest way to start thematic analysis is to code broadly and in detail—even if the coding framework looks messy—because early-stage coding is meant to capture everything participants said before themes are formed. When codes are plentiful, researchers have a “backbone” that later supports refining, merging, and interpreting patterns; when codes are too few or too abstract, important evidence can be missed and later theme development becomes guesswork.
The more common problem during coding is too little coverage, not too much work. Detailed coding functions like a table of contents for the dataset: each code is a navigational entry, so transcripts don’t need to be reread repeatedly to find what participants said. The workflow described here emphasizes a shift from reading data as individual stories to viewing it through a structured set of codes that represents the whole dataset holistically. That structure matters because themes don’t simply “emerge” on their own; researchers typically decide which themes to develop based on evidence, and that evidence can only be reliably organized through extensive coding.
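The “table of contents” idea can be sketched as a small data structure. This is a minimal illustration, not a method from the video: the `Segment` class, the helper functions, and the participant P02 excerpts are hypothetical, and real projects would normally do this inside qualitative analysis software.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Segment:
    """One chunk of a transcript, e.g. a paragraph or a speaker turn."""
    transcript_id: str
    text: str
    codes: List[str] = field(default_factory=list)


def build_code_index(segments: List[Segment]) -> Dict[str, List[Segment]]:
    """Map each code to the segments it labels, like a table of contents."""
    index: Dict[str, List[Segment]] = {}
    for seg in segments:
        for code in seg.codes:
            index.setdefault(code, []).append(seg)
    return index


def uncoded_segments(segments: List[Segment]) -> List[Segment]:
    """Return segments that carry no code at all (gaps in coverage)."""
    return [seg for seg in segments if not seg.codes]


def coverage(segments: List[Segment]) -> float:
    """Share of segments with at least one code, between 0.0 and 1.0."""
    if not segments:
        return 0.0
    coded = sum(1 for seg in segments if seg.codes)
    return coded / len(segments)


# Made-up excerpts for illustration only.
segments = [
    Segment("P01", "I was bullied a lot in my first year.", ["being bullied"]),
    Segment("P01", "Looking back, it made me stronger.",
            ["turning challenges into advantages"]),
    Segment("P02", "My supervisor rarely answered emails."),  # not coded yet
    Segment("P02", "So I learned to solve problems on my own.",
            ["turning challenges into advantages"]),
]

index = build_code_index(segments)   # code -> segments that carry it
gaps = uncoded_segments(segments)    # segments still waiting for a code
```

Here `index` lets the researcher jump from a code to every excerpt that carries it, while `gaps` and `coverage(segments)` flag parts of the interviews that no code represents yet.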
The argument for “many codes” rests on three interconnected reasons. First, comprehensive coding prevents loss of content. If researchers code only lightly, they risk leaving parts of interviews uncovered, which undermines confidence that nothing important was overlooked. Second, detailed descriptive coding helps researchers avoid premature judgments about relevance. At the start, it’s impossible to know what will later prove important for the research questions, so the coding stage should capture experiences in concrete terms rather than prematurely labeling them with broad, interpretive concepts.
An example illustrates the difference between descriptive and overly abstract coding. If a participant describes being bullied and then reframing it as something that makes them stronger, a descriptive approach would code the bullying experience and the coping strategy (e.g., “turning challenges into advantages”) rather than collapsing it immediately into a high-level idea like “identity transformation.” That level of specificity creates multiple entry points for later analysis. When similar coping strategies appear elsewhere, the researcher can recognize a recurring pattern and legitimately elevate it into a theme or subtheme.
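A rough sketch of how coding one fragment with several descriptive codes makes recurring patterns visible. The P02 and P03 excerpts, the function names, and the threshold of two participants are assumptions made for illustration; the source does not prescribe a numeric cutoff for elevating a code into a theme.

```python
from collections import defaultdict

# (participant, fragment, descriptive codes); only the bullying example
# comes from the text, the other rows are invented for illustration.
coded_fragments = [
    ("P01", "They bullied me, but it made me stronger.",
     ["being bullied", "turning challenges into advantages"]),
    ("P02", "Failing the exam pushed me to study smarter.",
     ["failing an exam", "turning challenges into advantages"]),
    ("P03", "I just avoided the people who teased me.",
     ["being teased", "avoiding conflict"]),
]


def participants_per_code(fragments):
    """Collect the set of participants whose fragments carry each code."""
    by_code = defaultdict(set)
    for participant, _text, codes in fragments:
        for code in codes:
            by_code[code].add(participant)
    return by_code


def recurring_codes(fragments, min_participants=2):
    """List codes seen across at least `min_participants` participants.

    The threshold is an arbitrary, hypothetical choice; whether a code
    deserves to become a theme remains the researcher's judgment.
    """
    by_code = participants_per_code(fragments)
    return sorted(
        code for code, people in by_code.items()
        if len(people) >= min_participants
    )


candidates = recurring_codes(coded_fragments)
```

Because the coping strategy was coded separately from the bullying experience, it shows up as a candidate pattern across participants. Had the first fragment been labeled only “identity transformation”, that link to P02 would not be visible.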
Third, extensive descriptive coding supports validity by reducing researcher bias. The early coding stage is treated as a safeguard against letting prior knowledge or theoretical expectations dictate labels too soon. By summarizing what participants said—rather than interpreting what they “must mean”—researchers delay abstraction until there is enough coded evidence to justify it. Over time, codes can be transformed into more interpretive, higher-level themes, but that interpretive step comes after the dataset has been thoroughly organized.
Overall, the guidance is pragmatic: if a coding framework feels overwhelming, that’s often a sign of adequate coverage. The recommendation is to keep many descriptive codes at first, then systematically reduce and refine them later—because starting with too few codes can leave researchers without the evidence base needed for credible, research-question-relevant themes.
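The later “reduce and refine” step can be pictured as merging near-duplicate codes while keeping every excerpt attached. The sketch below is only one possible way to do that; the code names, segment identifiers, and the `merge_codes` helper are hypothetical.

```python
def merge_codes(code_to_segments, merge_map):
    """Fold codes into their replacement labels without dropping evidence.

    code_to_segments: dict mapping a code to a list of segment identifiers.
    merge_map: dict mapping an old code to the code that absorbs it.
    Codes missing from merge_map are kept unchanged.
    """
    merged = {}
    for code, segment_ids in code_to_segments.items():
        target = merge_map.get(code, code)
        merged.setdefault(target, []).extend(segment_ids)
    return merged


# Hypothetical first-pass coding framework: many narrow, descriptive codes.
first_pass = {
    "reframing bullying as strength": ["P01-s2"],
    "turning challenges into advantages": ["P02-s4"],
    "being bullied": ["P01-s1"],
}

# The researcher's later decision that two codes describe the same strategy.
merge_map = {
    "reframing bullying as strength": "turning challenges into advantages",
}

merged = merge_codes(first_pass, merge_map)
```

After merging, the framework has fewer codes, but the same three excerpts are still reachable, so the evidence base built during detailed coding is preserved.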
Cornell Notes
Thematic analysis depends on coding that is detailed enough to capture essentially everything participants said before themes are chosen. Coding is treated like a “table of contents” that lets researchers move from individual interview stories to a holistic view of patterns across the dataset. Having many descriptive codes early prevents missed evidence, reduces the risk of deciding what matters too soon, and helps limit researcher bias by delaying interpretive labeling. Valid findings are more likely when early coding stays close to participants’ wording and only later becomes more abstract as themes are developed.
Why does the guidance favor “too many codes” over “not enough codes” at the start of thematic analysis?
How does coding function like a “table of contents,” and why does that matter for theme development?
What’s the risk of using overly abstract or interpretive codes too early?
How does the bullying example demonstrate the difference between descriptive coding and premature abstraction?
Why is descriptive, detailed coding also a strategy for relevance—especially when researchers don’t yet know what will matter?
Review Questions
- What does it mean to treat codes as a “table of contents,” and how would you check whether your coding has adequate coverage?
- Give an example of how you would code a participant’s coping strategy descriptively before turning it into a theme.
- How does delaying interpretive labeling during early coding help reduce researcher bias and improve validity?
Key Points
1. Start thematic analysis with detailed, descriptive coding to ensure comprehensive coverage of what participants said.
2. Use coding as a “table of contents” so each transcript segment is represented in codes and doesn’t need constant re-checking.
3. Avoid premature decisions about relevance; early coding should capture everything because importance is unknown until patterns are visible.
4. Prefer descriptive labels over overly abstract, interpretive concepts during initial coding to reduce researcher bias.
5. Code multiple aspects of a single fragment when appropriate (e.g., experience plus coping strategy) to create evidence-rich pathways to themes.
6. Recognize that themes are typically developed based on evidence rather than simply emerging automatically.
7. After building a strong coding backbone, refine by reducing and merging codes, without sacrificing validity at the start.