Thematic analysis - How many codes do you need?

TL;DR

Start thematic analysis with detailed, descriptive coding to ensure comprehensive coverage of what participants said.

Briefing

The safest way to start thematic analysis is to code broadly and in detail—even if the coding framework looks messy—because early-stage coding is meant to capture everything participants said before themes are formed. When codes are plentiful, researchers have a “backbone” that later supports refining, merging, and interpreting patterns; when codes are too few or too abstract, important evidence can be missed and later theme development becomes guesswork.

The common problem during coding is not too much work but too little coverage. Detailed coding functions like a table of contents for the dataset: each code is a navigational entry, so researchers can locate material without repeatedly rereading transcripts. The workflow described here emphasizes a shift from reading data as individual stories to viewing it through a structured set of codes that represents the whole dataset holistically. That structure matters because themes don’t simply “emerge” on their own; researchers typically decide which themes to develop based on evidence, and that evidence can only be reliably organized through extensive coding.
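The table-of-contents idea can be made concrete with a small sketch. The Python below is illustrative only and does not come from the source: the segment IDs, the code labels, and the `build_index` and `coverage` helpers are all hypothetical, and it assumes the transcripts have already been split into segments. It inverts segment-level coding into a code-to-segment index and reports which segments still carry no code.

```python
from collections import defaultdict

# Hypothetical coded data: segment id -> descriptive codes applied to it.
# An empty list means the segment has not been coded yet.
coded_segments = {
    "P1-01": ["being bullied", "turning challenges into advantages"],
    "P1-02": ["support from friends"],
    "P1-03": [],
    "P2-01": ["turning challenges into advantages"],
}

def build_index(segments):
    """Invert segment -> codes into code -> segments (the 'table of contents')."""
    index = defaultdict(list)
    for seg_id, codes in segments.items():
        for code in codes:
            index[code].append(seg_id)
    return dict(index)

def coverage(segments):
    """Return (share of segments with at least one code, ids of uncoded segments)."""
    uncoded = [seg_id for seg_id, codes in segments.items() if not codes]
    share = 1 - len(uncoded) / len(segments) if segments else 0.0
    return share, uncoded

index = build_index(coded_segments)
share, uncoded = coverage(coded_segments)
print(f"coverage: {share:.0%}, uncoded: {uncoded}")
```

A coverage share below 100% points to the segments that may need another pass. It says nothing about whether the codes already applied are good ones; that remains the researcher's judgment.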

The argument for “many codes” rests on three interconnected reasons. First, comprehensive coding prevents loss of content. If researchers code only lightly, they risk leaving parts of interviews uncovered, which undermines confidence that nothing important was overlooked. Second, detailed descriptive coding helps researchers avoid premature judgments about relevance. At the start, it’s impossible to know what will later prove important for the research questions, so the coding stage should capture experiences in concrete terms rather than labeling them with broad, interpretive concepts.

An example illustrates the difference between descriptive and overly abstract coding. If a participant describes being bullied and then reframing it as something that makes them stronger, a descriptive approach would code the bullying experience and the coping strategy (e.g., “turning challenges into advantages”) rather than collapsing it immediately into a high-level idea like “identity transformation.” That level of specificity creates multiple entry points for later analysis. When similar coping strategies appear elsewhere, the researcher can recognize a recurring pattern and legitimately elevate it into a theme or subtheme.
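One way to picture how a descriptive code becomes a candidate theme is to count how many different participants it appears for. The sketch below is an assumption-laden illustration, not a method from the source: the participants, the code labels other than those in the bullying example, the `recurring_codes` helper, and the threshold of two participants are all hypothetical. Recurrence only flags candidates; deciding whether a pattern deserves to be a theme is still up to the researcher.

```python
from collections import defaultdict

# Hypothetical coded segments: (participant, descriptive codes on one segment).
# A single segment can carry several codes, e.g. the event and the coping strategy.
segments = [
    ("P1", ["being bullied", "turning challenges into advantages"]),
    ("P2", ["losing a job", "turning challenges into advantages"]),
    ("P3", ["being bullied"]),
    ("P3", ["seeking peer support"]),
]

def participants_per_code(coded):
    """Map each code to the set of participants whose data it was applied to."""
    by_code = defaultdict(set)
    for participant, codes in coded:
        for code in codes:
            by_code[code].add(participant)
    return by_code

def recurring_codes(coded, min_participants=2):
    """List codes seen for at least `min_participants` participants (arbitrary threshold)."""
    by_code = participants_per_code(coded)
    return sorted(code for code, who in by_code.items() if len(who) >= min_participants)

print(recurring_codes(segments))
```

Had the first segment been coded only as “identity transformation,” the coping strategy would not show up as recurring across P1 and P2, which is the loss of evidence the example warns about.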

Third, extensive descriptive coding supports validity by reducing researcher bias. The early coding stage is treated as a safeguard against letting prior knowledge or theoretical expectations dictate labels too soon. By summarizing what participants said—rather than interpreting what they “must mean”—researchers delay abstraction until there is enough coded evidence to justify it. Over time, codes can be transformed into more interpretive, higher-level themes, but that interpretive step comes after the dataset has been thoroughly organized.

Overall, the guidance is pragmatic: if a coding framework feels overwhelming, that’s often a sign of adequate coverage. The recommendation is to keep many descriptive codes at first, then systematically reduce and refine them later—because starting with too few codes can leave researchers without the evidence base needed for credible, research-question-relevant themes.
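The later reduce-and-refine step can be sketched as a merge over the code index. This is a minimal illustration under stated assumptions, not the source's procedure: the index contents, the “reframing setbacks as lessons” label, the `merge_map`, and the `merge_codes` helper are hypothetical. The point it shows is that merging codes keeps every coded segment attached to the surviving code, so no evidence is dropped.

```python
# Hypothetical code -> segments index produced by initial descriptive coding.
index = {
    "turning challenges into advantages": ["P1-01", "P2-01"],
    "reframing setbacks as lessons": ["P3-04"],
    "being bullied": ["P1-01", "P3-02"],
}

# Hypothetical merge decisions made by the researcher: old code -> surviving code.
merge_map = {
    "reframing setbacks as lessons": "turning challenges into advantages",
}

def merge_codes(index, merge_map):
    """Fold merged codes into their targets, keeping every segment exactly once."""
    merged = {}
    for code, seg_ids in index.items():
        target = merge_map.get(code, code)  # codes not in the map are kept as-is
        bucket = merged.setdefault(target, [])
        for seg_id in seg_ids:
            if seg_id not in bucket:
                bucket.append(seg_id)
    return merged

refined = merge_codes(index, merge_map)
print(refined)
```

Because the merge only relabels, the refined framework has fewer codes but the same underlying segments as the original one.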

Cornell Notes

Thematic analysis depends on coding that is detailed enough to capture essentially everything participants said before themes are chosen. Coding is treated like a “table of contents” that lets researchers move from individual interview stories to a holistic view of patterns across the dataset. Having many descriptive codes early prevents missed evidence, reduces the risk of deciding what matters too soon, and helps limit researcher bias by delaying interpretive labeling. Valid findings are more likely when early coding stays close to participants’ wording and only later becomes more abstract as themes are developed.

Why does the guidance favor “too many codes” over “not enough codes” at the start of thematic analysis?

Because early coding is meant to ensure coverage of the dataset. Many codes create a backbone that organizes everything participants said so researchers don’t have to keep re-reading transcripts to check what was missed. With too few codes, there’s no reliable way to confirm that important content is represented, and later theme development becomes vulnerable to gaps in evidence.

How does coding function like a “table of contents,” and why does that matter for theme development?

Codes act as navigational entries for the dataset, often visible as a list in qualitative software. The goal is that most or all meaningful segments from interviews are captured under one code or another. This structure supports a shift from reading data as separate stories to analyzing patterns across the whole dataset—an essential step because themes are developed based on evidence rather than simply appearing.

What’s the risk of using overly abstract or interpretive codes too early?

It increases researcher bias. If researchers jump straight to high-level concepts from prior knowledge, they may put words into participants’ mouths instead of summarizing what was actually said. The guidance recommends staying descriptive at the coding stage—capturing experiences and coping strategies in concrete terms—then becoming more interpretive only when there’s enough coded evidence to justify themes.

How does the bullying example demonstrate the difference between descriptive coding and premature abstraction?

When a participant describes being bullied and reframing it as making them stronger, descriptive coding would include both the event (“being bullied”) and the coping strategy (“turning challenges into advantages”). That specificity matters because later, repeated instances of the coping strategy can be recognized as a pattern and elevated into a theme or subtheme. If the segment were coded only as a broad concept like “identity transformation,” the pathway to evidence-based theme-building would be weaker.

Why is descriptive, detailed coding also a strategy for relevance—especially when researchers don’t yet know what will matter?

At the early stage, researchers can’t reliably predict which aspects will become important for the research questions. Detailed descriptive coding preserves options: it keeps multiple potential meanings and patterns available for later synthesis. This reduces the chance that something relevant gets lost because it was judged too early as unimportant.

Review Questions

What does it mean to treat codes as a “table of contents,” and how would you check whether your coding has adequate coverage?
Give an example of how you would code a participant’s coping strategy descriptively before turning it into a theme.
How does delaying interpretive labeling during early coding help reduce researcher bias and improve validity?

Key Points

1. Start thematic analysis with detailed, descriptive coding to ensure comprehensive coverage of what participants said.
2. Use coding as a “table of contents” so each transcript segment is represented in codes and doesn’t need constant re-checking.
3. Avoid premature decisions about relevance; early coding should capture everything because importance is unknown until patterns are visible.
4. Prefer descriptive labels over overly abstract, interpretive concepts during initial coding to reduce researcher bias.
5. Code multiple aspects of a single fragment when appropriate (e.g., experience plus coping strategy) to create evidence-rich pathways to themes.
6. Recognize that themes are typically developed based on evidence rather than simply emerging automatically.
7. After building a strong coding backbone, refine by reducing and merging codes, without sacrificing validity at the start.

Highlights

Many codes early on are treated as a strength because they create a backbone that supports later theme development.

Coding is framed as a table of contents: the aim is that transcripts can be covered through codes so researchers don’t need to revisit them to confirm nothing was missed.

Descriptive coding helps prevent researcher bias by keeping early labels close to participants’ accounts before abstraction begins.

The bullying example shows how capturing both the event and the coping strategy makes later theme recognition more evidence-based.

Topics

Thematic Analysis
Coding
Research Validity
Researcher Bias
Descriptive Codes

Mentioned

Dr Kriukow