Qualitative data analysis - Coding Tutorial - Initial Codes | "From Codes to Themes" episode 1

Based on Qualitative Researcher Dr Kriukow's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

In the initial pass, code transcripts line by line and capture as much detail as possible before judging relevance to the research questions.

Briefing

Coding in qualitative research starts with building a detailed, searchable set of “initial codes” that function like personal notes—capturing what participants actually say before deciding what matters for the study. Using NVivo as a demonstration tool, the process begins by reading interview transcripts line by line and attaching short, descriptive labels to specific segments of text. The goal isn’t to be selective or perfectly aligned with the eventual research questions on day one; it’s to capture as much relevant detail as possible so later stages can sort, compare, and consolidate.

The tutorial frames coding around a hypothetical study of factors influencing job retention among chefs and actors—why people stay or leave. Because the data is secondary (interview transcripts collected for other purposes), the researcher can’t rely on the original questions to match the study’s aims. Instead, coding becomes the mechanism for extracting whatever factors the transcripts mention. At this early stage, research questions are treated as secondary to thoroughness: the priority is to code everything in detail, since something that seems irrelevant in one transcript may become important after reviewing multiple transcripts.

A key practical principle is naming codes in a way that preserves meaning. Generic labels like “the most fun” are discouraged because they don’t convey what the participant is actually describing. Better codes mirror the content of the extract: for example, the label “the most fun was when there was danger” is based on the participant’s specific description of danger during filmmaking. The tutorial emphasizes that code names should be clear enough to work as a “table of contents” when browsing the code list later. In NVivo, double-clicking a code jumps back to the coded passage, so descriptive naming speeds up retrieval.

The coding approach also tolerates overlap. The same sentence can receive multiple codes when different aspects are present, and the tutorial treats having many codes as an advantage. The only real risk is under-coding—missing potentially meaningful patterns because there wasn’t enough coverage. This is illustrated with segments that combine themes like danger, stimulation, adventure, and pushing boundaries; multiple codes are applied so connections can be recognized later.
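The overlap idea can be pictured as a small data structure. The following is a minimal sketch in plain Python, not NVivo's own interface: the names `codebook`, `apply_codes`, and `extracts_for` are hypothetical, and the extract wording is invented for illustration. It shows one extract receiving several codes and the code list being browsed like a table of contents.

```python
from collections import defaultdict

# Hypothetical extract; the wording is invented for illustration only.
extract = "The most fun was when there was danger on set."

# Code book: each code name maps to the list of extracts coded with it.
codebook = defaultdict(list)


def apply_codes(codebook, text, code_names):
    """Attach every code in code_names to the same extract (overlap is allowed)."""
    for name in code_names:
        codebook[name].append(text)


def extracts_for(codebook, code_name):
    """Return the extracts stored under a code, or an empty list if the code is unused."""
    return list(codebook.get(code_name, []))


# One sentence, several codes: each captures a different aspect.
apply_codes(codebook, extract, [
    "the most fun was when there was danger",
    "stimulation",
    "pushing boundaries",
])

# Browsing the sorted code names works like a table of contents.
for name in sorted(codebook):
    print(f"{name}: {len(codebook[name])} extract(s)")
```

Looking up a descriptive name such as `extracts_for(codebook, "stimulation")` returns the coded passage directly, which mirrors, in a simplified way, jumping from a code back to its text.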

As coding progresses, the tutorial distinguishes between codes that are too broad or vague (like “fascination,” which may require later review) and codes that are grounded in specific wording (like “autonomy and control,” tied to a contrast between smaller and bigger film productions). It also acknowledges that interpretation can creep in, but the strategy is to stay cautious: when uncertain, create additional codes now and validate them by revisiting the extract later. Before moving to themes, every coded extract is expected to be reviewed so that decisions about relevance are evidence-based.

Finally, the tutorial positions codes as tools for understanding the data, not final outputs for a broad audience. Themes come later; initial codes are primarily for sense-making and organization. The next step—covered in later parts of the series—is reorganizing these initial codes into more focused structures and then building themes from them.

Cornell Notes

Initial coding is treated as a thorough, evidence-first pass: read each transcript carefully and attach descriptive “initial codes” to specific text segments. Code names should reflect what participants actually said so the code list works like a table of contents when revisiting passages later (e.g., coding “the most fun was when there was danger” rather than a vague label). The process encourages detailed coverage, including applying multiple codes to the same sentence when different ideas are present, because under-coding can hide patterns. Research questions matter more later when deciding which codes form themes; early on, the priority is capturing enough detail to make later decisions confidently. Codes are personal tools for sense-making, not the final public-facing product.

Why does the tutorial recommend coding everything in detail before aligning to research questions?

Early coding is framed as an exploratory capture step. Even if a segment doesn’t seem relevant to the study’s job-retention question at first, it may connect to something discovered in later transcripts. Since the data is secondary—interview transcripts collected for other purposes—there’s no guarantee the original questions match the study’s aims. Coding broadly first prevents missing factors that only become obvious after comparing multiple coded extracts.

What makes a “good” code name in this approach?

A good code name is specific and descriptive enough to stand alone when browsing the code list later. The tutorial contrasts vague labels (like “the most fun”) with content-faithful labels derived from the participant’s wording (like “the most fun was when there was danger”). The practical reason is retrieval: in NVivo, double-clicking a code jumps to the coded passage, so clear names speed up review and help the researcher remember what each extract is about.

Is it a problem to assign multiple codes to the same sentence or extract?

No—overlap is treated as useful when multiple ideas are present. The tutorial repeatedly emphasizes that having many codes is preferable to having too few. Multiple codes can capture different dimensions of the same moment (e.g., danger, stimulation, adventure, pushing boundaries), which supports later pattern-finding when codes are reorganized into themes.

How does the tutorial handle uncertainty or interpretation while coding?

It allows cautious interpretation but discourages assuming too much. When meaning isn’t fully clear, the coder may create a broader or provisional code (like “fascination”) and plan to revisit it by opening the extract later. The strategy is to compensate for uncertainty by coding more detail now, then reviewing every coded extract before deciding what to do next.

What’s the difference between initial codes and themes in this workflow?

Initial codes are internal tools—notes for organizing and understanding the data. Themes are broader, audience-facing constructs built later from patterns across codes. The tutorial stresses that the coding stage is about making sense of the transcripts, not producing final, polished claims.

Review Questions

  1. When would it be better to use a specific code name derived from a participant’s wording instead of a generic label?
  2. Why does the workflow treat under-coding as riskier than over-coding during initial passes?
  3. How should a coder respond when an extract seems ambiguous—create fewer codes or more, and why?

Key Points

  1. In the initial pass, code transcripts line by line and capture as much detail as possible before judging relevance to the research questions.
  2. Name codes descriptively so they function as a table of contents when reviewing the code list later (avoid vague labels).
  3. Apply multiple codes to the same segment when it contains distinct ideas; overlap supports later pattern detection.
  4. Treat research questions as more important during theme-building, not during the first pass of initial coding.
  5. When uncertain about meaning, add more codes now to preserve possible connections, and revisit the extracts later.
  6. Review every coded extract before deciding what to do next; don’t rely on assumptions made during the first pass.
  7. Use codes as personal tools for sense-making; themes come later as broader syntheses.

Highlights

Descriptive code names matter because they let the code list act like a table of contents—especially when software jumps back to the coded text.
Over-coding is framed as safer than under-coding: a potentially relevant factor left uncoded in the first pass may be missed when patterns are compared later.
Multiple codes per sentence are encouraged when different aspects are present, supporting richer pattern-finding later.
Interpretation is allowed but handled cautiously; ambiguous extracts may get provisional codes that are revisited after more context is gathered.
Initial codes are internal notes, while themes are the later step meant for broader claims.
