Thematic analysis with ChatGPT | PART 1: Coding qualitative data with ChatGPT
Based on Qualitative Researcher Dr Kriukow's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.
Use ChatGPT to draft initial codes from manageable text excerpts, then validate and refine those codes using researcher judgment.
Briefing
The central takeaway is that ChatGPT can speed up qualitative thematic analysis—especially the early “coding” stage—when researchers treat it as an assistive tool rather than an automatic analyst. The workflow described centers on feeding the model clear excerpts from qualitative data and asking it to generate candidate codes and code labels that can then be reviewed, refined, and organized into a coding framework. That human-in-the-loop step matters because thematic analysis depends on interpretive judgment, not just pattern matching.
The coding process starts with preparing the raw material: selecting manageable text segments (such as interview responses or open-ended survey answers) and ensuring the prompts request outputs that are usable for qualitative work. Instead of asking for a single “theme,” the approach emphasizes granular coding—producing short, descriptive codes tied to specific excerpts. This makes it easier to check whether the codes fit the data, whether multiple excerpts should share the same label, and whether the coding scheme is becoming too broad or too narrow.
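The excerpt-plus-codes unit described above can be represented as a simple data structure. This is a minimal sketch, not a tool from the video; the class name, participant IDs, and code labels are illustrative:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class CodedExcerpt:
    """One unit of analysis: a verbatim excerpt paired with candidate codes."""
    source: str                  # e.g. participant or interview ID (illustrative)
    excerpt: str                 # the text segment being coded
    codes: list[str] = field(default_factory=list)  # short descriptive labels

# Two hypothetical survey answers sharing one code label
e1 = CodedExcerpt("P01", "I never know who to ask when the system breaks.",
                  ["unclear support channels"])
e2 = CodedExcerpt("P02", "There is no obvious person responsible for IT issues.",
                  ["unclear support channels", "diffuse responsibility"])

# Counting excerpts per label helps flag codes that are becoming too broad
label_counts = Counter(code for e in (e1, e2) for code in e.codes)
```

Keeping codes attached to specific excerpts, rather than floating free, is what later makes it possible to check fit and spot labels that are drifting too wide.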
A key practical insight is prompt design. The transcript highlights that effective prompts specify the task (e.g., generate codes), the unit of analysis (the excerpt), and the desired output format (e.g., a list of codes with brief definitions and/or example mappings to the text). When prompts are vague, the model tends to produce generic categories that stray from the participants’ wording. When prompts are structured, the output becomes more auditable—researchers can trace each code back to the underlying quote and decide whether to keep, merge, split, or discard it.
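A structured coding prompt with those three elements (task, unit of analysis, output format) might be templated like this; the wording and the `::`-separated output convention are assumptions, not the exact prompt from the video:

```python
# Hypothetical prompt template: task, unit of analysis, and output format
# are all stated explicitly so the model's output stays reviewable.
CODING_PROMPT = (
    "Task: generate 2-4 short, descriptive codes for the excerpt below.\n"
    "Unit of analysis: this single excerpt only; do not summarise the dataset.\n"
    "Output format: one code per line, as\n"
    "  code label :: one-sentence definition :: exact supporting quote\n\n"
    "Excerpt:\n{excerpt}\n"
)

def build_coding_prompt(excerpt: str) -> str:
    """Fill the template so every excerpt is coded under identical instructions."""
    return CODING_PROMPT.format(excerpt=excerpt.strip())
```

Pinning the output format to one line per code is what lets a researcher (or a script) later map each label back to the quote it claims to describe.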
The workflow also addresses consistency. Because qualitative coding can drift across time, researchers can use ChatGPT to propose codes for multiple excerpts using the same instructions, then compare results across segments. That supports a more systematic approach to building a codebook. Still, the responsibility for final decisions remains with the researcher: codes must be checked against the full dataset, and themes should emerge through iterative refinement rather than one-pass automation.
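Comparing results across segments can be sketched as two small helpers: one parses a structured model response, the other measures how much two excerpts' code sets overlap. Both are assumptions layered on the `label :: definition :: quote` output convention, not part of the original workflow:

```python
def parse_codes(model_output: str) -> dict[str, tuple[str, str]]:
    """Parse 'label :: definition :: quote' lines into {label: (definition, quote)}.
    Lines that do not match the expected three-part format are skipped."""
    codes = {}
    for line in model_output.splitlines():
        parts = [p.strip() for p in line.split("::")]
        if len(parts) == 3:
            label, definition, quote = parts
            codes[label] = (definition, quote)
    return codes

def code_overlap(labels_a: set[str], labels_b: set[str]) -> float:
    """Jaccard overlap between two excerpts' code sets; low values across
    similar excerpts may signal coding drift worth reviewing."""
    union = labels_a | labels_b
    return len(labels_a & labels_b) / len(union) if union else 1.0
```

Overlap scores are only a rough screening signal; the researcher still decides whether divergent codes reflect drift or genuine differences in the data.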
Finally, the transcript frames ChatGPT as a way to reduce the mechanical burden of initial coding—turning long transcripts into organized, reviewable units—while preserving the interpretive core of thematic analysis. The promised “Part 1” focus on coding suggests a staged method: first generate and validate codes, then later aggregate them into themes. In that sequence, the model’s strength lies in drafting and structuring, while the researcher’s strength lies in theoretical alignment, reflexivity, and ensuring the analysis remains grounded in the data.
Cornell Notes
ChatGPT can assist with qualitative thematic analysis by drafting initial codes from text excerpts, but the researcher must review and refine the outputs. The most important factor is using prompts that specify the unit of analysis (each excerpt), the task (coding rather than theme generation), and a structured output format that can be checked against the original wording. This creates auditable, traceable coding suggestions that can be merged into a codebook and compared across multiple excerpts for consistency. The workflow is designed to reduce the mechanical workload of early coding while keeping interpretive control—codes are validated, adjusted, and only then used to build themes in later steps.
How should researchers position ChatGPT in a thematic analysis workflow to avoid losing interpretive control?
Why does the transcript stress coding excerpts rather than jumping straight to themes?
What prompt elements make ChatGPT’s coding outputs more usable for qualitative research?
How can researchers use ChatGPT to improve consistency across coding sessions?
What does “auditable” output mean in this context, and why is it important?
Review Questions
- What changes when ChatGPT is asked to generate codes for excerpts versus themes for the whole dataset?
- Which prompt details most directly improve traceability from model output back to participant text?
- How would you validate and revise a codebook created with ChatGPT-assisted coding?
Key Points
1. Use ChatGPT to draft initial codes from manageable text excerpts, then validate and refine those codes using researcher judgment.
2. Design prompts that specify the unit of analysis (each excerpt), the task (coding), and a structured output format that supports review.
3. Avoid vague prompts that encourage generic categories; structured instructions help keep codes grounded in participants’ wording.
4. Build an auditable workflow by ensuring proposed codes can be mapped back to the exact excerpts they came from.
5. Use consistent prompt instructions across multiple excerpts to support codebook coherence and reduce coding drift.
6. Treat thematic development as iterative: validate codes first, then aggregate validated codes into themes in later steps.
7. Keep interpretive responsibility with the researcher—ChatGPT can reduce mechanical workload, but it cannot replace theoretical alignment and reflexivity.
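The audit step in the key points above (mapping each proposed code back to the exact excerpt it came from) can be sketched as a verbatim-quote check; the function name and the transcript fragment are illustrative:

```python
def is_auditable(supporting_quote: str, source_text: str) -> bool:
    """A proposed code passes the audit only if its supporting quote
    appears verbatim in the source text it claims to come from."""
    return supporting_quote.strip() in source_text

# Hypothetical transcript fragment
transcript = "I never know who to ask when the system breaks down."
```

Codes whose quotes fail this check are candidates for discarding or re-grounding: the model may have paraphrased rather than quoted, which breaks traceability.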