Thematic analysis with ChatGPT - coding qualitative data (2025 method)

TL;DR

Use ChatGPT to generate initial codes first, not themes, to preserve an audit trail from evidence to interpretation.


Briefing

Qualitative thematic analysis becomes more defensible when ChatGPT is used to produce an auditable coding trail, not just final themes. The core shift in this workflow is moving from a Word-based setup to an Excel-based one, in which ChatGPT generates structured “initial codes” that map directly onto each interview excerpt. That structure is meant to strengthen credibility because every later theme can be traced back to specific coded segments.

The method starts with a strict prompt that forces ChatGPT to stay at the coding stage. Instead of asking for themes, the prompt instructs it to create “initial codes”: detailed, descriptive codes that cover essentially every statement or sentence in the data, without speculating about themes. The emphasis is on building the foundation first, with codes as the building blocks. AI tools that jump straight to themes without showing how the data was coded leave researchers with no audit trail and therefore weak credibility.
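The summary does not reproduce the presenter's exact prompt, so the following is only a minimal sketch of what such a coding-stage prompt could look like. The function name, the instruction wording, and the sample transcript line are all hypothetical.

```python
def build_initial_coding_prompt(transcript: str) -> str:
    """Return a prompt that keeps the model at the initial-coding stage.

    The instruction wording is an illustrative assumption, not the
    presenter's actual prompt.
    """
    instructions = (
        "You are assisting with qualitative thematic analysis.\n"
        "Task: create INITIAL CODES only.\n"
        "- Write a detailed, descriptive code for essentially every "
        "statement or sentence in the transcript.\n"
        "- Do not propose, name, or speculate about themes.\n"
        "- Return a two-column table: interview text on the left, "
        "initial code on the right.\n"
    )
    return f"{instructions}\nTranscript:\n{transcript.strip()}\n"


# Hypothetical one-line transcript, used only to show the call.
prompt = build_initial_coding_prompt(
    "Principal: I greet students at the door every morning."
)
print(prompt)
```

Keeping the "no themes" rule as an explicit line in the prompt mirrors the point above: the restriction has to be stated, because the model is otherwise likely to skip ahead.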

Once the prompt is set, the transcript is pasted in. ChatGPT then generates an Excel file with a two-column table: interview text on the left and the corresponding detailed initial codes on the right. The workflow is designed so researchers don’t need to be Excel users; ChatGPT creates the file automatically. The resulting coding output may include unnecessary items (for example, codes that reflect interviewer wording or questions), but the presenter treats that as manageable noise as long as the codes remain descriptive and granular.
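To make the two-column layout concrete, here is a small sketch that writes the same structure by hand. In the workflow ChatGPT produces the Excel file itself; this example swaps in a CSV (which Excel can open) so it runs with the standard library alone. The rows and column headers are hypothetical.

```python
import csv
import io

# Hypothetical coded rows: (interview text, initial code).
coded_rows = [
    ("I greet students at the door every morning.",
     "Principal greets students at the door daily"),
    ("Parents started coming to our monthly meetings.",
     "Parent attendance at monthly meetings increased"),
]


def write_coding_table(rows, handle):
    """Write a header plus one row per excerpt: text on the left, code on the right."""
    writer = csv.writer(handle)
    writer.writerow(["Interview text", "Initial code"])
    writer.writerows(rows)


# An in-memory buffer stands in for a file on disk.
buffer = io.StringIO()
write_coding_table(coded_rows, buffer)
table_text = buffer.getvalue()
print(table_text)
```

Because each row pairs one excerpt with one code, the table itself is the first link in the audit trail.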

The process is repeated for each interview transcript. After all transcripts are coded, the workflow moves to “focus coding,” where the goal is to organize the initial codes into higher-level groupings while still avoiding theme-making too early. Here, the Excel-generated code lists are consolidated into one place and then reorganized. ChatGPT can assist in a “more hands-off” mode by grouping every code into thematic buckets. Researchers can allow any number of groups, but every code must belong somewhere, even if that requires creating additional subgroups.
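The consolidation step can be sketched as a merge of the per-transcript tables into one code list that remembers where each code came from. This is an illustration under assumed data, not the presenter's procedure: the transcript ids, excerpts, and codes are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-transcript coding tables: transcript id -> (excerpt, code) rows.
coded_transcripts = {
    "interview_01": [
        ("I greet students at the door every morning.",
         "Greets students at the door daily"),
        ("We changed how staff meetings run.",
         "Restructured staff meetings"),
    ],
    "interview_02": [
        ("Parents started coming to our monthly meetings.",
         "Parent attendance at meetings increased"),
        ("I stand at the entrance each morning.",
         "Greets students at the door daily"),
    ],
}


def consolidate_codes(tables):
    """Merge per-transcript tables into code -> list of (transcript id, excerpt)."""
    consolidated = defaultdict(list)
    for transcript_id, rows in tables.items():
        for excerpt, code in rows:
            consolidated[code.strip()].append((transcript_id, excerpt))
    return dict(consolidated)


all_codes = consolidate_codes(coded_transcripts)
for code, sources in all_codes.items():
    print(code, "<-", [transcript_id for transcript_id, _ in sources])
```

Keeping the transcript id and excerpt next to each code means the consolidated list can still be traced back to the raw data once grouping begins.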

In the example, the regrouping produces clusters such as leadership strategies and philosophy, school-related conditional arrival and student observation, engagement and parent/community engagement, and culture-change process codes. The presenter likens this stage to building a table of contents for the dataset: it makes the inventory of what’s in the data easier to see than a long, unstructured list of codes.

From there, the next step is theme development. The workflow encourages researchers to take control at this point: review the organized code groupings, align them with the research questions, and decide what themes actually fit, rather than outsourcing theme creation to ChatGPT. The practical takeaway is that ChatGPT’s new ability to generate Excel outputs can save time on initial coding while preserving the audit trail needed for rigorous qualitative analysis.

Cornell Notes

The workflow uses ChatGPT to support thematic analysis while preserving an audit trail. Instead of requesting themes, prompts require “initial codes”: detailed, descriptive codes that cover nearly every statement in each transcript, with no theme speculation. ChatGPT then generates an Excel file mapping interview text (left column) to initial codes (right column), and the same process is repeated across interviews. After initial coding, “focus coding” groups the codes into organized categories so researchers can see what’s in the dataset, much as a table of contents would show it. Theme development comes afterward, ideally with researcher control to ensure credibility and alignment with research questions.

Why does this workflow insist on coding before themes when using ChatGPT?

Credibility in qualitative analysis depends on an audit trail: readers must be able to see how raw data became codes and how codes supported themes. The workflow treats themes without coding as unusable because there’s no transparent path from evidence to interpretation. By forcing ChatGPT to generate initial codes first, every later theme can be traced back to specific coded excerpts.
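One way to picture the audit trail is as a chain of lookups from a theme down to the excerpts that support it. The sketch below assumes three hypothetical mappings (theme to groups, group to codes, code to excerpts); the theme name and all data are invented for illustration only.

```python
# Hypothetical audit-trail records; none of these come from the source data.
theme_to_groups = {
    "Visible leadership builds trust": ["Leadership strategies and philosophy"],
}
group_to_codes = {
    "Leadership strategies and philosophy": ["Greets students at the door daily"],
}
code_to_excerpts = {
    "Greets students at the door daily": [
        ("interview_01", "I greet students at the door every morning."),
    ],
}


def trace_theme(theme, themes, groups, codes):
    """Return (theme, group, code, transcript id, excerpt) rows supporting a theme."""
    trail = []
    for group in themes.get(theme, []):
        for code in groups.get(group, []):
            for transcript_id, excerpt in codes.get(code, []):
                trail.append((theme, group, code, transcript_id, excerpt))
    return trail


trail = trace_theme(
    "Visible leadership builds trust",
    theme_to_groups, group_to_codes, code_to_excerpts,
)
for row in trail:
    print(" -> ".join(row))
```

A theme that returns an empty trail has no coded evidence behind it, which is exactly the situation the workflow is designed to avoid.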

What exactly are “initial codes” in this method, and how are they prompted?

Initial codes are detailed, descriptive codes that cover almost every statement or sentence in the data. The prompt explicitly prevents theme speculation at this stage, because ChatGPT would otherwise “take the shortcut” and produce themes. The goal is factual, granular coverage that later helps the researcher remember what the data is about.

How does the Excel-based approach change the workflow compared with earlier Word-based steps?

ChatGPT now creates an Excel file automatically. The file contains a table where interview text appears in the left column and the corresponding initial codes appear in the right column. This structure makes the coding output easier to review and consolidate for later focus coding, while also reducing the need for researchers to manually format spreadsheets.

What happens during focus coding, and what constraints are placed on code grouping?

Focus coding organizes the initial codes into groups without yet deciding final themes. The method consolidates the code lists and then asks for grouping into thematic categories. A key constraint is completeness: every code must belong somewhere, even if that means creating additional groups or subgroups. The output is meant to function like a table of contents of the dataset.
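The completeness constraint is easy to verify after the fact. The following sketch, using hypothetical codes and group names, checks a returned grouping in both directions: codes that were left out, and codes in the grouping that were never in the original list. The second check is an added assumption here, on the grounds that a model might reword or invent codes while regrouping.

```python
def find_unassigned_codes(initial_codes, groups):
    """Return initial codes that appear in no group, sorted for stable output."""
    assigned = {code for members in groups.values() for code in members}
    return sorted(set(initial_codes) - assigned)


def find_unknown_codes(initial_codes, groups):
    """Return grouped codes that were never in the initial code list."""
    assigned = {code for members in groups.values() for code in members}
    return sorted(assigned - set(initial_codes))


# Hypothetical consolidated code list and a grouping that misses one code.
initial_codes = [
    "Greets students at the door daily",
    "Restructured staff meetings",
    "Parent attendance at meetings increased",
]
groups = {
    "Leadership strategies and philosophy": ["Greets students at the door daily"],
    "Parent/community engagement": ["Parent attendance at meetings increased"],
}

missing = find_unassigned_codes(initial_codes, groups)
unknown = find_unknown_codes(initial_codes, groups)
print("Unassigned:", missing)
print("Not in original list:", unknown)
```

If either list is non-empty, the grouping can be sent back with a request to place the missing codes, adding a group or subgroup where needed.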

What kinds of code groupings appear in the example, and what do they help the researcher do?

The regrouping produces categories such as leadership strategies and philosophy, student-related observation and conditional arrival, engagement and parent/community engagement, and culture-change process codes. These groupings help the researcher quickly inventory what’s present in the data and then decide which themes best answer the research questions.

Review Questions

When would it be risky to ask ChatGPT for themes directly, and what requirement in this workflow prevents that risk?
How does the two-column Excel output support an audit trail from transcript to code?
During focus coding, what does “every code must belong somewhere” practically force the researcher to do?

Key Points

1. Use ChatGPT to generate initial codes first, not themes, to preserve an audit trail from evidence to interpretation.
2. Prompt for “initial codes” as detailed, descriptive coverage of nearly every statement or sentence, explicitly blocking theme speculation.
3. Leverage ChatGPT’s ability to create an Excel file that maps interview text to initial codes in a two-column table.
4. Repeat the initial-coding prompt across all interview transcripts, then consolidate the resulting code lists for the next stage.
5. Apply focus coding by grouping all initial codes into thematic categories, ensuring every code is assigned somewhere.
6. Develop final themes after code organization, using the research questions to guide interpretation and maintain researcher control.

Highlights

The workflow treats themes produced without coding as lacking credibility: without an audit trail, AI-generated themes are effectively unusable.

ChatGPT’s Excel output creates a direct mapping from transcript excerpts to detailed initial codes, making later review and consolidation easier.

Focus coding is framed as building a “table of contents” for the dataset: codes are organized into groups before any theme decisions are made.

A strict prompt requirement, that every code must belong somewhere, prevents orphaned or missing evidence during code organization.

Topics

Thematic Analysis
Initial Coding
Focus Coding
Excel Workflow
Qualitative Data