How to Really do Thematic Analysis? Easy Step-by-Step Guide

TL;DR

Qualitative data analysis distills subjective text into themes that answer the research question, rather than using statistical testing.


Briefing

Qualitative data analysis is about turning a mass of subjective accounts into a credible, structured set of themes that directly answer the research question—without relying on statistics. Because qualitative material isn’t numerical, it can’t be evaluated through measurement or hypothesis testing in the usual quantitative sense. Instead, the central task is to distill people’s experiences and opinions from large volumes of text (interviews, focus groups, open-ended responses, or written documents) into a systematic summary that readers can understand and trust.

The end product is typically a framework of themes—major recurring patterns in the data that are relevant to the research question—often accompanied by sub-themes. Those themes function as the “answers” to the study: the analysis reduces the data’s volume while preserving meaning. The credibility of those findings depends on doing the reduction systematically rather than relying on memory, intuition, or after-the-fact impressions.

The process begins with coding, which means assigning short labels or condensed descriptions to meaningful units in the data. These units can be a sentence, part of a sentence, or a paragraph, depending on what captures the idea being expressed. Whether working manually or with software, the workflow is the same in principle: read through everything and tag what matters. The codes act like compact summaries of what participants (or authors of documents) are saying. As coding accumulates, patterns start to emerge—such as repeated references to negative experiences at work—because multiple codes point toward the same underlying topic.
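As a rough illustration of what a coded dataset might look like, the sketch below stores each meaningful unit with its source and a short label, then counts how often each label appears. The class name `CodedSegment`, the interview identifiers, and the sample quotes are all invented for illustration; they are not taken from any real study or tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedSegment:
    """One meaningful unit of text plus the code assigned to it (hypothetical structure)."""
    source: str  # transcript or document the segment came from
    text: str    # the sentence, part of a sentence, or paragraph
    code: str    # short label summarizing what the segment says


# Invented example data: three segments from two made-up interviews.
segments = [
    CodedSegment("interview_01", "My manager never listened to my concerns.",
                 "negative experience at work"),
    CodedSegment("interview_01", "I believed I could handle the new role.",
                 "self-confidence"),
    CodedSegment("interview_02", "The workload left me exhausted every week.",
                 "negative experience at work"),
]

# Counting labels is one simple way to see which topics recur across sources.
code_counts = Counter(segment.code for segment in segments)

for code, count in code_counts.most_common():
    print(f"{code}: {count}")
```

Counting is only a navigation aid here: a code that appears often is a candidate pattern to look at more closely, not a statistical result.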

Once coding is complete, the next stage is to make sense of the growing pile of codes by consolidating them into a single organized space. In practice, this often involves transferring codes into one document or container so they can be reviewed together. This stage is frequently described using terms like “focused coding” or “axial coding,” but the key point is less about terminology and more about purpose: group related codes into common categories using practical, “common sense” decisions.

During this consolidation, the analysis becomes both clearer and more efficient. Codes that are similar—perhaps created from different interviews but describing the same idea—get merged. For example, separate codes about “self-confidence,” “she was very self-confident,” and “self-confidence helped her a lot” can be combined into one code group. The result is a “table of contents” for the dataset: a detailed map of what topics appear across the whole body of evidence.
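The merging step can be sketched as a simple lookup from raw codes to code groups. The mapping below is written by hand because, as described above, grouping is a judgment call made by the researcher; the function name `build_table_of_contents`, the "Ungrouped" bucket, and the "unsupportive manager" code are assumptions added for illustration.

```python
from collections import defaultdict

# Raw codes as they might accumulate across several transcripts.
raw_codes = [
    "self-confidence",
    "she was very self-confident",
    "self-confidence helped her a lot",
    "unsupportive manager",  # invented extra code for contrast
]

# The researcher's own grouping decisions, recorded explicitly (hypothetical mapping).
code_to_group = {
    "self-confidence": "Self-confidence",
    "she was very self-confident": "Self-confidence",
    "self-confidence helped her a lot": "Self-confidence",
    "unsupportive manager": "Negative experiences at work",
}


def build_table_of_contents(codes, mapping):
    """Group raw codes under their assigned category; unmapped codes go to 'Ungrouped'."""
    table = defaultdict(list)
    for code in codes:
        table[mapping.get(code, "Ungrouped")].append(code)
    return dict(table)


table_of_contents = build_table_of_contents(raw_codes, code_to_group)

for group, members in table_of_contents.items():
    print(f"{group} ({len(members)} codes)")
```

Keeping an explicit "Ungrouped" bucket makes it visible when a code has not yet been placed anywhere, which helps keep the table of contents comprehensive.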

Finally, themes are built by returning to the research questions and asking how to answer them using the organized groups. At this point, answers often become surprisingly clear because the table of contents reflects the dataset comprehensively, including both relevant and less relevant material. The last step is to communicate the themes and sub-themes, then trace each claim back to evidence—typically by citing the original quotes or text segments that support the theme. The analysis thus moves from coding everything, to organizing and reducing, to presenting research-question-driven themes backed by traceable data.
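One way to keep themes traceable is to store each sub-theme alongside the quotes that support it and to flag any sub-theme that has no evidence yet. This is a minimal sketch under assumed names (`Quote`, `Theme`, `unsupported_sub_themes`) with invented sample data, not a prescribed format.

```python
from dataclasses import dataclass, field


@dataclass
class Quote:
    """A text segment from the original data, with its source (hypothetical structure)."""
    source: str
    text: str


@dataclass
class Theme:
    """A theme with sub-themes, each mapped to its supporting quotes."""
    name: str
    sub_themes: dict[str, list[Quote]] = field(default_factory=dict)

    def unsupported_sub_themes(self) -> list[str]:
        """Return the sub-themes that have no supporting quote attached."""
        return [name for name, quotes in self.sub_themes.items() if not quotes]


# Invented example: one sub-theme has evidence, the other does not yet.
theme = Theme(
    name="Negative experiences at work",
    sub_themes={
        "Unsupportive management": [
            Quote("interview_01", "My manager never listened to my concerns."),
        ],
        "Excessive workload": [],
    },
)

missing = theme.unsupported_sub_themes()
print(f"Sub-themes still needing evidence: {missing}")
```

A check like this is a reminder to go back to the data before reporting a claim, so that each theme rests on quotes rather than recollection.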

Cornell Notes

Qualitative analysis turns non-numerical accounts (interviews, focus groups, open-ended responses, documents) into a credible set of themes that answer the research question. Since results can’t be tested with statistics, the work focuses on distilling and reducing large volumes of subjective text into recurring patterns. The process starts with coding: labeling meaningful units (sentences, parts of sentences, paragraphs) with short summaries. Next comes the stage often labeled “focused coding” or “axial coding”: consolidating many codes into grouped categories, reducing redundancy, and building a “table of contents” of the dataset. The final step uses that organized map to form themes and sub-themes, then supports each theme by tracing claims back to original quotes.

Why can’t qualitative findings be handled like quantitative results, and what replaces statistics?

Qualitative data isn’t numerical or statistical; it expresses opinions, experiences, or perspectives. Because of that, the analysis can’t rely on measurement, hypothesis testing, or statistical inference. Instead, credibility comes from systematic distillation: reducing the data into themes that directly answer the research question, then backing those themes with traceable evidence (e.g., quotes from the original text).

What exactly counts as “coding,” and how granular should codes be?

Coding means assigning short labels or condensed descriptions to meaningful units in the text. Those units can be a sentence, part of a sentence, or a paragraph, whichever captures a distinct idea. The code is essentially a short summary of what that segment is saying, and it becomes the building block for later grouping and theme development.

What changes after coding when moving into focused coding (axial coding)?

After coding, the dataset often looks like a long list of codes spread across many transcripts. Focused coding consolidates those codes into one organized place and groups related codes together. Similar codes from different participants get merged (e.g., multiple versions of “self-confidence” become one group). This reduces the number of codes and produces a structured “table of contents” of topics across the whole dataset.

How does the “table of contents” idea help produce final themes?

The grouped categories show what topics exist in the data, across all transcripts, not just in isolated cases. When the researcher compares those groups to the research questions, the themes usually become clearer: major topics become themes, and the specific items within those topics become sub-themes. The organized map also helps avoid relying on hunches.

How should themes be presented so they remain credible?

Themes should be communicated as major recurring topics relevant to the research question, typically with sub-themes underneath. Each theme claim should be supported by evidence traced back to the original source—most often by using quotes from the dataset that demonstrate why that theme belongs in the findings.

Review Questions

What is the purpose of qualitative data analysis, and how does it differ from quantitative analysis?
Describe the three major stages: coding, focused coding (axial coding), and theme development. What is the output of each stage?
How do researchers ensure credibility when presenting themes from qualitative data?

Key Points

1. Qualitative data analysis distills subjective text into themes that answer the research question, rather than using statistical testing.
2. Coding starts by labeling meaningful units (sentence, part-sentence, or paragraph) with condensed summaries of what the text is saying.
3. After coding, consolidating codes into one organized container enables a holistic view of the dataset.
4. Focused coding groups similar codes into categories, reducing redundancy and creating a “table of contents” of dataset topics.
5. Themes are formed by mapping the grouped categories back to the research questions and selecting the major recurring topics.
6. Credible theme claims require traceable evidence, typically by citing original quotes from the dataset.

Highlights

Themes are the final “answers” in qualitative analysis: major recurring patterns relevant to the research question, usually paired with sub-themes.

Coding turns raw text into manageable building blocks by attaching short labels to meaningful units of text.

Focused coding merges overlapping codes and produces a structured table of contents that makes theme selection clearer.

Credibility depends on tracing each theme back to original quotes, not on intuition after reading the data.