
How to transcribe interviews? (PART II - which approach to transcription to choose?)

4 min read

Based on Qualitative Researcher Dr Kriukow's video on YouTube. If you like this content, support the original creators by watching, liking and subscribing to their content.

TL;DR

Match transcription granularity to the study’s analytical level rather than adopting a default format.

Briefing

Choosing a transcription method isn’t a technical preference—it’s a research design decision. Two principles anchor the choice: the level of transcription should match the level of analysis, and what gets included in the transcript should be driven by the research question. Together, they frame transcription as a purposeful, selective process rather than a neutral conversion of speech into text.

Transcription is inherently selective and therefore never fully objective. Converting sound and video into text involves translation, and translation always changes something—two transcribers can produce meaningfully different outputs from the same interview. Because of that, transparency becomes essential: researchers should clearly justify why a particular transcription approach was chosen and how it aligns with later analysis. The transcript should be treated as a tool built for a specific analytical goal, not as a complete record of every vocal detail.

The practical question is what the analysis will actually do with the transcript. If the study will analyze interactional features—such as turn-taking, conversational structure, or fine-grained discourse—then a more detailed transcription is typically warranted. That level of detail may include pauses, stutters, filler words, and even the length of individual utterances. If, instead, the study focuses on what participants say—reporting experiences, views, or content—then extreme granularity may be unnecessary. The transcript can remain selective, capturing only those timing cues that matter for interpretation. For instance, a pause of several seconds might signal difficulty answering or uncertainty, making it analytically relevant; other hesitations might be omitted if they don’t serve the research aims.

The guidance also pushes against the idea that researchers must follow a single “correct” format. Supervisors may recommend more detailed conventions, but researchers can decline when the added complexity doesn’t serve the research question. A personal example illustrates this tension: during doctoral work, a more detailed approach (including pauses, ums, stutters, and the timing of micro-events) was suggested, but after reading further on transcription methods, the researcher chose a less granular strategy aligned with the study’s aims.

The transcript’s intended use also affects member checking, a validity practice where participants review transcripts. A detailed transcript can create unexpected problems: one account described a teacher participant requesting changes because the transcript looked “chaotic” and “not fluent,” even though the content was accurate. The participant’s concern wasn’t about interpretive meaning but about how the transcript’s formatting reflected professionalism on paper.

Overall, the decision rule is straightforward: match transcription detail to analysis needs, justify inclusions and exclusions, and anticipate how transcription choices will interact with participant review. Selectivity is not a flaw—it’s the mechanism that keeps the transcript fit for purpose.

Cornell Notes

Transcription detail should be designed for the analysis, not treated as a neutral, objective record. Two guiding rules are emphasized: the level of transcription must complement the level of analysis, and inclusion decisions must follow the research question. Because transcription is a selective translation from speech/video to text, different transcribers can produce different outputs; researchers should therefore be transparent about their choices. Fine-grained conventions (pauses, stutters, filler words, timing) are most useful when analyzing interactional or conversational processes, while content-focused studies can often omit many vocal details. Transcription choices also affect member checking, since participants may react to how a transcript looks, not just what it says.

Why is transcription described as inherently selective rather than objective?

Turning speech and video into text requires translation, and translation always changes something. That means transcription involves many decisions—what to capture, how to represent it, and what to omit—so two transcribers can produce different versions from the same interview. The result is that “objectivity” is not realistic; transparency about choices is the practical substitute.

What two principles determine which transcription approach to use?

First, the level of transcription should complement the level of analysis. Second, what to include should be driven by the research question that the analysis aims to answer. These principles connect transcription format directly to later analytical goals, such as whether the study examines interactional features or focuses on participants’ content.

When does a more detailed transcription format become necessary?

More detail is typically needed for conversational analysis or interaction-focused work—such as examining turn-taking, conversational structure, or other micro-features of talk. In those cases, pauses, stutters, filler words, and even the length of utterances may matter because the analysis depends on how people speak, not only what they say.

When is selective transcription acceptable, and what might still be worth including?

Selective transcription is acceptable when the study is interested in content—participants’ views, experiences, or reported meaning—rather than the mechanics of speech. Researchers may still include certain pauses if they are analytically meaningful; for example, a long pause (like several seconds) could indicate difficulty answering or uncertainty, even if shorter hesitations are omitted.

How can transcription choices affect member checking?

Member checking can involve sending transcripts to participants for review. A detailed transcript may prompt requests for changes based on appearance: one example described a teacher asking for edits because the transcript looked chaotic and not fluent, which the participant felt reflected poorly on professionalism. The issue was tied to how the transcript’s formatting conveyed speech patterns on paper.

Review Questions

  1. How would you justify including or excluding pauses, stutters, and filler words using the two guiding principles?
  2. What analytical goals would push you toward a transcription that records timing and micro-features of speech?
  3. In what ways could a highly detailed transcript complicate member checking, even if the content is accurate?

Key Points

  1. Match transcription granularity to the study’s analytical level rather than adopting a default format.
  2. Use the research question to decide what belongs in the transcript and what can be omitted.
  3. Treat transcription as selective translation; different transcribers can legitimately produce different outputs.
  4. Be transparent and explicit about transcription choices so readers understand the logic behind inclusions and exclusions.
  5. Don’t feel obligated to follow a supervisor’s preferred transcription style if it doesn’t serve the research aims.
  6. Include vocal timing details only when they are analytically relevant to the planned analysis.
  7. Plan for member checking impacts: participants may respond to how transcripts look, not only what they contain.

Highlights

Transcription is translation from speech/video to text, so it cannot be fully objective and always involves selective decisions.
Two rules govern method choice: transcription level must complement analysis level, and inclusions must follow the research question.
A long pause can be analytically meaningful even in content-focused studies, while many other hesitations may be unnecessary.
Member checking can trigger formatting-driven revisions when detailed transcripts appear “chaotic” or “not fluent” to participants.
