Transcribing Oral Text

The topic of this article is transcribing oral interviews, that is, the process of translating spoken language with all its paralinguistic and extralinguistic features into some manner of written text presentation. More specifically, my discussion will focus on two related issues: transcription methods and challenges in displaying the researcher’s words in interviews when presenting excerpts to the public. In the following pages, I will first discuss the theoretical, methodological and practical issues we confront when using oral data from interviews in our research. I will then present ethnopoetics as what I feel is a useful solution to some of these issues. And I will stress the problem inherent in non-disclosure, that is, how so many writers of qualitative research fail to disclose to their readers what they have actually done to the oral data they are working with and citing.

I have worked with different types of interview data for most of my career as a qualitative researcher and cultural historian. I have come to view the transcribing process as both very rewarding and highly problematic – not to mention as a deeply personal issue: deeply personal because it tests and reveals my intentions and hopes as a cultural historian and as interviewer; rewarding, because in my experience most of my more inspired analytical work happens as I sit down to listen to the interviews I have done (which is the reason I have never been able to outsource the transcribing work to others); highly problematic, as ethical and practical questions always appear when attempting to put living, breathing people’s words from natural speech into that dry, lifeless print on paper.

There is always something meaningful lost in the process of transcribing the audio from an interview. It is a deeply interpretive process, and the researcher-transcriber is responsible for every choice (Ochs 1979). Transcription is part of our data construction and our analysis: Cécile Vigouroux and others even equate ethnographic fieldwork with transcription, and vice versa (Vigouroux 2007: 65; Fabian 1991).

To mention just a few of the more common questions that arise in the process of figuring out what the interviewee really says: Is this “hm” or “uh” or brief chuckle substantial enough to be included in the transcription, or is it a conversational filler, to be either deemed meaningful or excluded? Should conversational fillers be excluded at all? Should tone of voice be presented in the transcript or not? Should dialect and sociolect be part of the final text? Should big chunks of interview text be included in the presentation that is to be put before an audience – the final paper or book or exhibition – or just extracts and snippets?

There are of course at least two stages in the transcribing process. The first is a listening, sorting, remembering and analysing process of getting the sound files into a form that is easier to survey. The second is a what-to-present-in-public process in which most of the transcribed text is left out. While the first exists primarily for the interviewer/researcher herself, and perhaps for her colleagues in a research project, and the second exists for the audience, there are strong connections between the two. In both stages of this research process the balance between keeping to what was said and making the text intelligible is of the essence.

Personally, for each new project, I have chosen a different strategy regarding whether to transcribe audio files from research interviews completely, partly or not at all. Such choices depend on a range of considerations – from methodological, analytical and ethical considerations to more pragmatic questions of time and money (and in my case a particularly stubborn variety of wrist tendonitis). For each new project, I have similarly chosen different strategies regarding the final presentation of the interviews to the public: as normalized standard Norwegian, with almost symbolic inserts of dialect words to get a more authentic feel to the text, as ethnopoetic text inspired by Dell Hymes, Barbro Klein or others, or as transcriptions with different, more or less home-made systems.

The texts we work with as qualitative researchers are not ours; they are given to us on loan from the generous people who have agreed to co-create them with us. We borrow the words we analyse from living, breathing human beings for whom we are supposed to, and normally do, have the greatest respect. These facts in and of themselves should be good enough reasons for thinking our transcription practices through, thoroughly. The texts are important both as expressions of the inner lives, memories and knowledge of our informants, and as our main units of analysis. Sometimes, and for some of us, plain text is enough to textually represent the citations we use. At other times, for others of us – especially those working within narratology, discourse analysis or oral history – both the stories in themselves and the way the stories are being told are important to safeguard and communicate.

Writing Speech: In Practice

There are multiple considerations to weigh when deciding how to present speech in written form. One is the wish to honour the voice of the individual. That wish may lead us to reproduce each word meticulously as said, for instance when working with artists and storytellers. On other occasions, honouring the voice of an individual may mean “cleaning up” the language of an interview, and pretending that what the person said was closer to standard written form than it really was.1 Sometimes our duty to treat our interviewees with respect collides with our duty to present our citations and examples from the interviews as accurately as possible. However, most considerations actually have to do with the question of what the final text is presented for. That is, for linguistic questions that have to do with sounds and tonality, there is not much choice but to transcribe the words of the informants into some kind of phonetic alphabet. Many feel such transcription to be the most truthful way to present speech in written form. Unfortunately, phonetic script is both time-consuming to write and utterly useless when displaying our interview transcripts to a wider audience, as practically no one except specialists really knows how to read it; ɪt lʊks laɪk ðɪs.2

Luckily for cultural historians, sociologists and most other human and social scientists, we normally do not discuss the finer points of linguistics in our analyses. Hence, we do not need specialized symbols to present our informants’ words in writing. What we need to present is the content and meaning of the words, in an intelligible fashion, but preferably as close to what was said as possible – whatever that may mean. Closeness to the oral text may often lead us to bend such things as spelling standards or conventions of sentence structure a little, while still giving priority to readability for the wider audience. Our main purpose in presenting in our final publications what was said in the interviews is simply to show our readers that our analysis is sound, in addition to giving our readers a sense of being present in the interview situation. This may of course be done through standard prose text, but most ethnographers will encounter many instances of orality that do not fit into standard textuality, where the informant’s words cannot be confined to common writing.

Depending on the language in question, there will always be some sort of distance between oral speech and the standard lettered version of the same language. One aspect is the limited ability of our letters to capture the sounds of oral speech. Another is the difficulty of capturing the rhythm or flow of discourse – an aspect that often conveys meaning and reveals the speaker’s intentions.

Many ways of presenting orality in text have developed outside of the field of linguistics, especially since audiotaped data became common in the humanities and social sciences (Lapadat & Lindsay 1999: 64). Ethnomethodology and its offspring conversation analysis (CA) and discourse analysis (DA), with all their different relatives, have all developed formal or conventional traditions of how transcription is to be carried out and displayed. Especially within CA, the Jefferson transcription notation system (Jefferson 2004), or some modification of it, is common. The Jefferson system notes aspects such as stress and VOLUME, but also lengthening of sound::: and details of the lengths of pauses (0.8). One of the biggest surprises for me as a humanities scholar entering the sociology-dominated field of welfare research was that this rather technical system seemed to be so prevalent within the social sciences (see, e.g., Bischoping & Gazso 2016). Often, the finer points of intonation and exact lengths of pauses will have little influence on the actual analysis of the interview data. If the point of analysis is what was said, not how, prose-like transcriptions would be more than enough. At the very least, it would seem that a much simpler transcription system would do the job.


Ethnopoetics originated as a means of valuing, interpreting and representing the oral arts of non-literary cultures. Coined and developed by scholars working with Native American traditions, ethnopoetics became a way to present in translated, written form the performances of living orality (Tedlock 1983). Later, ethnopoetic theory and method were used in the effort to understand oral poetry that had only been preserved in written form by earlier ethnographers (Quick 1999: 97). An example of that is when Dell Hymes revisited Franz Boas’ notes of the Chinook Sun’s Myth. Hymes (at first calling the method anthropological philology) saw art forms in data that had earlier mainly been seen as a means for researchers to understand culture (Webster & Kroskrity 2013). Among other things, Hymes stressed the need to understand what structures are grouped by the pauses (Quick 1999: 97) in verbal performance. Silences are an important part of all oral art, and the ethnopoetic pioneers placed great weight on finding the pauses in the texts (i.e. performances, tape-recordings etc.), and on re-presenting in writing the rhythm such pauses create. Although ethnopoetic transcripts do not measure these pauses in tenths of a second (as the Jefferson system proposes), pauses and silences are clearly shown.

Ethnopoetic transcription often uses standard fonts and spelling systems, but uses line breaks (often looking like stanzas) instead of prose paragraphs to capture the rhythm and flow of oral speech; see figure 1. This example is a Coyote narrative told by Samuel E. Kenoi in Chiricahua Apache to Harry Hoijer in the 1930s, edited by Anthony K. Webster in several articles (Webster 1999: 141–142; 2015: 7). The point of breaking down the continuous narrative into stanzas is to visualize the rhythm of speech. In this instance it clearly illuminates how the narrator uses the enclitic -ná’a (which translates in English to “so they say”) after each piece of quoted speech. In other words, representing this narrative in print as if it were what many will recognize as poetry helps Webster in his analysis of the poetic aspects of traditional Apache narrative. However, ethnopoetic transcription has also proved useful outside of its 1960s linguistic/anthropological roots and the effort to present and analyse non-Western oral art forms. For instance, inspired by the work of the American anthropologists, Swedish folklorist Ulf Palmenfelt tested the reach of ethnopoetics as method when applied to the archived field notes of Gotlandic ethnographer Per Arvid Säve (Palmenfelt 1994). John Miles Foley, founder of the academic journal Oral Tradition, revisited Homer, and referred to his work as “forensics of oral poetry” (Foley 2002: 97). Both Palmenfelt and Foley were, in different ways, endeavouring to return some of the performative qualities to the printed linear text.

Figure 1

Excerpt from Anthony K. Webster’s transcription of a Coyote narrative told by Samuel E. Kenoi in the 1930s (Webster 2015: 7).

Oral narrative arts and archival poetic texts are not the only areas in which ethnopoetics as method has proved useful. Leaving aside the aesthetic aspects of ethnopoetics, the approach has amongst other things established itself as helpful in research on everyday speech. An important contribution here was Barbro Klein’s work (Klein 1990, 2006). In several articles, she showed in detail what different transcription methods do to our analysis (Tolgensbakk 2020). Her use of ethnopoetics when studying the humorous anecdotes her father told in a family setting has been an inspiration to a generation of Scandinavian ethnologists:

G: Bila-Lena was going to treat us to cookies.

B: OH! Ha ha ha ha !

G: She took the plate that,

that was by the stove, from which the cat had eaten

and put two cookies on it.

B: OH! Ha ha ha ha !

E: Yeees.

“Helena, don’t you ever WASH YOUR DISHES?” they said to her,

they shouted, the old woman was deaf.

(falsetto:) “Helena, don’t you ever WASH YOUR DISHES?”

“Why should I?” she asked, “when one dish has contained only

porridge and the other only gruel. Food is food,” she said.

(Klein 2006: 88)

The ethnopoetic transcription method makes it possible to present some of the poetic, formal and performative elements so prevalent even in everyday orality, and to use these features analytically or simply to understand and to convey what narrators are trying to tell us. Although never really standardized, it has become common to add such features as font size, capitalized letters, underlining, italics or bold to accentuate the most important variations in tone and emphasis. The objective is always to give the reader an honest impression of the speaker’s personal style and voice. The result is that some of the most important aspects of orality are visible both to the researcher and to the readers (Klein 1990: 52). As Tom O’Dell and Robert Willim remark, such ethnopoetically transcribed paragraphs may be a challenge to read, and “almost beckons the reader to read aloud and almost ‘re-enact’ the original”. This way of presenting extracts from interviews may push “the written text back into the realm of orality and bodily performance” (O’Dell & Willim 2013: 318). The way ethnopoetically transcribed interviews are received by the reader may move these types of text towards the methodology of ethnodrama (Denzin 2001: 26; Mienczakowski 2000).

As far as I can gather, ethnopoetics has not caught on within the social sciences, neither as a method of analysis nor as a method of presentation. Sociologists and others will often present their published informant quotes and citations in plain prose text, or plain prose text with some tweaking of standard written language. This is an excerpt from a great and recent article on tacit knowledge by sociologist Erin O’Connor, who, typically for her field, presents interview citations as semi-conventional prose:

As Kip practiced these movements, rather than applied the principles of viscosity, his gathers improved:

I didn’t have the glass skills then. I wasn’t used to moving the pipe so much. I was just so focused on getting it in and out that I could never make it work. But after watching this guy demonstrate for a while and trying to make it work … I was like, “Wow”, and I went from gathering a little bit to probably three times as much – that was probably my biggest learning curve. (Interview, July 20, 2006).

(O’Connor 2017: 220)

O’Connor does not focus much on how her interviewee spoke, but rather on what he said, and accordingly does not stray much from conventional prose in her presentation of the words of her interviewee. Some oral features are preserved in the excerpts, such as the interviewee quoting himself in the past, use of the quotative “like”, and some sense of repetition.

When researchers do wish to keep more of the orality of their interview citations, CA conventions of presenting speech seem to be the prevalent method in both the social sciences and the humanities, whether by following specific guidelines or by using home-made systems adapted to the specific needs of the research in question.

To me, ethnopoetic forms of representing oral speech represent the perfect middle ground in showing our interview data. They are not as esoteric as phonetic transcriptions and can be read by non-professionals. The researcher does not need to learn a completely new skillset, and the method does not require many special signs or much in the way of a manual. Still, it keeps us and our readers much closer to what it was like to be present in the interview. CA transcription methods have their definite uses, especially within certain types of sociolinguistic or sociolinguistic-inspired research. However, I feel ethnopoetics should have a more prominent place as an easier alternative for many social scientists. Where CA puts focus on details inside oral language, ethnopoetics moves focus to rhythm, to emphasis and to meaning.


When Christina Davidson reviewed three decades (1979–2009) of literature on transcription in qualitative research, she found that although the literature about transcription “presents a substantial body of work”, there are still few empirical studies that move away from the “taken-for-granted approach to transcription” (Davidson 2009: 47). In an interesting article reviewing nursing research applying interview data, Sally Wellard and Lisa McKenna found highly variable attention to and disclosure of transcription methods (Wellard & McKenna 2001). It is obviously a weak spot of our disciplines that we seem to care so little about the treatment of our raw data, and about disclosing our methods of treating the data to our readers (Poland 1995).

Surprising as it may seem to humanities scholars, for many social science research projects, any real closeness to the original interview is actually impossible, as the original sound recordings are unavailable (see, e.g., Myers & Lampropoulou 2016: 79). Even when the audio recording is available, there may be many steps on the way between the voices of the participants present in the interview situation and the analyst. There may be different researchers and assistants doing the interviewing, the transcribing and the translating before the researcher starts treating the textual representations as data. What this means for research has been little investigated.

Furthermore, although researchers such as Ian Hutchby and Robin Wooffitt underline that in CA, the researcher’s data is the audio, not the transcript (Hutchby & Wooffitt 1998), there is no doubt that in practice, individual researchers more often than not rely on transcripts for their analysis (Hammersley 2003: 775, note 20). These may be transcripts the researcher herself has made, or archive data, or the work of other researchers and assistants within a project. This follows naturally from the all-too-common approach to transcription that O’Dell and Willim call “realist” or “instrumental”: transcription as a way to get the audio down on paper “correctly” (O’Dell & Willim 2013: 317). With such an approach, it is no surprise that most researchers do not go into any level of detail when – if at all – they tell their readers what they have done with the interview data they base their analysis on. It seems timely to quote Norwegian psychologist Steinar Kvale here, who goes as far as shouting “beware of transcripts!”, claiming that “the interview researcher’s road to hell becomes paved with transcripts” (Kvale 1996: 280). Much more could be said about the uses and abuses of transcription methods in the humanities and social sciences – and the lack of reflection on and disclosure of this aspect of our methods in our disciplines. However, this article will now move on to a related but somewhat different topic within transcription: the place of the words of the interviewer herself.

Greater attention to what happens in the research data’s journey from fieldwork to published quotes and analyses inevitably leads to greater attention to what the interviewer is doing in the interview. I will discuss this topic through the story of my own adventures in considering how to handle my audiotaped interview data, from my initial high-principled intentions, to the mundane decisions to be made in the name of readability. Importantly, I want to demonstrate that the practice of transcription may bring out surprising dilemmas both of analysis and of publication – and that the two aspects or stages of transcription may be more closely related to each other than we normally consider.

The Words and Worlds of the Researcher

For my Ph.D. thesis in the field of Cultural History,3 I needed to present parts of interviews I had done with young Swedish migrants living in Norway. The first problem was, as usual, how to be true to the interviewees’ own language, that is, their personal style and manner of speech. This problem was bigger than normal as the young Oslo Swedes often mix their language with Norwegian words and phrases. In short, they practice a pragmatic, often improvised and highly personal “Swegian” (“svorsk”), a mixture between the neighbouring languages (Tolgensbakk 2015). The mixing is quite complex. My informants would often start out speaking one of the languages, and then change during the course of the interview into the other language depending on how much they thought I was able to understand. They could be speaking standard Norwegian before switching over into Swedish while speaking of their Swedish parents, or in quoting Swedish friends. They would code-switch abruptly within sentences without any obvious reason, and even alternate between pronouncing any given Nordic word in a Swedish or a Norwegian tone. For instance, the young Swedish informants would pronounce the word “yes” in a Swedish way in the beginning of an interview section, and end by pronouncing the same word in a Norwegian way. To complicate matters further, both written languages present the word as “ja” in writing. Representing such oral creativity in a coherent manner in my final text seemed impossible, and I spent months pondering a solution. In the end, the problem felt banal, and in the final thesis, the young Swedes’ words were transcribed by me and then changed into more formal Swedish spelling by a native Swedish-speaking interpreter. Luckily, not only are grammar and words similar in Swedish and Norwegian, but also to a large degree tone and rhythm. This meant that presenting the interview excerpts in an ethnopoetic manner – with each verse or line representing one unit of speech, from pause to pause – posed no difficulties. Whether the interviewee spoke Norwegian, Swedish or Svorsk, each line or stanza would naturally be presented in a translated version with the words in more or less the same order.

The transcriptions of my interview presentations are very simple. I long struggled over the question of whether or not to introduce aspects such as LOUDNESS, whispers or stress, but the excerpts I ended up including in the final text did not contain many such features. Actually, the only nonverbal or semi-nonverbal features included are certain interjections, exclamations – and laughter, the latter a very important part of my often fun interviews with the young Swedes. In this choice, I align with most social scientists – laughter being the paralinguistic feature most often presented in such research (Myers & Lampropoulou 2016), perhaps because it is one of the nonverbal communicative modes most often perceived to change the meaning of verbalization. The next problem felt bigger, although all too familiar: what to do with my own words?

I came into this field of research among other things because I believe firmly in preserving the history of the everyday, which in practice often means recording, archiving, analysing and publishing all kinds of interviews. As an ethnographer I quite literally make a living out of analysing and publishing the personal and private words and worlds of strangers. However, I am not particularly fond of showing my own personal and private wor(l)ds. I have noticed that even though out of sheer principle I stand firmly in the the-researcher-has-to-endure-what-her-informants-endure camp, I tend to make all kinds of excuses to remove myself from the final presented text: The reader will be bored. My question was stupid/makes no sense/will only complicate the discussion.

It is obvious that the participation of the interviewer is an inherent part of any interview. The perceived status of the interviewer has an impact on the social setting of the interview, and on what is said throughout the encounter. The interview style – structured or open, formal or informal – as well as the setting, mood and rapport between the participants, influences interviewee answers and openness. How questions are framed affects the answers in all sorts of ways. As Mary Bucholtz puts it, a responsible transcription practice requires “the transcriber’s cognizance of her or his own role in the creation of the text and the ideological implications of the resultant product” (Bucholtz 1999: 1440). Everybody present is a part of the conversation.4 Sometimes this is more obvious, and some lovely examples may be found when cultural historians interview friends and family, as in Barbro Klein’s conversation with her father and her aunt (Klein 2006), or Claire Schmidt’s interviews with her friends and family:

Working with inmates can be extremely challenging, particularly for new employees. Inmates often target new employees for harassment in an effort not only to relieve boredom, but also to ascertain whether the individual can be manipulated to elicit a reaction. TI told me about being harassed when he was a new CO:

Schmidt: Did you get hazed?

TI: Oh, yeah. Oh, definitely.

Schmidt: Like what?

TI: Definitely, like, by the inmates. When I worked in Segregation, you know, the inmates that are in Segregation, a lot of’em spent like, years, there, because that’s where they’re comfortable. […]

(Schmidt 2013: 80)

Both these researchers have chosen to present their own voice as part of the description of the interview – Klein in ethnopoetic form while discussing different types of transcription, Schmidt in prose, but a prose that includes certain aspects of oral language.

Another obvious example of an interview involving more than the interviewee is when the interview exchange develops into some sort of call-and-response, as in the repeated “no” in this interview from a project investigating youth unemployment:5

Q: so what you are saying now

      is that

      that counter pressure

      it came from your mother

A: yes that’s where the pressure was from


      not from any of those government applications

Q: no

A: no

Q: no

A: it was that demand

      to send in those cards […]6

The reality is that the researcher’s questions and her various modes of verbal or semiverbal continuers and assessments in the backchannel of the conversation often make for tedious reading. Who cares, except linguists, how many times the “no” was repeated? The audience will not normally be very interested in the backchannelling of an interview, or how the interviewer framed and put forth her query (especially if it was long-winded). What both author and reader of a social or human science text want is to get closer to what the research in question is all about – that is, the interviewee.

Trying and Failing to be Open about my Questions

Nonetheless, I do believe that we should be far more open about how we frame our questioning, and the degree to which we are interacting with our informants in the course of our interviews. In my project on the young Swedes, I decided to overcome my fear of embarrassing myself by showing my own interaction with the informants in the interview setting. I went much further in fulfilling this ideal in the finished text than I ever had before. Or, that was what I thought. Revisiting my final text, I see that I generally chose to paraphrase my questions, still avoiding showing my confused and often stuttering questions. A typical example is this (translated from the Norwegian and Swedish text of my final thesis):

Lena7 mentioned in her interview with me that she and her closest friend sometimes would have what she referred to as “Hating Norway-days”. I asked her to elaborate:

eh it is like this

in winter when it’s such a day

you just go nuts seeing people skiing

on Karl Johan8

(Tolgensbakk 2015: 77)

Although I am not completely satisfied with it and feel a little bit like a coward, I believe this solution has the benefit of keeping the interviewer’s questions visible without wearying the reader too much. A non-paraphrasing version of the question in this excerpt would look like this:

well one of the things I love about interviewing young Swedes


you said


sometimes you don’t

have no patience

you said Hating Norway days?

This last version could perhaps be considered a more truthful rendering of the actual conversation. But it would also take up more space in the final published text, and probably distract from the analysis – in this case a discussion of national stereotypes as migrants’ ways of coping.

With the solution I present in the first example, the premises on which the quoted conversations rely are transparent. I did not feel as though I lied to my readers. The introduction to what I cite is to a certain degree explained, and I show the context of what Lena says. I consider myself a decent fieldworker and at least sometimes a great interviewer, and like many other interviewers I get my interviewees to talk by being naïve, open and playing the novice when asking my questions. The downside of that is, of course, that my questions may seem rather silly when written down. Like most interviewers, I do not ask questions and then leave the floor open for the interviewee to do her show solo. I have long and often vague introductions to questions, I enter into the dialogue with my own anecdotes, and I laugh at the wrong places as well as the right ones.

Meaningful Backchannels

As a rule, the interviewer not only interrupts the conversation with follow-ups and expanding queries, but also with minimal responses or feedback such as a variety of “mhm” and “yeah” (Bennett & Jarvis 1991). Although seemingly trivial or unsubstantial, these linguistic features have been shown to have definite meanings and important functions in a conversation. Among other things, they are an important part in judging a listener as “good” or “bad”, and they play a part in determining turn-taking, delineating in complex ways what Michael McCarthy calls “the boundary between back channel behavior and floor-grabbing” in a conversation (McCarthy 2003: 43). In an interview, these “fillers”, by their presence or their absence, help the interviewee understand that the interviewer is still listening and still interested in what she is saying (or, conversely, they help her understand when the interviewer is fed up, or eager to ask another question, or to change topics). I do believe that had more researchers included their own voice when presenting interview excerpts in writing, it would be much clearer how closely the interview and the conversation are related as genres. There are of course definite boundaries between the two, at least formally (Silvén-Garnert 1991). An interview is a conversation set up through prior agreement that has pre-defined roles regarding who is going to do the asking and who is going to do the answering. But in reality, human communication is rarely one-way, and even one-way communication needs encouragement through feedback to be sustained. Even the most self-assured and talkative interviewee needs some kind of confirmation that the person interviewing her understands what she is saying and wants her to carry on.
If we include such interruptions to the storyteller’s flow in our published texts, we will not only be more honest about our work – how we actually get our informants to speak to us – we will also probably be able to add new layers to our analysis, and to enhance our readers’ understanding. The reflexivity movement in qualitative research has stressed that we as researchers need to discuss our positionality vis-à-vis those we study, and the different layers of knowledge and power hierarchies inherent in the interview: how it shapes the interview itself, and how it enables or hinders scholarly insight (Beaunae, Wu & Koro 2011; Denzin 2001). Stating my theoretical approach and describing my background, intentions etc. would be relatively easy. Showing my presence in the interviews was a trickier task.

Having established my desire to present my own feedback and interruptions as an interviewer in my printed transcript excerpts, I still wanted to maintain as much readability as possible. If I were to include everything said in the course of the interview sections presented in the thesis, sometimes every second line in the ethnopoetic rendering would be my words. My solution was to include my own voice, but to place it on the right-hand side of the written text. The result can be seen in the following example from my interview with Rinaldo, where we discuss employment contracts:

and they just “yeah it’s for the summer first and then we’ll see”

and then I stayed on for three years as a temp

so I never got a permanent position

I didn’t want one either                        no

                                                why not?

I play music as well

and this way I can always get some time off

it is really comfortable to just like

“how much do you want to work next week?”

“I don’t want to work at all”                   there’s freedom in that



(Tolgensbakk 2015: 203)

My feeling is that this makes the interviewer more visible without disturbing the reading flow too much. In this written presentation of an excerpt from the interview, I chose to place my first minimal response (“no”) on the same line as Rinaldo’s words. This is meant to signal that I was speaking at the same time as him. My next interruption of his flow comes as the question “why not?”, and then through a sort of interpretive inquiry, “there’s freedom in that” (again spoken while he is still speaking), to which he answers in confirmation: “yeah, freedom”.
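For readers who prepare their excerpts digitally, the layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the thesis: the names `Line` and `render` and the column position of 48 characters are invented for the example, and the alignment only holds in a monospaced typeface.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """One printed line of an ethnopoetic excerpt (hypothetical structure)."""
    interviewee: str       # interviewee's words; "" if only the interviewer speaks
    interviewer: str = ""  # interviewer's words, shown on the right-hand side


def render(lines, column=48):
    """Lay out the excerpt with the interviewer's words starting at `column`.

    Words placed on the same printed line are meant to signal overlapping
    speech. The column width is an assumption chosen for illustration.
    """
    rendered = []
    for line in lines:
        if line.interviewer:
            # Pad the left-hand text; keep at least two spaces if it is long.
            width = max(column, len(line.interviewee) + 2)
            rendered.append(line.interviewee.ljust(width) + line.interviewer)
        else:
            rendered.append(line.interviewee)
    return "\n".join(rendered)


# The last lines of the Rinaldo excerpt quoted above.
excerpt = [
    Line("so I never got a permanent position"),
    Line("I didn’t want one either", "no"),
    Line("", "why not?"),
    Line("I play music as well"),
    Line("and this way I can always get some time off"),
    Line("it is really comfortable to just like"),
    Line("“how much do you want to work next week?”"),
    Line("“I don’t want to work at all”", "there’s freedom in that"),
]

print(render(excerpt))
```

Keeping the two voices as separate fields, rather than typing spaces by hand, means the same excerpt can be printed with or without the interviewer's words, which makes the choice of what to disclose an explicit one.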

The excerpt from the Rinaldo interview shows that I am very much present in the interview even when it is formally Rinaldo’s turn to speak. I encourage his voice by agreeing with him in the backchannel (“no”), and I urge him to elaborate through a small question (“why not?”). Importantly, I try to interpret his words practically as he is speaking them (“there’s freedom in that”). Rinaldo repeats my word “freedom”, seemingly agreeing with my interpretation. This is obviously a crucial point for any analysis of the excerpt, however, as it is impossible to know after the fact whether Rinaldo felt the word “freedom” to be an apt reading of his relation to the labour market. He may have been agreeing with me to be polite, or simply to move the conversation on. Rinaldo returned to the concept of freedom several times later in the interview. Treating myself kindly, I might interpret that as a sign that my words resonated with him, and that I used the concept of freedom to unpack what having the choice to take on jobs or not meant for him. With Donna Haraway, I may claim that this excerpt shows how I “articulated with” my interviewee, and that it is an example of how all research is co-creation (Haraway 1988). In more critical terms, I could argue that I imposed my own interpretation on Rinaldo’s life, changing not only the very nature of the interview but also Rinaldo’s view of himself and his precarious labour conditions. My point is that although this is a particularly glaring example of the way interviewers shape the interview, it is merely a version of what happens in all sorts of participatory fieldwork. Accepting that it happens, and being honest about it, will make research more transparent, and better. It should not remain a hidden part of research dissemination, relegated to obscure methods discussions.

The Benefits of Ethnopoetics to the Research Interview

To conclude, I do not argue that all research interviews ought to be transcribed meticulously into ethnopoetic form. For one thing, it would probably be too big a workload for most research projects, and not necessarily bring much of a payoff. I do not even believe that all presentations or citations of the words of our informants should be rendered in the final publication in this way. However, I do insist that it is a weakness of our disciplines – across large parts of the humanities and social sciences – that we do not have a habit of disclosing our transcription methods to our readers, whether those readers be lay or scholarly. In many – perhaps most – instances a quote laid out in standard prose may be more than enough, both for us as researchers to work with, and for our readers to grasp the words of our informants. We do nonetheless owe it to our informants, our readers and ourselves to have a conscious relationship to what we do to the information we gain in the field. A respectful treatment of our informants should encompass an honest and good rendering of their words.

I would also argue that there is substantive cause for us to be more honest about our own questioning, to show more of ourselves in the final published interview excerpts and quotations. Ethnopoetics makes that labour a little less burdensome, and my own technique of placing the words of the interviewer on the right-hand side makes such excerpts a little more readable. There are of course downsides to this approach as well. A call to put the interviewer on the right-hand side presupposes only two participants in the interview. It may be workable with one interviewer and two interviewees, but it will probably be difficult to implement in group interviews, focus group interviews and similar situations where the conversation moves in more complex ways. Journal editors, print standards and conventions are further obstacles that might stand in the way of an ethnopoetic standard of including the interviewer on the side of the page.

The benefits are that the reader gets as much of the truth as one could expect from a textual representation of oral conversation. Ethnopoetic transcription makes it possible to retain in print some of the rhythm, flow and poetic characteristics of the spoken word. Including the feedback of the interviewer makes us as researchers more honest, more vulnerable – and truer to our analytic craft. I hope this article will encourage more scholars to make use of ethnopoetics as a transcription method, and to disclose their own words and worlds as part of their craft to their readers. The minimum requirement ought to be that we explain what principles of transcription we have used, and how we have chosen the excerpts we leave in the final text for our readers to see.


  1. In Norway, the classic example of deriding someone simply by repeating what she said is the 1993 Niels Christian Geelmuyden interview with then prime minister Gro Harlem Brundtland. In jest and protest against what he felt was an overzealous public relations team trying to control the interview, the male interviewer chose to reiterate the first female prime minister’s words “exactly as she said it” – and scandal ensued. Although journalists have other aspects to consider than ethnographers, the point stands that not all interviewees will feel good about seeing themselves presented without the protective cloak of standard written language (Geelmuyden 1998).
  2. It looks like this.
  3. Funded by the Norwegian Research Council.
  4. Whether actually speaking or not, not to mention being physically present or not – see Bakhtin’s superaddressee (Morson & Emerson 1990: 135).
  5. NEGOTIATE – Overcoming early job-insecurity in Europe, funded by EU H2020 under grant agreement No. 649395.
  6. Unpublished excerpt from NEGOTIATE transcriptions.
  7. All names of interviewees are pseudonymized.
  8. Karl Johan being the main parade street of Oslo.


Beaunae, Cathrine, Chiu-Hui Wu & Mirka Koro 2011: Exploring Performativity and Resistance in Qualitative Research Interviews: A Play in Four Acts. Qualitative Inquiry 17(5): 412–421.

Bennett, Mark & Jan Jarvis 1991: The Communicative Function of Minimal Responses in Everyday Conversation. The Journal of Social Psychology 131(4): 519–523.

Bischoping, Katherine & Amber Gazso 2016: Analyzing Talk in the Social Sciences: Narrative, Conversation & Discourse Strategies. Thousand Oaks, CA: Sage Press.

Bucholtz, Mary 1999: The Politics of Transcription. Journal of Pragmatics 32(2000): 1439–1465.

Davidson, Christina 2009: Transcription: Imperatives for Qualitative Research. International Journal of Qualitative Methods 8(2): 36–52.

Denzin, Norman K. 2001: The Reflexive Interview and a Performative Social Science. Qualitative Research 1(1): 23–46.

Fabian, Johannes 1991: Ethnographic Objectivity: From Rigor to Vigor. Annals of Scholarship 8: 381–408.

Foley, John Miles 2002: How to Read an Oral Poem. Urbana & Chicago: University of Illinois Press.

Geelmuyden, Niels Christian 1998: Ærlighetens komedie: Portrettdrama i 28 akter. Oslo: Huitfeldt forlag.

Hammersley, Martyn 2003: Conversation Analysis and Discourse Analysis: Methods or Paradigms? Discourse and Society 14: 751–781.

Haraway, Donna 1988: Situated Knowledges: The Science Question in Feminism and the Privilege of Partial Perspective. Feminist Studies 14(3): 575–599.

Hutchby, Ian & Robin Wooffitt 1998: Conversation Analysis: Principles, Practices and Applications. Cambridge: Polity Press.

Jefferson, Gail 2004: Glossary of Transcript Symbols with an Introduction. In: Gene H. Lerner (ed.), Conversation Analysis: Studies from the First Generation. Amsterdam: John Benjamins, 13–31.

Klein, Barbro 1990: Transkribering är en analytisk akt. RIG Kulturhistorisk tidskrift 71(2): 41–66.

Klein, Barbro 2006: An Afternoon’s Conversation at Elsa’s. In: Annikki Kaivola-Bregenhøj, Barbro Klein & Ulf Palmenfelt (eds.), Narrating, Doing, Experiencing: Nordic Folkloristic Perspectives 16: 79–100. Helsinki: Studia Fennica Folkloristica.

Kvale, Steinar 1996: The 1,000-Page Question. Qualitative Inquiry 2(3): 275–284.

Lapadat, Judith C. & Anne C. Lindsay 1999: Transcription in Research and Practice: From Standardization of Technique to Interpretive Positionings. Qualitative Inquiry 5(1): 64–86.

McCarthy, Michael 2003: Talking back: “Small” Interactional Response Tokens in Everyday Conversation. Research on Language in Social Interaction 36(1): 33–63.

Mienczakowski, Jim 2000: Ethnodrama: Performed Research: Limitations and Potential. In: Paul Atkinson, Sara Delamont, Amanda Coffey, John Lofland & Lyn Lofland (eds.), Handbook of Ethnography. London: Sage.

Morson, Gary Saul & Caryl Emerson 1990: Mikhail Bakhtin: Creation of a Prosaics. Stanford: Stanford University Press.

Myers, Greg & Sofia Lampropoulou 2016: Laughter, Non-seriousness and Transitions in Social Research Interview Transcripts. Qualitative Research 16(1): 78–94.

Ochs, Elinor 1979: Transcription as Theory. In: Elinor Ochs & Bambi Schieffelin (eds.), Developmental Pragmatics. New York: Academic, 43–72.

O’Connor, Erin 2017: Touching Tacit Knowledge: Handwork as Ethnographic Method in a Glassblowing Studio. Qualitative Research 17(2): 217–230.

O’Dell, Tom & Robert Willim 2013: Transcription and the Senses. The Senses and Society 8(3): 314–334.

Palmenfelt, Ulf 1994: Per Arvid Säves möten med människor och sägner. Stockholm: Carlsson Bokförlag.

Poland, Blake D. 1995: Transcription Quality as an Aspect of Rigor in Qualitative Research. Qualitative Inquiry 1(3): 290–310.

Quick, Catherine S. 1999: Ethnopoetics. Folklore Forum 30: 95–105.

Schmidt, Claire 2013: “If you don’t laugh you’ll cry”: The Occupational Humor of White American Prison Workers and Social Workers. Ph.D. Columbia: University of Missouri.

Silvén-Garnert, Eva 1991: Om feltarbete och verbal documentation. In: Eva Silvén-Garnert (ed.), Verbalt. Visuellt, Materiellt. Stockholm: Nordiska Museet/Samdok.

Tedlock, Dennis 1983: The Spoken Word and the Work of Interpretation. Philadelphia: University of Pennsylvania Press.

Tolgensbakk, Ida 2015: Partysvensker; GO HARD! En narratologisk studie av unge svenske arbeidsmigranters nærvær i Oslo [Partyswedes GO HARD! A narratological study of the presence of young Swedish labour migrants in Oslo]. Ph.D. Oslo: IKOS, University of Oslo.

Tolgensbakk, Ida 2020: “More or Less Word for Word”: Barbro Klein and Transcription as Analytical Craft. Western Folklore 79(4-Fall): 453–468.

Vigouroux, Cécile B. 2007: Trans-scription as a Social Activity: An Ethnographic Approach. Ethnography 8(1): 61–97.

Webster, Anthony K. 1999: Sam Kenoi’s Coyote Stories: Poetics and Rhetoric in some Chiricahua Apache Narratives. American Indian Culture and Research Journal 23(1): 137–163.

Webster, Anthony K. 2015: Cultural Poetics (Ethnopoetics). Oxford On-Line Handbook: Linguistics. Oxford: Oxford University Press.

Webster, Anthony K. & Paul V. Kroskrity 2013: Introducing Ethnopoetics: Hymes’s Legacy. Journal of Folklore Research 50(1–3): 1–11.

Wellard, Sally & Lisa McKenna 2001: Turning Tapes into Text: Issues Surrounding the Transcription of Interviews. Contemporary Nurse 11(2–3): 180–186.

Ida Tolgensbakk is a senior researcher at NOVA, Oslo Metropolitan University, Oslo, Norway. A folklorist and cultural historian, her work centres on migration history, food studies, critical whiteness studies and digital culture. Moving between the social sciences and the humanities, she has published on European migration and precarity as well as on döner kebab memes.