Abstract
Voice-recordings are increasingly implemented in web surveys, but the resulting audio data need to be transcribed before analysis. Since manual coding is too time- and work-intensive, researchers often rely on automatic speech recognition (ASR) systems for the transcription of the voice-recordings. However, ASR tools might create partly incorrect transcriptions and potentially change the content of responses. If the ASR performance (i.e., accuracy and validity) differs by subgroup and contextual factors, a bias is introduced in the analysis of open-ended questions. We assessed the impact of sociodemographic and contextual factors on the accuracy and validity of ASR transcriptions with data from the Longitudinal Internet Studies for the Social Sciences (LISS) panel collected in December 2020. We find that background noise reduces the accuracy and validity of ASR transcriptions. In addition, validity improved when the respondent was alone during the survey. Fortunately, we did not find any evidence of systematic differences across subgroups (age, sex, education), devices or respondent location.
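The abstract does not spell out how transcription accuracy is quantified; a common measure for ASR output is the word error rate (WER), the word-level edit distance between a reference transcript and the ASR hypothesis, normalized by the reference length. The sketch below (hypothetical, not taken from the paper) illustrates the idea:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,      # deletion
                d[i][j - 1] + 1,      # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("noisy" -> "noise") and one deletion ("it"),
# over 6 reference words: WER = 2/6
print(word_error_rate("the noisy room made it hard", "the noise room made hard"))
```

A higher WER in noisy recordings would correspond to the reduced accuracy the study reports for background noise.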
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 1-12 |
| Number of pages | 12 |
| Journal | Survey Practice |
| Volume | 17 |
| DOIs | |
| Publication status | Published - 29 Feb 2024 |
Keywords
- voice-recording
- automatic speech recognition
- validity
- accuracy
- web survey
- open-ended question
Title
Keep the noise down: On the performance of automatic speech recognition of voice-recordings in web surveys