Using Open-Source Automatic Speech Recognition Tools for the Annotation of Dutch Infant-Directed Speech

Anika van der Klis*, Frans Adriaans*, Mengru Han, René Kager

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

There is considerable interest in annotating speech addressed to infants. Infant-directed speech (IDS) has acoustic properties that might pose a challenge to automatic speech recognition (ASR) tools developed for adult-directed speech (ADS). While ASR tools could potentially speed up the annotation process, their effectiveness on this speech register is currently unknown. In this study, we assessed to what extent open-source ASR tools can successfully transcribe IDS. We used speech data from 21 Dutch mothers reading picture books containing target words to their 18- and 24-month-old children (IDS) and to the experimenter (ADS). In Experiment 1, we examined how the ASR tool Kaldi-NL performs at annotating target words in IDS vs. ADS. Kaldi-NL found only 55.8% of target words in IDS, whereas it annotated 66.8% correctly in ADS. In Experiment 2, we aimed to assess the difficulties in annotating IDS more broadly by transcribing all IDS utterances manually and comparing the word error rates (WERs) of two ASR systems: Kaldi-NL and WhisperX. We found that WhisperX performs significantly better than Kaldi-NL. While there is much room for improvement, the results show that automatic transcriptions provide a promising starting point for researchers who have to transcribe large amounts of speech directed at infants.
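
For readers unfamiliar with the metric used in Experiment 2, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a manual reference transcript and an ASR hypothesis, divided by the number of reference words. It is not the authors' evaluation code. The function name, the lowercasing and whitespace tokenization, and the Dutch example utterances are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; real evaluations often also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical utterances (not from the study's data): the ASR output
# drops one of five reference words.
manual = "kijk daar is de poes"
automatic = "kijk daar de poes"
print(word_error_rate(manual, automatic))
```

Because insertions count as errors, a WER can exceed 1.0 when the hypothesis is much longer than the reference.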

Original language: English
Article number: 68
Journal: Multimodal Technologies and Interaction
Volume: 7
Issue number: 7
Publication status: Published - 3 Jul 2023

Bibliographical note

Publisher Copyright:
© 2023 by the authors.

Funding

This work is funded through the Gravitation program of the Dutch Ministry of Education, Culture, and Science and the Netherlands Organization for Scientific Research (NWO grant number 024.001.003) and by Utrecht University’s Human-centered Artificial Intelligence focus area (HAI Summer 2021 Small Grant).

Funders (with funder number where given)
Dutch Ministry of Education, Culture, and Science
Universiteit Utrecht
Nederlandse Organisatie voor Wetenschappelijk Onderzoek: 024.001.003

    Keywords

    • automatic speech recognition
    • infant-directed speech
    • research tools
    • speech registers
    • transcriptions
