TY - GEN
T1 - Corpus Creation and Automatic Alignment of Historical Dutch Dialect Speech
AU - Bentum, Martijn
AU - Sanders, Eric
AU - van den Bosch, Antal
AU - Zeldenrust, Douwe
AU - van den Heuvel, Henk
N1 - Publisher Copyright: © 2024 ELRA Language Resource Association: CC BY-NC 4.0.
PY - 2024/5
Y1 - 2024/5
AB - The Dutch Dialect Database (also known as the 'Nederlandse Dialectenbank') contains dialectal variations of Dutch that were recorded all over the Netherlands in the second half of the twentieth century. A subset of these recordings of about 300 hours were enriched with manual orthographic transcriptions, using non-standard approximations of dialectal speech. In this paper we describe the creation of a corpus containing both the audio recordings and their corresponding transcriptions and focus on our method for aligning the recordings with the transcriptions and the metadata.
KW - corpus creation
KW - dialectal speech
KW - Dutch language variants
KW - speech transcriptions
UR - http://www.scopus.com/inward/record.url?scp=85195915811&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85195915811
SN - 9782493814104
T3 - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
SP - 4021
EP - 4029
BT - 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings
A2 - Calzolari, Nicoletta
A2 - Kan, Min-Yen
A2 - Hoste, Veronique
A2 - Lenci, Alessandro
A2 - Sakti, Sakriani
A2 - Xue, Nianwen
PB - European Language Resources Association (ELRA)
T2 - Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024
Y2 - 20 May 2024 through 25 May 2024
ER -