Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

Pin-Jie Lin*, Muhammed Saeed, Ernie Chang, Merel Scholman

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-review

Abstract

Developing effective spoken language processing systems for low-resource languages is challenging because parallel data is scarce and resources for fine-tuning models are limited. In this work, we aim to improve both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus, and we further propose a cross-lingual adaptive training framework that combines continual and task adaptive training to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with improvements of up to 2.38 BLEU, and demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance.
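The back-translation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the lookup-table "model", and the toy sentences are all hypothetical, and the abstract does not specify how back-translation is wired into task adaptive training. The sketch only shows the general idea of pairing real target-side (Pidgin) sentences with synthetic source-side (English) sentences produced by a reverse-direction translator.

```python
from typing import Callable, List, Tuple


def back_translate(
    monolingual_target: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build (synthetic_source, real_target) pairs from target-side text.

    `reverse_translate` is assumed to map a target-language sentence
    (here: Pidgin) to the source language (here: English).
    """
    pairs = []
    for target_sentence in monolingual_target:
        synthetic_source = reverse_translate(target_sentence)
        # Skip empty outputs so the synthetic corpus has no blank sources.
        if synthetic_source.strip():
            pairs.append((synthetic_source, target_sentence))
    return pairs


# Hypothetical stand-in for a trained Pidgin-to-English model.
# The entries are illustrative toy data, not taken from the paper's corpus.
TOY_REVERSE_MODEL = {
    "how you dey": "how are you",
    "i dey fine": "i am fine",
}


def toy_reverse_translate(sentence: str) -> str:
    # Return an empty string when the toy model has no translation.
    return TOY_REVERSE_MODEL.get(sentence, "")


synthetic_pairs = back_translate(
    ["how you dey", "i dey fine", "unknown sentence"],
    toy_reverse_translate,
)
```

In a real setting the lookup table would be replaced by a trained reverse-direction translation model, and the resulting synthetic pairs would be added to the genuine parallel data before training the English-to-Pidgin model.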

Original language: English
Title of host publication: Proceedings of the 24th INTERSPEECH conference
Pages: 3954-3958
Number of pages: 5
Volume: 2023-August
Publication status: Published - Sept 2023

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Bibliographical note

Publisher Copyright:
© 2023 International Speech Communication Association. All rights reserved.

Funding

This work was supported by the Deutsche Forschungsgemeinschaft, Funder Id: http://dx.doi.org/10.13039/501100001659, Grant Number: SFB1102: Information Density and Linguistic Encoding.

Funders: Deutsche Forschungsgemeinschaft
Funder number: SFB1102

    Keywords

    • low-resource language
    • low-resource machine translation
    • spoken language understanding
