Semi-automatic discourse annotation in a low-resource language: Developing a connective lexicon for Nigerian Pidgin

M. Marchal, M. Scholman, V. Demberg

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

Abstract

Cross-linguistic research on discourse structure and coherence marking requires discourse-annotated corpora and connective lexicons in a large number of languages. However, the availability of such resources is limited, especially for languages for which linguistic resources are scarce in general, such as Nigerian Pidgin. In this study, we demonstrate how a semi-automatic approach can be used to source connectives and their relation senses and develop a discourse-annotated corpus in a low-resource language. Connectives and their relation senses were extracted from a parallel corpus combining automatic (PDTB end-to-end parser) and manual annotations. This resulted in Naija-Lex, a lexicon of discourse connectives in Nigerian Pidgin with English translations. The lexicon shows that the majority of Nigerian Pidgin connectives are borrowed from its English lexifier, but that there are also some connectives that are unique to Nigerian Pidgin.
Original language: English
Title of host publication: 2nd Workshop on Computational Approaches to Discourse, CODI 2021 - Proceedings of the Workshop
Publisher: Association for Computational Linguistics
Pages: 84-94
Publication status: Published - 2021
Externally published: Yes
