Choosing between long and short word forms in Mandarin

Lin Li, Kees van Deemter, Denis Paperno, Jingyu Fan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Between 80% and 90% of all Chinese words have both a long and a short form, such as 老虎/虎 (laohu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG in Chinese. Following on from earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to word length choice, using both a behavioural and a corpus-based approach. Thus, we hypothesise that, in Chinese, short forms are likelier in supportive than in neutral contexts. Our corpus and behavioural studies supported this hypothesis, but a closer analysis revealed striking differences between different types of Chinese words.
Original language: English
Title of host publication: Proceedings of the 12th International Conference on Natural Language Generation
Place of publication: Tokyo, Japan
Publisher: Association for Computational Linguistics
Pages: 34-39
Number of pages: 6
Publication status: Published - 1 Oct 2019
