Abstract
Between 80% and 90% of all Chinese words have both a long and a short form, such as 老虎/虎 (laohu/hu, tiger) (Duanmu, 2013). Consequently, the choice between long and short forms is a key problem for lexical choice across NLP and NLG in Chinese. Following on from earlier work on abbreviations in English (Mahowald et al., 2013), we bring a probabilistic perspective to word length choice, using both a behavioural and a corpus-based approach. Specifically, we hypothesise that, in Chinese, short forms are likelier in supportive contexts than in neutral ones. Our corpus and behavioural studies supported this hypothesis, but a closer analysis revealed striking differences between different types of Chinese words.
Original language | English |
---|---|
Title of host publication | Proceedings of the 12th International Conference on Natural Language Generation |
Publisher | Association for Computational Linguistics |
Pages | 34-39 |
Number of pages | 6 |
Publication status | Published - 1 Oct 2019 |