Abstract
Style is an integral part of natural language. However, evaluation methods for style measures are rare, often task-specific and usually do not control for content. We propose the modular, fine-grained and content-controlled similarity-based STyle EvaLuation framework (STEL) to test the performance of any model that can compare two sentences on style. We illustrate STEL with two general dimensions of style (formal/informal and simple/complex) as well as two specific characteristics of style (contrac'tion and numb3r substitution). We find that BERT-based methods outperform simple versions of commonly used style measures like 3-grams, punctuation frequency and LIWC-based approaches. We invite the addition of further tasks and task instances to STEL and hope to facilitate the improvement of style-sensitive measures.
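One of the simple baselines mentioned above, 3-grams, can be sketched as a character trigram cosine similarity between two sentences. This is a minimal illustrative sketch, not the paper's implementation; the example sentences and helper names are hypothetical.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Return a frequency count of character n-grams in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Two sentences with similar content but different style (formal vs. informal).
formal = "I do not believe that is correct."
informal = "nah i don't think that's right lol"

sim = cosine_similarity(char_ngrams(formal), char_ngrams(informal))
print(f"character 3-gram similarity: {sim:.3f}")
```

A content-controlled evaluation like STEL would test whether such a measure scores same-style sentence pairs higher than different-style pairs when the content is held fixed.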
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing |
| Editors | Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih |
| Place of publication | Dominican Republic |
| Publisher | Association for Computational Linguistics |
| Pages | 7109-7130 |
| Number of pages | 22 |
| DOIs | |
| Publication status | Published - Nov 2021 |