Abstract
Word embeddings are increasingly used for the automatic detection of semantic change; yet, a robust evaluation and systematic comparison of the choices involved have been lacking. We propose a new evaluation framework for semantic change detection and find that (i) using the whole time series is preferable to only comparing between the first and last time points; (ii) independently trained and aligned embeddings perform better than continuously trained embeddings for long time periods; and (iii) the reference point for comparison matters. We also present an analysis of the changes detected on a large Twitter dataset spanning 5.5 years.
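The "independently trained and aligned" setup mentioned in the abstract typically aligns embedding spaces with orthogonal Procrustes. The sketch below is a minimal illustration of that general technique, not the paper's actual implementation; the function name and matrix shapes are assumptions.

```python
import numpy as np

def procrustes_align(A, B):
    """Find the orthogonal matrix R minimizing ||A @ R - B||_F and
    return A rotated into B's space (orthogonal Procrustes).

    A, B: (vocab_size, dim) embedding matrices for a shared vocabulary,
    trained independently on two time slices. Illustrative only.
    """
    # Solution: if A.T @ B = U S V^T (SVD), then R = U @ V^T.
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    return A @ R

# Toy usage: A is an exact rotation of B, so alignment recovers B.
rng = np.random.default_rng(0)
B = rng.normal(size=(50, 10))          # "target" time-slice embeddings
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthogonal rotation
A = B @ Q.T                            # "source" embeddings, rotated copy of B
aligned = procrustes_align(A, B)
```

After alignment, semantic change for a word can be scored as the cosine distance between its row in `aligned` and its row in `B`.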
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) |
| Place of Publication | Hong Kong, China |
| Publisher | Association for Computational Linguistics |
| Pages | 66-76 |
| Number of pages | 11 |
| DOIs | |
| Publication status | Published - 3 Nov 2019 |
| Externally published | Yes |