Do Word Embeddings Capture Spelling Variation?

Dong Nguyen, Jack Grieve

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Analyses of word embeddings have primarily focused on semantic and syntactic properties. However, word embeddings have the potential to encode other properties as well. In this paper, we propose a new perspective on the analysis of word embeddings by focusing on spelling variation. In social media, spelling variation is abundant and often socially meaningful. Here, we analyze word embeddings trained on Twitter and Reddit data. We present three analyses using pairs of word forms covering seven types of spelling variation in English. Taken together, our results show that word embeddings encode spelling variation patterns of various types to some extent, even embeddings trained using the skipgram model, which does not take spelling into account. Our results also suggest a link between the intentionality of the variation and the distance of the non-conventional spellings to their conventional spellings.
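
    One of the analyses described in the abstract compares non-conventional spellings to their conventional counterparts in embedding space. The sketch below is only an illustration of that idea, not the authors' code: the spelling pairs and the embedding file path are hypothetical, and it assumes pre-trained word2vec-format vectors loaded with gensim.

    ```python
    from gensim.models import KeyedVectors

    # Hypothetical (non-conventional, conventional) spelling pairs; the paper's own
    # pair lists cover seven types of spelling variation and are not reproduced here.
    SPELLING_PAIRS = [
        ("goin", "going"),   # g-dropping
        ("sooo", "so"),      # letter repetition / lengthening
        ("pls", "please"),   # shortening
    ]

    # Assumed path to embeddings trained on social media text (word2vec text format).
    embeddings = KeyedVectors.load_word2vec_format("social_media_vectors.txt", binary=False)

    for variant, standard in SPELLING_PAIRS:
        if variant in embeddings and standard in embeddings:
            # Cosine similarity between a variant and its conventional spelling;
            # 1 - similarity can serve as a simple distance measure.
            sim = embeddings.similarity(variant, standard)
            print(f"{variant:>6} ~ {standard:<8} cosine similarity = {sim:.3f}")
    ```

    A lower similarity (larger distance) for a given pair would indicate that the two spellings occupy more distinct regions of the embedding space, which is the kind of signal the abstract relates to the intentionality of the variation.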
    Original language: English
    Title of host publication: Proceedings of the 28th International Conference on Computational Linguistics
    Editors: Donia Scott, Nuria Bel, Chengqing Zong
    Publisher: International Committee on Computational Linguistics
    Pages: 870-881
    Number of pages: 12
    Publication status: Published - Dec 2020
