Aiming beyond the Obvious: Identifying Non-Obvious Cases in Semantic Similarity Datasets

Nicole Peinelt, Maria Liakata, Dong Nguyen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Existing datasets for scoring text pairs in terms of semantic similarity contain instances whose resolution differs according to the degree of difficulty. This paper proposes to distinguish obvious from non-obvious text pairs based on superficial lexical overlap and ground-truth labels. We characterise existing datasets in terms of the difficult cases they contain and find that recently proposed models struggle to capture the non-obvious cases of semantic similarity. We describe metrics that emphasise cases of similarity which require more complex inference and propose that these be used for evaluating systems for semantic similarity.
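The core idea of the abstract, splitting pairs by whether surface word overlap agrees with the gold label, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Jaccard token overlap, the cut-offs `HIGH_OVERLAP` and `LOW_OVERLAP`, the binary 0/1 labels, and the helper names (`jaccard`, `categorise`, `f1`, `non_obvious_f1`) are all hypothetical choices made here for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical cut-offs for "high" and "low" lexical overlap; the paper's
# actual criteria may differ.
HIGH_OVERLAP = 0.5
LOW_OVERLAP = 0.2


def jaccard(text_a: str, text_b: str) -> float:
    """Jaccard similarity of lowercased whitespace tokens (punctuation is not stripped)."""
    tokens_a, tokens_b = set(text_a.lower().split()), set(text_b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def categorise(text_a: str, text_b: str, label: int) -> str:
    """Assign a pair to a difficulty bucket from its overlap and gold label (1 = similar)."""
    overlap = jaccard(text_a, text_b)
    if label == 1:
        if overlap >= HIGH_OVERLAP:
            return "obvious_positive"      # similar and shares many words
        if overlap <= LOW_OVERLAP:
            return "non_obvious_positive"  # similar despite few shared words
    else:
        if overlap <= LOW_OVERLAP:
            return "obvious_negative"      # dissimilar and shares few words
        if overlap >= HIGH_OVERLAP:
            return "non_obvious_negative"  # dissimilar despite many shared words
    return "other"                         # intermediate overlap


def f1(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def non_obvious_f1(pairs: List[Tuple[str, str]], gold: Sequence[int], pred: Sequence[int]) -> float:
    """F1 restricted to pairs whose lexical overlap contradicts their gold label."""
    keep = [i for i, (a, b) in enumerate(pairs)
            if categorise(a, b, gold[i]).startswith("non_obvious")]
    return f1([gold[i] for i in keep], [pred[i] for i in keep])


if __name__ == "__main__":
    pairs = [
        ("a man plays guitar", "a man plays the guitar"),               # obvious positive
        ("a man plays guitar", "someone strums an instrument"),         # non-obvious positive
        ("the cat sat on the mat", "the mat sat on the cat"),           # non-obvious negative
        ("the cat sat on the mat", "stocks fell sharply today"),        # obvious negative
    ]
    gold = [1, 1, 0, 0]
    # A pure word-overlap baseline: predict "similar" whenever overlap is high.
    pred = [int(jaccard(a, b) >= HIGH_OVERLAP) for a, b in pairs]
    print(f1(gold, pred))                     # 0.5 on all pairs
    print(non_obvious_f1(pairs, gold, pred))  # 0.0 on the non-obvious pairs
```

In this sketch a pair counts as non-obvious when overlap points the wrong way: a paraphrase with few shared words, or a non-paraphrase with many. Scoring only that subset means a system that merely tracks word overlap, like the baseline in the demo, receives partial credit overall but none on the harder cases.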
    Original language: English
    Title of host publication: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
    Place of publication: Florence, Italy
    Publisher: Association for Computational Linguistics
    Pages: 2792-2798
    Number of pages: 7
    Publication status: Published - 28 Jul 2019
