Soft metrics for evaluation with disagreements: an assessment

Giulia Rizzi, Elisa Leonardelli, Massimo Poesio, Alexandra Uma, Maja Pavlovic, Silviu Paun, Paolo Rosso, Elisabetta Fersini

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

The move towards preserving judgement disagreements in NLP requires the identification of adequate evaluation metrics. We identify a set of key properties that such metrics should have, and assess the extent to which natural candidates for soft evaluation, such as Cross Entropy, satisfy these properties. We employ a theoretical framework, supported by a visual approach, by practical examples, and by the analysis of a real case scenario. Our results indicate that Cross Entropy can produce fairly paradoxical results in some cases, whereas other measures, Manhattan distance and Euclidean distance, exhibit more intuitive behavior, at least for the case of binary classification.
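As a rough illustration of the three measures named in the abstract, the sketch below compares a soft (disagreement-preserving) label with predicted distributions in the binary case. The function names, the example distributions, and the `eps` clipping are assumptions made for this sketch, and the paper's own definitions and normalizations may differ. It shows one generally known property: the cross entropy of a distribution with itself equals its entropy, so a prediction that matches the soft label exactly still receives a nonzero score, while both distances are zero. This is not necessarily one of the cases the paper analyzes.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy H(target, predicted) in nats; eps avoids log(0)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(target, predicted))


def manhattan(target, predicted):
    """Manhattan (L1) distance between two distributions."""
    return sum(abs(t - p) for t, p in zip(target, predicted))


def euclidean(target, predicted):
    """Euclidean (L2) distance between two distributions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(target, predicted)))


# Hypothetical item A: annotators split 60/40, prediction matches exactly.
label_a = [0.6, 0.4]
pred_a = [0.6, 0.4]

# Hypothetical item B: annotators split 90/10, prediction is slightly off.
label_b = [0.9, 0.1]
pred_b = [0.8, 0.2]

ce_a = cross_entropy(label_a, pred_a)
ce_b = cross_entropy(label_b, pred_b)

print("item A:", ce_a, manhattan(label_a, pred_a), euclidean(label_a, pred_a))
print("item B:", ce_b, manhattan(label_b, pred_b), euclidean(label_b, pred_b))
```

Under these assumed inputs, the exact match on item A gets a higher cross entropy than the imperfect prediction on item B, whereas both distances rank the exact match as better (zero).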

Original language: English
Title of host publication: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings
Editors: Gavin Abercrombie, Valerio Basile, Davide Bernardi, Shiran Dudy, Simona Frenda, Lucy Havens, Sara Tonelli
Publisher: European Language Resources Association (ELRA)
Pages: 84-94
Number of pages: 11
ISBN (Electronic): 9782493814234
ISBN (Print): 9782493814234
Publication status: Published - 21 May 2024
Event: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 - Torino, Italy
Duration: 21 May 2024 → …

Publication series

Name: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024 at LREC-COLING 2024 - Workshop Proceedings

Conference

Conference: 3rd Workshop on Perspectivist Approaches to NLP, NLPerspectives 2024
Country/Territory: Italy
City: Torino
Period: 21/05/24 → …

Bibliographical note

Publisher Copyright:
© 2024 ELRA Language Resource Association.
