Assessing the Reliability of Word Embedding Gender Bias Measures

Research output: Working paper / Preprint (Academic)

Abstract

Various measures have been proposed to quantify human-like social biases in word embeddings. However, bias scores based on these measures can suffer from measurement error. One indication of measurement quality is reliability, concerning the extent to which a measure produces consistent results. In this paper, we assess three types of reliability of word embedding gender bias measures, namely test-retest reliability, inter-rater consistency and internal consistency. Specifically, we investigate the consistency of bias scores across different choices of random seeds, scoring rules and words. Furthermore, we analyse the effects of various factors on these measures' reliability scores. Our findings inform better design of word embedding gender bias measures. Moreover, we urge researchers to be more critical about the application of such measures.
Original language: English
Publisher: arXiv
Pages: 1-23
Publication status: Published - 10 Sept 2021

Bibliographical note

23 pages, 24 figures, 3 tables. Accepted to EMNLP 2021

Keywords

  • cs.CL
