Abstract
This thesis investigates word-level biases, employing computational linguistics methods to support decolonisation efforts within cultural heritage institutions. Museum catalogues often contain contested terminology shaped by colonial legacies. Identifying and retrospectively handling such terms, and the negative biases they potentially propagate, is a key activity in current decolonisation initiatives at museums in the Western world. The research develops computational methods for detecting and analysing the biases of contested and potentially contested terms, and demonstrates their utility, with the goal of providing interpretable insights to heritage professionals.
Through a series of studies spanning historical newspapers, literary fiction, and social media, the thesis proposes methodologies and supporting pipelines that identify the key behaviours, attributes, received behaviours, and linguistic markers associated with known problematic terms, and presents them for interpretation as core vectors of social bias. Outcomes are shown to align well with the known biases of well-recognised problematic terminology. In addition to surface-level context features, the research explores proxy signals for prejudicial narratives, specifically offering empirical support for the phenomenon of aporophobia (disdain for poverty) by revealing the disproportionate association of low socio-economic contexts with negatively connoted topics. The thesis also introduces the ConConCor dataset, a collection of multi-sentence contexts annotated for offensiveness, offering a foundation for future studies into subjective judgments of harm in contested language. Overall, the research provides a methodological and conceptual framework for uncovering latent biases in cultural data, equipping institutions with tools to facilitate decolonisation efforts.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Awarding Institution | |
Supervisors/Advisors | |
Award date | 10 Jul 2025 |
Publisher | |
DOIs | |
Publication status | Published - 10 Jul 2025 |
Keywords
- NLP
- language models
- linguistic variations
- sociolinguistics
- corpus linguistics
- structural causal modelling
- decolonisation
- context analysis