Leveraging Measurement Theory for Natural Language Processing Research

Research output: Thesis › Doctoral thesis 1 (Research UU / Graduation UU)

Abstract

This dissertation explores the intersection of natural language processing (NLP) and measurement theory. NLP is a field aimed at enabling machines to process and generate human languages, such as English, German, and Mandarin. These languages are complex, diverse, and full of irregularities, making them challenging for machines to handle compared to structured artificial languages, like programming languages. NLP research ranges from simple tasks, like word frequency analysis, to complex ones involving the understanding and generation of human language. Measurement theory, traditionally applied in social sciences, addresses how we measure various properties scientifically. Key concepts include construct validity, which examines whether a measure accurately represents what it intends to measure, and reliability, which focuses on the consistency of a measure across different conditions. This dissertation argues that many challenges in NLP relate to measurement issues and suggests that principles from measurement theory can help address these challenges, particularly by providing tools to evaluate and improve the quality of NLP models.

The structure of the dissertation is as follows: Chapter 2 offers background on NLP and measurement theory, covering essential text representation techniques in NLP, the history of measurement theory, and recent discussions on its unification across fields. Chapters 3-5 apply measurement theory to evaluate NLP models: Chapter 3 explores the reliability of gender bias measures in NLP by using classical reliability estimators from social sciences. Chapter 4 adapts a construct validity testing framework to assess the quality of text representations for social science constructs. Chapter 5 introduces a psychometric-based benchmarking approach to evaluate large language models, demonstrated through a case study on eighth-grade math proficiency.

Chapters 6-7 focus on using measurement theory to improve NLP model performance: Chapter 6 presents a framework for designing user models based on measurement principles, achieving better-quality user representations than current methods. Chapter 7 examines how integrating human values into model training can enhance models’ ability to recognize values in human arguments. Chapters 8 and 9 reflect on current NLP research challenges and propose future directions: Chapter 8 identifies challenges in text-based personality computing, offering potential solutions and avenues for research. Chapter 9 concludes with a summary of the dissertation’s findings and suggests future work at the intersection of measurement theory and NLP.

This work underscores the potential of measurement theory to enhance NLP research by offering frameworks for evaluating and designing more reliable and valid models. By integrating these approaches, the dissertation aims to bridge NLP and measurement theory, advancing NLP's capability to address complex measurement challenges.
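
As an illustration of what a classical reliability estimator from the social sciences can look like, the sketch below computes Cronbach's alpha, a widely used internal-consistency coefficient. The abstract does not say which estimators the dissertation uses, so this choice, the function name `cronbach_alpha`, and the toy score matrices are assumptions made purely for illustration. Rows stand for the objects being measured and columns for repeated measurements of the same property.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a rows-by-items matrix of numeric scores.

    Illustrative sketch only: rows are the measured objects, columns are
    repeated measurements (items) intended to capture the same property.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two rows and two items")
    # Sample variance of each item (column) across rows.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of the per-row total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: three measurements that agree perfectly on every row.
consistent = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]
# Hypothetical data: two measurements that are uncorrelated across rows.
uncorrelated = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

alpha_consistent = cronbach_alpha(consistent)
alpha_uncorrelated = cronbach_alpha(uncorrelated)
```

Under these assumptions, measurements that agree perfectly yield an alpha of 1, while uncorrelated measurements yield an alpha of 0, which matches the intuition that reliability reflects consistency across measurement conditions.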
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
  • Utrecht University
Supervisors/Advisors
  • Oberski, Daniel, Supervisor
  • Nguyen, Dong, Co-supervisor
Award date: 6 Dec 2024
Place of Publication: Utrecht
Publisher
Print ISBNs: 978-90-393-7757-4
Electronic ISBNs: 978-90-393-7757-4
DOIs
Publication status: Published - 6 Dec 2024

Keywords

  • Natural Language Processing
  • Measurement Theory
  • Construct Validity
  • Reliability
  • Text Representation
  • Gender Bias
  • Social Science Constructs
  • Language Model Benchmarking
  • User Modeling
  • Personality Computing
