Abstract
As machine learning continues to evolve, its integration into mental health practices promises more effective, efficient, and accessible care. However, addressing ethical considerations remains essential to harnessing the full potential of machine learning in mental health. In this thesis, we focus on two challenges of responsible machine learning, fairness and explainability, within the mental health domain. Our work involves developing and validating concept-based models to improve explainability in mental health applications, as well as identifying, evaluating, and mitigating bias in AI models within this field.
This thesis is organized around five sub-research questions designed to address the main research question: "How can we develop algorithmic fairness and explainability methods to address the specific needs of the mental health domain?" We begin by exploring a range of concept-bottleneck models, demonstrating their potential in affective computing (Chapter 2) and mental health applications (Chapter 3). These findings motivate future research to integrate higher-level, clinically validated concepts into complex models, enabling clinicians to justify and correct model predictions. Despite these promising results, we also identify gender bias in these models.
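To give an intuition for the concept-bottleneck idea, the sketch below shows a minimal PyTorch model in which the final prediction depends only on an intermediate layer of interpretable concept scores. The layer sizes, loss weighting, and the choice of a linear concept predictor are illustrative assumptions, not the configuration used in the thesis.

```python
# Minimal concept-bottleneck sketch (illustrative; not the thesis's implementation).
import torch
import torch.nn as nn

class ConceptBottleneckModel(nn.Module):
    def __init__(self, input_dim: int, num_concepts: int, num_classes: int):
        super().__init__()
        # Map raw input features to human-interpretable concept scores.
        self.concept_predictor = nn.Linear(input_dim, num_concepts)
        # The label depends only on the concepts, so a clinician can inspect
        # or override the concept activations to correct a prediction.
        self.label_predictor = nn.Linear(num_concepts, num_classes)

    def forward(self, x):
        concept_logits = self.concept_predictor(x)
        concepts = torch.sigmoid(concept_logits)   # concept activations in [0, 1]
        label_logits = self.label_predictor(concepts)
        return concept_logits, label_logits

def joint_loss(concept_logits, label_logits, concept_targets, labels, alpha=0.5):
    """Supervise both the concept layer and the final label (alpha is an assumed weight)."""
    concept_loss = nn.functional.binary_cross_entropy_with_logits(concept_logits, concept_targets)
    label_loss = nn.functional.cross_entropy(label_logits, labels)
    return label_loss + alpha * concept_loss
```

Because the label head only sees the concept layer, a clinician can intervene by editing the predicted concepts and re-running the label predictor, which is what makes justification and correction of predictions possible.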
To address this, we further investigate existing gender biases in training datasets, pre-trained word embeddings, and machine learning models in the mental health domain (Chapter 4). We demonstrate how even accurate differences in prevalence rates in the training data can negatively impact downstream models, and we recommend counterfactual data augmentation as a bias mitigation strategy. Next, we seek to bridge the gap between clinicians and mathematical fairness objectives by conducting interviews with clinicians, showing how clinical insights can inform the selection of fairness measures for real-world applications (Chapter 5). Based on their input, we introduce a gain-based model selection method that prioritizes the performance of individual groups over parity-based fairness measures. Finally, we present two new bias mitigation techniques, ProxyMute and ProxyROAR, which use explainability methods to automatically remove sensitive information (e.g., gender-related features) from the feature space (Chapter 6). We show that these approaches provide a viable alternative when dictionary-based methods are not suitable.
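Two of the ideas above can be illustrated with short sketches. First, counterfactual data augmentation for text: each training example is paired with a copy in which gendered terms are swapped, so the label no longer correlates with gender-specific wording. The swap dictionary below is a toy example; real implementations use larger, curated word lists and handle ambiguous forms such as "her" (object vs. possessive).

```python
# Toy counterfactual data augmentation sketch (illustrative swap list).
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "his": "her",
    "man": "woman", "woman": "man",
    "father": "mother", "mother": "father",
}

def counterfactual(text: str) -> str:
    """Return the text with gendered terms swapped."""
    return " ".join(GENDER_SWAPS.get(tok.lower(), tok) for tok in text.split())

def augment(dataset):
    """Pair every (text, label) example with its gender-swapped counterfactual."""
    return [pair for text, label in dataset
            for pair in ((text, label), (counterfactual(text), label))]
```

Second, the general idea behind explainability-guided removal of sensitive features, in the spirit of ProxyMute but not its exact implementation: feature importances from a classifier trained to predict the sensitive attribute identify proxy features, which are then muted before training the downstream clinical model. The importance measure and masking strategy below are simplifying assumptions.

```python
# Sketch of explainability-guided removal of gender-proxy features (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

def find_gender_proxies(X, gender, top_k=10):
    """Rank features by how strongly they predict the sensitive attribute."""
    clf = LogisticRegression(max_iter=1000).fit(X, gender)
    importance = np.abs(clf.coef_).ravel()        # simple coefficient-based importance
    return np.argsort(importance)[::-1][:top_k]   # indices of the strongest proxies

def mute_features(X, proxy_indices):
    """Zero out ('mute') the proxy features so downstream models cannot rely on them."""
    X_muted = X.copy()
    X_muted[:, proxy_indices] = 0.0
    return X_muted

# Hypothetical usage, where X_train is a feature matrix, y_train the clinical label,
# and gender_train the sensitive attribute:
# proxies = find_gender_proxies(X_train, gender_train, top_k=50)
# clinical_model = LogisticRegression(max_iter=1000).fit(mute_features(X_train, proxies), y_train)
```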
Collectively, these studies make a significant contribution by highlighting both the potential and the challenges of applying fairness and explainability methods in the mental health domain, while also proposing novel approaches that address the limitations of existing methods and support the responsible use of AI models in this field. All code for the experiments is publicly available to accelerate research in this field.
| Original language | English |
|---|---|
| Qualification | Doctor of Philosophy |
| Awarding Institution | |
| Supervisors/Advisors | |
| Award date | 15 May 2025 |
| Place of Publication | Utrecht |
| Publisher | |
| Print ISBNs | 978-90-393-7848-9 |
| DOIs | |
| Publication status | Published - 15 May 2025 |
Keywords
- fair machine learning
- explainable AI
- mental health AI
- natural language processing
- responsible machine learning
- fairness in clinical NLP