Abstract
Counterfactuals are a valuable means for understanding decisions made by ML systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.
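The abstract only names the architecture at a high level (a conditional GAN operating on pretrained BERT embeddings, conditioned on the desired model output). The sketch below is a minimal, assumption-laden illustration of that idea and is not the authors' implementation: the `Generator`/`Discriminator` classes, layer sizes, mean-pooled sentence embeddings, and the `embed` helper are all hypothetical, and decoding the generated embedding back into text is omitted.

```python
# Minimal sketch (not the paper's code): a conditional GAN over frozen BERT
# sentence embeddings, conditioned on a target prediction value. All names,
# dimensions, and the mean-pooling choice are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

EMB_DIM, COND_DIM, NOISE_DIM = 768, 1, 64  # BERT-base hidden size; scalar target

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased").eval()  # frozen encoder


def embed(texts):
    """Mean-pooled BERT embeddings for a batch of sentences (no gradients)."""
    with torch.no_grad():
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = encoder(**batch).last_hidden_state           # (B, T, 768)
        mask = batch["attention_mask"].unsqueeze(-1)           # (B, T, 1)
        return (hidden * mask).sum(1) / mask.sum(1)            # (B, 768)


class Generator(nn.Module):
    """Maps (original embedding, target output, noise) to a counterfactual embedding."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM + COND_DIM + NOISE_DIM, 512), nn.ReLU(),
            nn.Linear(512, EMB_DIM),
        )

    def forward(self, emb, target, noise):
        return self.net(torch.cat([emb, target, noise], dim=-1))


class Discriminator(nn.Module):
    """Scores how realistic an embedding is, given the conditioning target."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMB_DIM + COND_DIM, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 1),
        )

    def forward(self, emb, target):
        return self.net(torch.cat([emb, target], dim=-1))


# Example forward pass: request a counterfactual embedding for target output 1.0.
G, D = Generator(), Discriminator()
emb = embed(["The movie was dull and far too long."])
target = torch.tensor([[1.0]])            # desired prediction of the black-box model
z = torch.randn(1, NOISE_DIM)
cf_emb = G(emb, target, z)                # counterfactual embedding (would still need decoding to text)
realism = D(cf_emb, target)               # adversarial realism score
```

In this sketch the black-box model being explained never appears; in practice its predictions on the decoded counterfactual would supply the fidelity signal the abstract refers to.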
Original language | English |
---|---|
Title of host publication | Findings of the Association for Computational Linguistics: EMNLP 2021 |
Editors | Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih |
Place of Publication | Punta Cana, Dominican Republic |
Publisher | Association for Computational Linguistics |
Pages | 3611–3625 |
DOIs | |
Publication status | Published - 2021 |
Event | 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Barceló Bavaro Convention Centre, Punta Cana, Dominican Republic, 7 Nov 2021 → 11 Nov 2021, https://2021.emnlp.org/ |
Conference
Conference | 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) |
---|---|
Abbreviated title | EMNLP 2021 |
Country/Territory | Dominican Republic |
City | Punta Cana |
Period | 7/11/21 → 11/11/21 |
Internet address | https://2021.emnlp.org/ |
Bibliographical note
Findings of the Association for Computational Linguistics: EMNLP 2021

Keywords
- explainability
- interpretability
- explainable artificial intelligence
- counterfactuals
- natural language processing