Generating Realistic Natural Language Counterfactuals

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Counterfactuals are a valuable means for understanding decisions made by ML systems. However, the counterfactuals generated by the methods currently available for natural language text are either unrealistic or introduce imperceptible changes. We propose CounterfactualGAN: a method that combines a conditional GAN and the embeddings of a pretrained BERT encoder to model-agnostically generate realistic natural language text counterfactuals for explaining regression and classification tasks. Experimental results show that our method produces perceptibly distinguishable counterfactuals, while outperforming four baseline methods on fidelity and human judgments of naturalness, across multiple datasets and multiple predictive models.
Original language: English
Title of host publication: Findings of the Association for Computational Linguistics: EMNLP 2021
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Place of Publication: Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics
Pages: 3611–3625
Publication status: Published - 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) - Barceló Bavaro Convention Centre, Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 11 Nov 2021
https://2021.emnlp.org/

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021)
Abbreviated title: EMNLP 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 7/11/21 to 11/11/21

Bibliographical note

Findings of the Association for Computational Linguistics: EMNLP 2021

Keywords

  • explainability
  • interpretability
  • explainable artificial intelligence
  • counterfactuals
  • natural language processing
