TY - GEN
T1 - Explaining Model Behavior with Global Causal Analysis
AU - Robeer, Marcel
AU - Bex, Floris
AU - Feelders, Ad
AU - Prakken, Henry
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/10/30
Y1 - 2023/10/30
AB - We present GLOBAL CAUSAL ANALYSIS (GCA) for text classification. GCA is a technique for global model-agnostic explainability drawing from well-established observational causal structure learning algorithms. GCA generates an explanatory graph from high-level human-interpretable features, revealing how these features affect each other and the black-box output. We show how these high-level features do not always have to be human-annotated, but can also be computationally inferred. Moreover, we discuss how the explanatory graph can be used for global model analysis in natural language processing (NLP): the graph shows the effect of different types of features on model behavior, whether these effects are causal effects or mere (spurious) correlations, and if and how different features interact. We then propose a three-step method for (semi-)automatically evaluating the quality, fidelity and stability of the GCA explanatory graph without requiring a ground truth. Finally, we provide a detailed GCA of a state-of-the-art NLP model, showing how setting a global one-versus-rest contrast can improve explanatory relevance, and demonstrating the utility of our three-step evaluation method.
KW - Causal explanation
KW - Explainable Machine Learning (XML)
KW - Model-agnostic explanation
KW - Natural Language Processing (NLP)
UR - http://www.scopus.com/inward/record.url?scp=85176947124&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-44064-9_17
DO - 10.1007/978-3-031-44064-9_17
M3 - Conference contribution
SN - 978-3-031-44063-2
T3 - Communications in Computer and Information Science
SP - 299
EP - 323
BT - Explainable Artificial Intelligence
A2 - Longo, Luca
PB - Springer
CY - Cham
ER -