TY - JOUR
T1 - TextFocus
T2 - Assessing the Faithfulness of Feature Attribution Methods Explanations in Natural Language Processing
AU - Mariotti, Ettore
AU - Arias-Duart, Anna
AU - Cafagna, Michele
AU - Gatt, Albert
AU - Garcia-Gasulla, Dario
AU - Alonso-Moral, Jose Maria
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024/5/31
Y1 - 2024/5/31
N2 - Among the existing eXplainable AI (XAI) approaches, Feature Attribution methods are a popular option due to their interpretable nature. However, each method leads to a different solution, thus introducing uncertainty regarding their reliability and coherence with respect to the underlying model. This work introduces TextFocus, a metric for evaluating the faithfulness of Feature Attribution methods for Natural Language Processing (NLP) tasks involving classification. To address the absence of ground truth explanations for such methods, we introduce the concept of textual mosaics. A mosaic is composed of a combination of sentences belonging to different classes, which provides an implicit ground truth for attribution. The accuracy of explanations can then be evaluated by comparing feature attribution scores with the known class labels in the mosaic. The performance of six feature attribution methods is systematically compared on three sentence classification tasks using TextFocus, with Integrated Gradients being the best overall method in terms of faithfulness and computational requirements. The proposed methodology fills a gap in NLP evaluation by providing an objective way to assess Feature Attribution methods while finding their optimal parameters.
AB - Among the existing eXplainable AI (XAI) approaches, Feature Attribution methods are a popular option due to their interpretable nature. However, each method leads to a different solution, thus introducing uncertainty regarding their reliability and coherence with respect to the underlying model. This work introduces TextFocus, a metric for evaluating the faithfulness of Feature Attribution methods for Natural Language Processing (NLP) tasks involving classification. To address the absence of ground truth explanations for such methods, we introduce the concept of textual mosaics. A mosaic is composed of a combination of sentences belonging to different classes, which provides an implicit ground truth for attribution. The accuracy of explanations can then be evaluated by comparing feature attribution scores with the known class labels in the mosaic. The performance of six feature attribution methods is systematically compared on three sentence classification tasks using TextFocus, with Integrated Gradients being the best overall method in terms of faithfulness and computational requirements. The proposed methodology fills a gap in NLP evaluation by providing an objective way to assess Feature Attribution methods while finding their optimal parameters.
KW - Artificial intelligence (AI)
KW - explainable AI (XAI)
KW - explanation faithfulness
KW - feature attribution
KW - feature importance
KW - natural language processing (NLP)
KW - trustworthy AI
UR - http://www.scopus.com/inward/record.url?scp=85194838907&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2024.3408062
DO - 10.1109/ACCESS.2024.3408062
M3 - Article
AN - SCOPUS:85194838907
SN - 2169-3536
VL - 12
SP - 138870
EP - 138880
JO - IEEE Access
JF - IEEE Access
ER -