Order in the Court: Explainable AI Methods Prone to Disagreement

Michael Neely, Stefan F Schouten, Maurits JR Bleeker, Ana Lucic

Research output: Contribution to Conference › Paper › Academic

Abstract

In Natural Language Processing, feature-additive explanation methods quantify the independent contribution of each input token towards a model's decision. By computing the rank correlation between attention weights and the scores produced by a small sample of these methods, previous analyses have sought to either invalidate or support the role of attention-based explanations as a faithful and plausible measure of salience. To investigate what can reliably be concluded from measures of rank correlation, we comprehensively compare feature-additive methods, including attention-based explanations, across several neural architectures and tasks. In most cases, we find that none of our chosen methods agree. Therefore, we argue that rank correlation is largely uninformative and does not measure the quality of feature-additive methods. Additionally, the range of conclusions a practitioner may draw from a single explainability algorithm is limited.
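The comparison described in the abstract rests on rank correlation between per-token salience scores from different explanation methods. The following minimal sketch (not taken from the paper) illustrates the idea for a single sentence, using SciPy's Kendall's tau and Spearman's rho; the token scores and method names (attention weights vs. Integrated Gradients attributions) are illustrative placeholders, not real model outputs.

    # Illustrative sketch: rank correlation between two sets of per-token
    # salience scores for the same input sentence. Scores are placeholders.
    import numpy as np
    from scipy.stats import kendalltau, spearmanr

    # Hypothetical per-token scores, e.g. averaged attention weights vs.
    # Integrated Gradients attributions for a five-token input.
    attention_scores = np.array([0.05, 0.40, 0.10, 0.30, 0.15])
    gradient_scores = np.array([0.20, 0.10, 0.35, 0.05, 0.30])

    # Kendall's tau and Spearman's rho measure how similarly the two methods
    # rank the tokens: values near 1 indicate agreement, near 0 indicate none.
    tau, tau_p = kendalltau(attention_scores, gradient_scores)
    rho, rho_p = spearmanr(attention_scores, gradient_scores)

    print(f"Kendall's tau: {tau:.3f} (p={tau_p:.3f})")
    print(f"Spearman's rho: {rho:.3f} (p={rho_p:.3f})")

Low or inconsistent correlations across many such comparisons are what the abstract refers to when it reports that the chosen methods largely disagree.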
Original language: English
Publication status: Published - 7 May 2021
Externally published: Yes
