Evaluating the quality of machine learning interpretability techniques


Explaining the decisions of black-box machine learning models has received considerable attention, especially after the EU General Data Protection Regulation (GDPR). Current interpretability techniques are roughly partitioned into two groups: saliency and perturbation approaches. Saliency-based approaches use the signal from gradients or from the composition of the output to infer salient input features [1,2]. In contrast, perturbation-based approaches query the model around the instance to be explained in order to infer the relevance of input features to the output [3].
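To make the distinction concrete, the two families can be sketched on a toy logistic classifier. This is an illustrative minimal example, not any specific published method: the gradient-based score follows the spirit of [2], and the occlusion-style score stands in for perturbation approaches such as [3]. All function names and the baseline value are choices made here for illustration.

```python
import numpy as np

def logistic_model(x, w, b):
    """Toy black box: sigmoid(w . x + b)."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def gradient_saliency(x, w, b):
    """Saliency-style score: |d output / d input|.
    For sigmoid(w . x + b) the gradient w.r.t. x is p * (1 - p) * w."""
    p = logistic_model(x, w, b)
    return np.abs(p * (1.0 - p) * w)

def perturbation_saliency(x, w, b, baseline=0.0):
    """Perturbation-style score: change in the output when each
    feature is replaced by a baseline value (occlusion)."""
    p = logistic_model(x, w, b)
    scores = np.empty_like(x)
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = baseline  # query the model around the instance
        scores[i] = abs(p - logistic_model(x_pert, w, b))
    return scores
```

On such a linear model both scores rank features similarly; the open question addressed in this work is how to judge either ranking when the model is a genuine black box.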

Both saliency and perturbation methods offer desirable properties, such as simple formulations. However, assessing the quality of these interpretability techniques remains an open challenge [4], mainly because there are no adequate, well-defined, and concise metrics for assessing the quality of the explanations that different interpretability techniques provide. The goal of this work is to develop metrics for assessing different interpretability frameworks on different types of data.
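One common ingredient of such metrics, shown here purely as an illustrative sketch rather than the metric proposed in this work, is a deletion-style faithfulness test: remove features in the order an explanation ranks them and watch how quickly the model output drops. The `model` interface, the baseline value, and the area-under-curve summary below are all assumptions made for this example.

```python
import numpy as np

def deletion_score(model, x, ranking, baseline=0.0):
    """Deletion-style faithfulness check.

    Features are removed (set to `baseline`) in the order given by
    `ranking`, most relevant first, and the model output is recorded
    after each removal. A faithful ranking should make the output
    drop quickly, i.e. give a SMALL area under the output curve.
    """
    x_pert = x.copy()
    outputs = [model(x_pert)]
    for i in ranking:
        x_pert[i] = baseline
        outputs.append(model(x_pert))
    # normalized area under the output-vs-deletions curve
    return float(np.trapz(outputs) / len(outputs))
```

Comparing this score for an explanation's ranking against the score for a random ranking gives one crude, data-type-agnostic signal of explanation quality, which motivates the more principled metrics pursued here.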



[1] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. In ICCV, 2017. URL http://arxiv.org/abs/1610.02391.

[2] K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In International Conference on Learning Representations (Workshop Track), 2014.

[3] M. T. Ribeiro, S. Singh, and C. Guestrin. Anchors: High-precision model-agnostic explanations. In AAAI Conference on Artificial Intelligence, 2018.

[4] F. Doshi-Velez et al. Accountability of AI under the law: The role of explanation. arXiv preprint arXiv:1711.01134, 2017.