In this project, we focus on different kinds of interpretability: model-specific and model-agnostic techniques. We developed ILIME, a novel technique that explains the prediction of any supervised learning-based model by relying on an interpretation mechanism based on the instances that are most influential for the prediction being explained. We demonstrate the effectiveness of our approach by explaining different models on different datasets. In addition, we present a global attribution technique that aggregates the local explanations generated by ILIME into a few global explanations that mimic the behaviour of the black-box model globally in a simple way.
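The sketch below illustrates the general idea of such a local, instance-based explainer: perturb the neighbourhood of the instance to be explained, weight the samples, and fit a simple weighted linear surrogate whose coefficients act as local feature attributions. All function and parameter names here are illustrative assumptions, not the actual ILIME implementation; in particular, using nearest training instances as a proxy for the "most influential" instances is a simplification.

```python
import numpy as np
from sklearn.linear_model import Ridge

def ilime_style_explanation(black_box_predict, x, X_train, n_samples=500,
                            n_influential=50, kernel_width=0.75):
    """Illustrative local explainer in the spirit of ILIME (hypothetical sketch)."""
    rng = np.random.default_rng(0)
    # Proxy for the "most influential" instances: training points closest to x
    dists = np.linalg.norm(X_train - x, axis=1)
    influential = X_train[np.argsort(dists)[:n_influential]]
    # Perturb x with noise scaled by the feature spread of that influential set
    scale = influential.std(axis=0) + 1e-8
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    # Weight each perturbed sample by an RBF kernel on its distance to x
    w = np.exp(-(np.linalg.norm(Z - x, axis=1) ** 2) / (kernel_width ** 2))
    y = black_box_predict(Z)          # black-box outputs (e.g. class probabilities)
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(Z, y, sample_weight=w)
    return surrogate.coef_            # local feature attributions around x
```

A global attribution in this spirit could then be obtained by computing such local coefficient vectors for many instances and summarising them (for example, by averaging or clustering) into a small number of representative explanations.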
Additionally, this project focuses on developing Automated Concept-based Decision Tree Explanations, which provide human-understandable, concept-based explanations for classification networks. This technique gives end-users the flexibility to customise the model explanations by choosing the concepts of interest from a set of automatically extracted, visually human-understandable concepts, which are inferred from the hidden-layer activations. These concepts are then interpreted through a shallow decision tree that includes only the concepts deemed important to the model, as in the sketch below.
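A minimal sketch of this kind of pipeline, assuming concepts are discovered by clustering hidden-layer activations and that each input is scored by its similarity to every concept; the names and the clustering step are illustrative assumptions, not the method described in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

def concept_tree_explanation(activations, labels, n_concepts=10, max_depth=3):
    """Illustrative concept-based decision tree explanation (hypothetical sketch)."""
    # Step 1: automatically extract "concepts" as clusters of hidden-layer activations
    kmeans = KMeans(n_clusters=n_concepts, random_state=0, n_init=10)
    kmeans.fit(activations)
    # Step 2: represent each input by its (negative) distance to each concept centroid
    concept_scores = -kmeans.transform(activations)
    # Step 3: a shallow decision tree over concept scores keeps only the concepts
    # that matter for reproducing the model's predicted labels
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(concept_scores, labels)
    print(export_text(tree, feature_names=[f"concept_{i}" for i in range(n_concepts)]))
    return tree
```

Passing the model's predicted labels (rather than ground truth) as `labels` makes the tree a surrogate of the network's behaviour expressed in terms of the discovered concepts.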
Project Publications:
- R. ElShawi, Y. Sherif, M. Al-Mallah, S. Sakr. ILIME: Local and Global Interpretable Model-Agnostic Explainer of Black-Box Decision. In European Conference on Advances in Databases and Information Systems 2019 (pp. 53-68). Springer, Cham. link
- R. El Shawi, Y. Sherif, S. Sakr. Towards Automated Concept-based Decision Tree Explanations for CNNs. In EDBT 2021 (pp. 379-384). link
Contact Information
Radwa El Shawi
Radwa [dot] elshawi [at] ut [dot] ee