Machine learning predictive models are widely used across many domains; however, their lack of interpretability can limit their adoption in critical domains. Robustness and identity are two key requirements for any interpretability technique. Robustness states that similar inputs should have close explanations. Identity states that identical inputs should receive identical explanations. Robustness matters for several reasons. Most importantly, for an explanation to be valid around the point being explained, it should remain roughly constant in that point's vicinity, regardless of how it is expressed.
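As a concrete illustration, one way to quantify this robustness notion, in the spirit of the local Lipschitz analysis in [1], is to sample points near an input and measure how much the explanation changes relative to the input change. The minimal Python sketch below assumes a generic explain_fn callable (a hypothetical placeholder for any attribution method) and is only an illustration of the metric, not the technique proposed in this project.

```python
import numpy as np

def local_lipschitz_estimate(explain_fn, x, radius=0.1, n_samples=100, seed=0):
    """Estimate the local Lipschitz constant of an explanation function
    around a point x: small input perturbations should not change the
    explanation much (the robustness notion discussed in [1]).

    explain_fn : callable mapping an input vector to an explanation vector
                 (hypothetical placeholder for any attribution method).
    """
    rng = np.random.default_rng(seed)
    base = explain_fn(x)
    worst = 0.0
    for _ in range(n_samples):
        # Sample a perturbed point inside an L2 ball of the given radius.
        delta = rng.normal(size=x.shape)
        delta *= radius * rng.uniform() / (np.linalg.norm(delta) + 1e-12)
        x_pert = x + delta
        # Ratio of explanation change to input change; large values
        # indicate that the explanation is unstable around x.
        ratio = np.linalg.norm(explain_fn(x_pert) - base) / (np.linalg.norm(delta) + 1e-12)
        worst = max(worst, ratio)
    return worst
```

A lower estimate indicates more stable explanations in the neighborhood of x; comparing this value across explanation methods gives a simple empirical robustness check.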
Most current interpretability frameworks produce unstable explanations, especially when the decision boundary of the model being explained is complex [1]. The main goal of this project is to design a new robust and stable model-agnostic interpretability technique.
[1] Alvarez-Melis, David, and Tommi S. Jaakkola. "On the Robustness of Interpretability Methods." arXiv preprint arXiv:1806.08049 (2018).