The problem we address is how to measure the quality of an explanation, from both a legal/ethical and technical perspective.

Towards explainability by design

Within the LIMPID project, we want to provide a general learning framework for explainability by design, applicable to different use cases. Relying on legal and ethical requirements stating that decisions affecting human rights be transparent and understandable, we devise novel learning approaches to build a classifier with an explanatory model by leveraging new losses and constraints. Our objectives are to:

  • define a set of desirable properties, given the regulatory requirements and inspired by the use cases;
  • design losses, constraints and evaluation metrics for an interpretable system;
  • elaborate a general framework for joint learning of a predictive model and an explanatory model;
  • instantiate this framework to deep convolutional neural networks and test the algorithms;
  • confront solutions with legal/ethical requirements…

Interpretability vs Explainability

Interpretability and explainability sometimes have different meanings depending on the research paper. Indeed, as stated in (Gilpin et al., 2018), explainable AI models “summarize the reasons […] for [their] behavior […] or produce insights about the causes of their decisions,” whereas “[T]he goal of interpretability is to describe the internals of a system in a way which is understandable to humans. […] Explainable models are interpretable by default, but the reverse is not always true.”

However, as far as our use cases are concerned, we consider the two words as synonyms, and focus instead on the difference between global and local explainability.

Global vs local explainability

  • Global explainability is about the explanation of the learning algorithm as a whole, including the training data used, appropriate uses of the algorithms, and warnings regarding weaknesses of the algorithm and inappropriate uses. In a paper written in the framework of Télécom Paris’ Operational AI Ethics program, we also refer to global interpretability as a “user’s manual” approach (Beaudouin et al., 2020).
  • Local explainability refers to the ability of the system to tell a user why a particular decision was made (why an algorithm gives a particular output following a particular input).

While global and local explainability are two important features of trustworthiness, we mainly focus on local explainability as the first indispensable ingredient of transparency.

Explainability scenarios

We develop explainability scenarios, addressing the needs of the relevant audience for explanation:

  • citizens affected by the algorithmic decision;
  • system operators, who assume the operational responsibility for acting on the decisions;
  • oversight bodies in charge of guaranteeing the fairness, reliability and robustness of the system as a whole and responsible for rendering accounts to citizens and courts.

Local explanations occur ex post, such as when a decision is challenged or when the system is audited. However, where decisions are validated by humans, the operator of the system may also need near-real-time local explanations to support the validation decision.

The explanations provided to oversight bodies need to demonstrate compliance with the original certification criteria of the system, as well as with legal and human rights principles, including the ability for individuals to challenge decisions.

Generalization to all predictive models

Even though the focus of this project is on deep convolutional neural networks, the principles that should guide local explainability are common to all predictive models.

In particular, an explanation depends strongly on the audience to whom it is provided: the citizen would like to understand and check why he/she is subject to a decision; the system operator wants to verify the merits of the decision; and the oversight bodies, with the help of technical staff, may want to dive into the system’s decision in order to ensure the fairness of the final decision if it is challenged. Beyond the basic atoms of explanations, typically some high-level features of the input image, we also need to specify the nature of the explanation, which may be symbolic and logical, causal and counterfactual, or geometrical and visual.

Our works

As predictive models are generally based on correlation extraction, the first two kinds of explanation (symbolic and logical, causal and counterfactual) are not trivial to obtain from a predictive tool, as shown by the current state of the literature. We consider here an explanation as a simple link between image features and the final decision (Escalante et al. 2018). In image recognition, saliency maps and perturbation-based methods (Selvaraju et al. 2017; Samek et al. 2019; Mundhenk et al. 2020) are amenable to informative visualizations.
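To make the idea of a perturbation-based saliency map concrete, the following is a minimal sketch of occlusion sensitivity, written in plain Python for readability. It is not the method of any of the cited papers: the function names, the patch size, the fill value and the toy scoring function are all illustrative assumptions. Each patch of the image is masked in turn, and the resulting drop in the classifier score is recorded as the importance of that patch.

```python
from typing import Callable, List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def occlusion_saliency(
    image: Image,
    score_fn: Callable[[Image], float],
    patch: int = 2,
    fill: float = 0.0,
) -> Image:
    """Return a map where each pixel holds the score drop caused by
    masking the patch that contains it (larger means more important)."""
    height, width = len(image), len(image[0])
    base_score = score_fn(image)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and overwrite one patch with the fill value.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base_score - score_fn(occluded)
            for r in rows:
                for c in cols:
                    saliency[r][c] = drop
    return saliency


def toy_score(image: Image) -> float:
    """Hypothetical classifier score: mean intensity of the top-left 2x2 block."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


toy_image = [[1.0] * 4 for _ in range(4)]
saliency_map = occlusion_saliency(toy_image, toy_score, patch=2)
```

With this toy score, only the top-left patch influences the output, so it is the only region that receives a non-zero saliency value. With a real convolutional network, `score_fn` would return the probability of the predicted class.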

Additionally, we work on a list of properties that a “good” explanation should satisfy (Lipton, 2018). While a consensus has not yet been reached in the literature (Lundberg and Lee, 2017; Alvarez-Melis et al. 2018), a good explanation should be faithful/correct, consistent, invariant under some predefined transformations, adversarially robust, decomposable, and complete. This set of properties will be converted into evaluation metrics to assess the quality of the explanations provided by our system and into losses used during the training phase.
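As an illustration of how such properties might be turned into numbers, the sketch below implements two simple proxies in plain Python: a deletion-based faithfulness score and a stability score. Both are common choices in the literature rather than the metrics defined in LIMPID, and the function names, the baseline value and the toy linear model are assumptions made for the example.

```python
from typing import Callable, List, Sequence


def deletion_faithfulness(
    features: Sequence[float],
    attributions: Sequence[float],
    score_fn: Callable[[Sequence[float]], float],
    baseline: float = 0.0,
) -> float:
    """Average score drop while deleting features from most to least important.
    A faithful attribution should give a larger value than a misleading one."""
    order = sorted(range(len(features)), key=lambda i: attributions[i], reverse=True)
    current: List[float] = list(features)
    base_score = score_fn(current)
    drops = []
    for index in order:
        current[index] = baseline  # "delete" the feature
        drops.append(base_score - score_fn(current))
    return sum(drops) / len(drops)


def explanation_stability(expl_a: Sequence[float], expl_b: Sequence[float]) -> float:
    """Largest absolute attribution change between the explanations of an
    input and of its transformed version (lower means more stable)."""
    return max(abs(a - b) for a, b in zip(expl_a, expl_b))


# Toy linear model (hypothetical): score = 3*x0 + 1*x1 + 0*x2.
weights = [3.0, 1.0, 0.0]


def linear_score(x: Sequence[float]) -> float:
    return sum(w * v for w, v in zip(weights, x))


inputs = [1.0, 1.0, 1.0]
faithful = deletion_faithfulness(inputs, [3.0, 1.0, 0.0], linear_score)
misleading = deletion_faithfulness(inputs, [0.0, 1.0, 3.0], linear_score)
stability = explanation_stability([3.0, 1.0, 0.0], [2.5, 1.0, 0.25])
```

In this toy case the attribution that ranks features by their true weight obtains the higher faithfulness score, which is the behaviour one would expect from such a metric.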

Design of a general framework for learning a classifier with its explanatory model

The approaches developed so far to explainability decompose into two main families:

  • methods that provide post-hoc explanations (Ribeiro et al. 2016) and,
  • methods that build explainable models by design (Selvaraju et al. 2017; Samek et al. 2019; Alvarez-Melis et al. 2018).

The so-called post-hoc interpretable methods have the drawback that they are not used as a means to improve the quality of a decision system, but solely as a window to approximate what the system is doing. So far, most machine learning tools are purely performance-driven: the learning algorithms do not encourage interpretability in any way.

Another line of research, attracting growing interest, explores how to redefine the objectives of learning algorithms to design decision systems that explain their decisions (Zhang et al. 2018). This approach opens the door to revisiting the validity domain of a decision system, by preventing use of the system if the explanation is not satisfactory or by warning the user of a potentially misleading decision. However, this explainability by design approach is not without its pitfalls. Early works in this direction seem to show that explainability often comes at the price of lower accuracy, a crucial drawback that has to be studied carefully.
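The idea of restricting the validity domain can be sketched as a simple guard around the decision. The code below is only an illustration: the quality score is assumed to come from some explanation metric, and the thresholds, names and status labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GuardedDecision:
    label: Optional[str]  # None when the system abstains
    status: str           # "accepted", "warning" or "rejected"


def guard_decision(
    label: str,
    explanation_quality: float,
    reject_below: float = 0.3,  # hypothetical threshold
    warn_below: float = 0.6,    # hypothetical threshold
) -> GuardedDecision:
    """Withhold or flag a decision when its explanation is not satisfactory."""
    if explanation_quality < reject_below:
        return GuardedDecision(None, "rejected")
    if explanation_quality < warn_below:
        return GuardedDecision(label, "warning")
    return GuardedDecision(label, "accepted")


rejected = guard_decision("match", 0.1)
flagged = guard_decision("match", 0.5)
accepted = guard_decision("match", 0.9)
```

A rejected decision would typically be handed over to a human operator, while a flagged one is returned together with a warning.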

Our works

We address the aforementioned issues by developing a novel explainability by design approach based on two pillars:

  • an axiomatic approach of explainability based on the properties of an explanation…
  • …combined with a view of an explanation as a synthetic and high-level representation of the decision function implemented by the system.

Our goal is to revisit the definition of supervised classification by considering an explainable classification system as a pair made of a predictive model and an explanatory model, the two being strongly linked. The predictive model, as usual, provides a decision, while the explanatory model outputs an explanation seen as a representation of the model’s decision process. However, the explanatory model also provides feedback to the predictive model, which requires modifying the loss function and the learning algorithm, and introducing dedicated constraints corresponding to the desirable properties of an explanation.
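One way to picture such a modified loss is as the usual classification loss plus weighted penalty terms that depend on the explanatory model. The sketch below, in plain Python, is a hedged illustration only: the two penalties (fidelity of the explanatory model to the predictive output, and sparsity of the explanation), their weights and all function names are assumptions chosen for the example, not the losses actually designed in LIMPID.

```python
import math
from typing import Sequence


def cross_entropy(probabilities: Sequence[float], label: int) -> float:
    """Standard classification loss for one example."""
    return -math.log(max(probabilities[label], 1e-12))


def fidelity_penalty(
    probabilities: Sequence[float], surrogate_probabilities: Sequence[float]
) -> float:
    """Squared gap between the predictive model's output and the output
    reconstructed by the explanatory model (assumed penalty)."""
    return sum((p - q) ** 2 for p, q in zip(probabilities, surrogate_probabilities))


def sparsity_penalty(explanation: Sequence[float]) -> float:
    """L1 norm of the explanation weights, favouring concise explanations."""
    return sum(abs(w) for w in explanation)


def joint_loss(
    probabilities: Sequence[float],
    surrogate_probabilities: Sequence[float],
    explanation: Sequence[float],
    label: int,
    fidelity_weight: float = 1.0,   # hypothetical trade-off weight
    sparsity_weight: float = 0.1,   # hypothetical trade-off weight
) -> float:
    """Prediction loss plus explanation-related penalties."""
    return (
        cross_entropy(probabilities, label)
        + fidelity_weight * fidelity_penalty(probabilities, surrogate_probabilities)
        + sparsity_weight * sparsity_penalty(explanation)
    )


loss_aligned = joint_loss([0.5, 0.5], [0.5, 0.5], [0.0, 0.0], label=0)
loss_misaligned = joint_loss([0.5, 0.5], [1.0, 0.0], [1.0, -1.0], label=0)
```

Because the penalties enter the same objective as the classification loss, minimizing it with a gradient-based learner would let the explanation constraints influence the predictive model, which is the kind of feedback described above. The weights make the possible trade-off between accuracy and explainability explicit.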

As for the whole LIMPID project, we put emphasis on the versatility of the developed approaches that should work for deep convolutional neural networks as well as other differentiable classifiers.

Expected benefits of the explainability by design approach

Assessment of explainability by design approaches with respect to legal/ethical needs

Given different scenarios and metrics, including legal requirements, we assess the quality of the explanations provided by our explainability by design approach and by relevant baselines, and iteratively improve the approach to seek a model that achieves this explainability by design.

The two generic use cases studied in this project give rise to strong legal and ethical requirements first because of potential consequences suffered by individuals, and second because both can be administered by the State, which has particularly high reliability and transparency requirements toward citizens.

The conditions under which the two systems are deployed must guarantee the quality of the system and the absence of discrimination, and must preserve citizens’ rights to challenge administrative decisions. In addition, these use cases interfere with individuals’ right to protection of personal data, and this interference needs to be analyzed under the proportionality test of the European Charter of Fundamental Rights and the European Convention on Human Rights.

Because of these requirements, the government decree authorizing use of these technologies will be closely scrutinized ex ante by higher legal authorities (for instance in France, the Conseil d’État), and will likely be challenged ex post by individuals or civil liberties groups. The lessons learned from LIMPID can be extended to many other critical applications involving image recognition and government decisions. Reliability, fairness and local interpretability by design also contribute an important brick toward the certification of machine learning systems for critical applications.

Certification challenges

Traditional approaches to software verification and validation (V&V) are poorly adapted to neural networks (Peterson 1993; Borg et al. 2019; U.S. Food and Drug Administration 2019). The challenges include the non-determinism of neural network decisions, which makes it hard to demonstrate the absence of unintended functionality, and the adaptive nature of machine-learning algorithms, which makes them a moving target (Borg et al. 2019; U.S. Food and Drug Administration 2019). Specifying a set of requirements that comprehensively describe the behavior of a neural network is considered the most difficult challenge with regard to traditional V&V and certification approaches (Borg et al. 2019; Bhattacharyya et al. 2015).

The absence of complete requirements poses a problem because one of the objectives of V&V is to compare the behavior of the software to a document that describes precisely and comprehensively the system’s intended behavior (Peterson 1993). For neural networks, there may remain a degree of uncertainty about the output for a given input. Other barriers include the absence of detailed design documentation and the lack of interpretability of machine learning models, both of which undermine the comprehensibility and trust generally required in certification processes (Borg et al. 2019).

Our works

While not overcoming all of these challenges, the solutions developed in LIMPID permit developers and operators to measure and demonstrate compliance with key quality, legal and ethical parameters. The ability to measure and demonstrate compliance is currently an important missing element in the approval of machine learning systems for critical applications. The measurement of key quality parameters serves not only during the approval process, but also throughout the system’s life cycle, as it is regularly reviewed and tested for errors, including bias.


Alvarez-Melis et al., 2018 D. Alvarez Melis and T. Jaakkola, “Towards Robust Interpretability with Self-Explaining Neural Networks,” in Advances in Neural Information Processing Systems, 2018, vol. 31.

Beaudouin et al., 2020 V. Beaudouin, I. Bloch, D. Bounie, S. Clémençon, F. d’Alché-Buc, J. Eagan, et al., “Flexible and Context-Specific AI Explainability: A Multidisciplinary Approach.” 2020.

Bhattacharyya et al. 2015 S. Bhattacharyya, D. Cofer, D. Musliner, J. Mueller, and E. Engstrom, “Certification considerations for adaptive systems,” in 2015 International Conference on Unmanned Aircraft Systems (ICUAS), 2015, pp. 270–279.

Borg et al. 2019 M. Borg, C. Englund, K. Wnuk, B. Duran, C. Levandowski, S. Gao, et al., “Safely Entering the Deep: A Review of Verification and Validation for Machine Learning and a Challenge Elicitation in the Automotive Industry,” Journal of Automotive Software Engineering, vol. 1, no. 1, pp. 1–19, 2019.

Escalante et al., 2018 H. J. Escalante, S. Escalera, I. Guyon, X. Baró, Y. Güçlütürk, et al., Explainable and Interpretable Models in Computer Vision and Machine Learning, Springer, 2018.

Gilpin et al., 2018 Gilpin, L. H., Bau, D., Yuan, B. Z., Bajwa, A., Specter, M., and Kagal, L. (2018). “Explaining Explanations: An Overview of Interpretability of Machine Learning.” In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pages 80–89. IEEE.

Lipton, 2018 Z. C. Lipton, “The Mythos of Model Interpretability: In Machine Learning, the Concept of Interpretability is Both Important and Slippery,” Queue, vol. 16, no. 3, pp. 31–57, Jun. 2018.

Lundberg and Lee, 2017 S. Lundberg and S.-I. Lee, “A Unified Approach to Interpreting Model Predictions,” in Advances in Neural Information Processing Systems, 2017, vol. 30.

Peterson 1993 G. E. Peterson, “Foundation for neural network verification and validation,” Proc. SPIE 1966, Science of Artificial Neural Networks II, 19 August 1993.

Ribeiro et al. 2016 M. T. Ribeiro, S. Singh, and C. Guestrin, “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 2016, pp. 1135–1144.

Samek et al., 2019 W. Samek, G. Montavon, A. Vedaldi, L. K. Hansen, and K.-R. Müller, Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, LNCS 11700, Springer, 2019.

Selvaraju et al., 2017 R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.

U.S. Food and Drug Administration 2019, Swanson E. Plastic Surgeons Defend Textured Breast Implants at 2019 U.S. Food and Drug Administration Hearing: Why It Is Time to Reconsider. Plast Reconstr Surg Glob Open. 2019 Aug 30;7(8):e2410. doi: 10.1097/GOX.0000000000002410. PMID: 31592028; PMCID: PMC6756678.

Zhang et al. 2018 Q. Zhang, Y. N. Wu, and S.-C. Zhu, “Interpretable Convolutional Neural Networks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 8827–8836.