Torfah, H., Shah, S., Chakraborty, S., Akshay, S., & Seshia, S. A. (2021). Synthesizing Pareto-Optimal Interpretations for Black-Box Models. In Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021 (pp. 153–162). TU Wien Academic Press. https://doi.org/10.34727/2021/isbn.978-3-85448-046-4_24
E192-04 - Research Unit of Formal Methods in Systems Engineering
Series:
Conference Series: Formal Methods in Computer-Aided Design
Published in:
Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021
Date (published):
Sep-2021
Number of Pages:
10
Publisher:
TU Wien Academic Press, Wien
Peer reviewed:
Yes
Keywords:
formal methods
Abstract:
We present a new multi-objective optimization approach
for synthesizing interpretations that “explain” the behavior
of black-box machine learning models. Constructing
human-understandable interpretations for black-box models often
requires balancing conflicting objectives. A simple interpretation
may be easier to understand for humans while being less precise
in its predictions vis-a-vis a complex interpretation. Existing
methods for synthesizing interpretations use a single objective
function and are often optimized for a single class of interpretations.
In contrast, we provide a more general and multi-objective
synthesis framework that allows users to choose (1) the class of
syntactic templates from which an interpretation should be synthesized,
and (2) quantitative measures on both the correctness
and explainability of an interpretation. For a given black-box model,
our approach yields a set of Pareto-optimal interpretations with
respect to the correctness and explainability measures. We show
that the underlying multi-objective optimization problem can be
solved via a reduction to quantitative constraint solving, such as
weighted maximum satisfiability. To demonstrate the benefits of
our approach, we have applied it to synthesize interpretations
for black-box neural-network classifiers. Our experiments show
that there often exists a rich and varied set of choices for
interpretations that are missed by existing approaches.
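
To make the notion of Pareto-optimality in the abstract concrete, the following is a minimal sketch in Python. It is not the synthesis procedure of the paper, which reduces the problem to quantitative constraint solving such as weighted maximum satisfiability; it only filters a small, hand-written list of candidates by brute force. The `Candidate` type, the candidate names, and all scores are hypothetical values made up for illustration, and both measures are assumed to be "higher is better".

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical interpretation with two illustrative scores."""
    name: str
    correctness: float     # assumed: higher is better
    explainability: float  # assumed: higher is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both measures and strictly better on one."""
    at_least_as_good = (a.correctness >= b.correctness
                        and a.explainability >= b.explainability)
    strictly_better = (a.correctness > b.correctness
                       or a.explainability > b.explainability)
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, not results from the paper.
candidates = [
    Candidate("depth-1 tree", 0.70, 0.90),
    Candidate("depth-2 tree", 0.85, 0.60),
    Candidate("depth-3 tree", 0.92, 0.30),
    Candidate("depth-2 tree, poor split", 0.65, 0.60),
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy data the last candidate is dominated by the depth-1 tree and is dropped, while the other three remain because each trades correctness against explainability differently. A single-objective method would return only one of them, which is the kind of missed choice the abstract refers to.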