Synthesizing Pareto-Optimal Interpretations for Black-Box Models

Torfah, Hazem; Shah, Shetal; Chakraborty, Supratik; Akshay, S.; Seshia, Sanjit A.

doi:10.34727/2021/isbn.978-3-85448-046-4_24

Record citation link:

http://hdl.handle.net/20.500.12708/18643
https://doi.org/10.34727/2021/isbn.978-3-85448-046-4_24

Title:

Synthesizing Pareto-Optimal Interpretations for Black-Box Models

Citation:

Torfah, H., Shah, S., Chakraborty, S., Akshay, S., & Seshia, S. A. (2021). Synthesizing Pareto-Optimal Interpretations for Black-Box Models. In Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021 (pp. 153–162). TU Wien Academic Press. https://doi.org/10.34727/2021/isbn.978-3-85448-046-4_24

reposiTUm-DOI:

10.34727/2021/isbn.978-3-85448-046-4_24

CatalogPlus:

AC17204489

Publication type:

Conference paper - Full-Paper Contribution

Language:

English

Authors:

Torfah, Hazem
Shah, Shetal
Chakraborty, Supratik
Akshay, S.
Seshia, Sanjit A.

Organizational unit:

E192-04 - Research Unit Formal Methods in Systems Engineering

Series:

Conference Series: Formal Methods in Computer-Aided Design

Published in:

Proceedings of the 21st Conference on Formal Methods in Computer-Aided Design – FMCAD 2021

Date (published):

Sep-2021

Extent:

Publisher:

TU Wien Academic Press, Vienna

Peer Reviewed:

Keywords:

formal methods

Abstract:

We present a new multi-objective optimization approach for synthesizing interpretations that “explain” the behavior of black-box machine learning models. Constructing human-understandable interpretations for black-box models often requires balancing conflicting objectives. A simple interpretation may be easier to understand for humans while being less precise in its predictions vis-a-vis a complex interpretation. Existing methods for synthesizing interpretations use a single objective function and are often optimized for a single class of interpretations. In contrast, we provide a more general and multi-objective synthesis framework that allows users to choose (1) the class of syntactic templates from which an interpretation should be synthesized, and (2) quantitative measures on both the correctness and explainability of an interpretation. For a given black-box, our approach yields a set of Pareto-optimal interpretations with respect to the correctness and explainability measures. We show that the underlying multi-objective optimization problem can be solved via a reduction to quantitative constraint solving, such as weighted maximum satisfiability. To demonstrate the benefits of our approach, we have applied it to synthesize interpretations for black-box neural-network classifiers. Our experiments show that there often exists a rich and varied set of choices for interpretations that are missed by existing approaches.
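As a rough illustration of what "Pareto-optimal with respect to the correctness and explainability measures" means, the sketch below filters a small set of candidate interpretations down to its non-dominated members. It is not the paper's method: the abstract describes a reduction to quantitative constraint solving such as weighted maximum satisfiability, whereas this is a brute-force comparison. The `Candidate` type, the candidate names, and all scores are hypothetical, and both measures are assumed to be "higher is better".

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical candidate interpretation with two scores (higher is better)."""
    name: str
    correctness: float      # assumed measure: agreement with the black-box model
    explainability: float   # assumed measure: e.g. inverse of interpretation size


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both measures and strictly better on one."""
    at_least_as_good = (a.correctness >= b.correctness
                        and a.explainability >= b.explainability)
    strictly_better = (a.correctness > b.correctness
                       or a.explainability > b.explainability)
    return at_least_as_good and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Made-up scores, for illustration only.
candidates = [
    Candidate("depth-1 tree", 0.70, 0.90),
    Candidate("depth-2 tree", 0.85, 0.60),
    Candidate("depth-3 tree", 0.92, 0.30),
    Candidate("noisy depth-2 tree", 0.80, 0.55),
]

front = pareto_front(candidates)
for c in front:
    print(c.name, c.correctness, c.explainability)
```

In this toy data the "noisy depth-2 tree" is dropped because the "depth-2 tree" is better on both measures; the remaining three candidates each represent a different trade-off, which is the kind of choice a single-objective method would collapse to one answer.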

Related publications and data in reposiTUm: includes:

10.34727/2021/isbn.978-3-85448-046-4

License:

CC BY 4.0