Faithfulness Tests for Natural Language Explanations

Atanasova, Pepa; Camburu, Oana-Maria; Lioma, Christina; Lukasiewicz, Thomas; Simonsen, Jakob Grue; Augenstein, Isabelle

doi:10.18653/v1/2023.acl-short.25

DC Field

Value

Language

dc.contributor.author

Atanasova, Pepa

dc.contributor.author

Camburu, Oana-Maria

dc.contributor.author

Lioma, Christina

dc.contributor.author

Lukasiewicz, Thomas

dc.contributor.author

Simonsen, Jakob Grue

dc.contributor.author

Augenstein, Isabelle

dc.contributor.editor

Association for Computational Linguistics

dc.date.accessioned

2024-01-18T11:20:50Z

dc.date.available

2024-01-18T11:20:50Z

dc.date.issued

2023

dc.identifier.citation

Atanasova, P., Camburu, O.-M., Lioma, C., Lukasiewicz, T., Simonsen, J. G., & Augenstein, I. (2023). Faithfulness Tests for Natural Language Explanations. In Association for Computational Linguistics (Ed.), Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 283–294). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.acl-short.25

dc.identifier.uri

http://hdl.handle.net/20.500.12708/192183

dc.description.abstract

Explanations of neural models aim to reveal a model’s decision-making process for its predictions. However, recent work shows that current methods giving explanations such as saliency maps or counterfactuals can be misleading, as they are prone to present reasons that are unfaithful to the model’s inner workings. This work explores the challenging question of evaluating the faithfulness of natural language explanations (NLEs). To this end, we present two tests. First, we propose a counterfactual input editor for inserting reasons that lead to counterfactual predictions but are not reflected by the NLEs. Second, we reconstruct inputs from the reasons stated in the generated NLEs and check how often they lead to the same predictions. Our tests can evaluate emerging NLE models, proving a fundamental tool in the development of faithful NLEs.

dc.language.iso

dc.subject

Natural language explanations

dc.subject

Faithfulness tests

dc.subject

Counterfactuals

dc.title

Faithfulness Tests for Natural Language Explanations

dc.type

Inproceedings

dc.type

Conference contribution

dc.contributor.affiliation

University of Copenhagen, Denmark

dc.contributor.affiliation

University of Copenhagen, Denmark

dc.contributor.affiliation

University of Copenhagen, Denmark

dc.contributor.affiliation

University of Copenhagen, Denmark

dc.relation.isbn

978-1-959429-71-5

dc.relation.doi

10.18653/v1/2023.acl-short.25

dc.description.startpage

283

dc.description.endpage

294

dc.type.category

Full-Paper Contribution

tuw.booktitle

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

tuw.peerreviewed

true

tuw.relation.publisher

Association for Computational Linguistics

tuw.researchTopic.id

tuw.researchTopic.name

Information Systems Engineering

tuw.researchTopic.value

100

tuw.publication.orgunit

E192-07 - Forschungsbereich Artificial Intelligence Techniques

tuw.publisher.doi

10.18653/v1/2023.acl-short.25

dc.description.numberOfPages

tuw.author.orcid

0000-0002-0023-2616

tuw.author.orcid

0000-0003-2600-2701

tuw.author.orcid

0000-0002-3488-9392

tuw.author.orcid

0000-0003-1562-7909

tuw.event.name

61st Annual Meeting of the Association for Computational Linguistics (ACL 2023)

tuw.event.startdate

09-07-2023

tuw.event.enddate

14-07-2023

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Toronto

tuw.event.country

tuw.event.presenter

Atanasova, Pepa

wb.sciencebranch

Computer Science

wb.sciencebranch

Mathematics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

1010

wb.sciencebranch.value

item.languageiso639-1

item.openairetype

conference paper

item.grantfulltext

none

item.fulltext

no Fulltext

item.cerifentitytype

Publications

item.openairecristype

http://purl.org/coar/resource_type/c_5794

crisitem.author.dept

University of Copenhagen

crisitem.author.dept

University College London

crisitem.author.dept

University of Copenhagen

crisitem.author.dept

E192-07 - Forschungsbereich Artificial Intelligence Techniques

crisitem.author.dept

University of Copenhagen

crisitem.author.dept

University of Copenhagen

crisitem.author.orcid

0000-0002-0023-2616

crisitem.author.orcid

0000-0003-2600-2701

crisitem.author.orcid

0000-0002-3488-9392

crisitem.author.orcid

0000-0003-1562-7909

crisitem.author.parentorg

E192 - Institut für Logic and Computation

Appears in Collections:

Conference Paper
