Graph working representations in hybrid models as explanations for common sense question answering

Breiner, Gabriel

doi:10.34726/hss.2023.102461

Record link:

https://doi.org/10.34726/hss.2023.102461
http://hdl.handle.net/20.500.12708/176566

Title:

Graph working representations in hybrid models as explanations for common sense question answering

Citation:

Breiner, G. (2023). Graph working representations in hybrid models as explanations for common sense question answering [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.102461

reposiTUm DOI:

10.34726/hss.2023.102461

CatalogPlus:

AC16823040

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Breiner, Gabriel

Advisor:

Hanbury, Allan

Co-advisor:

Recski, Gábor

Organisational Unit:

E194 - Institut für Information Systems Engineering

Date (published):

2023

Number of Pages:

Keywords:

Natural Language Processing; Machine Learning; Explainability; Common Sense Reasoning; Question Answering; Hybrid Model; Graph Attention Networks

Abstract:

Building systems capable of reasoning is one of the greatest hurdles in current Artificial Intelligence research and is the goal of a steadily growing number of Machine Learning benchmarks, specifically in NLP. Neural models that use pre-trained language models (mostly BERT) dominate the leaderboards of many of these benchmarks, many of which require the use of additional knowledge sources; these are often semantic graphs, above all ConceptNet. Combining these two kinds of representation by enriching the graph with the help of Graph Attention Networks not only offers better performance, but also allows the enriched graphs to be used as explanations of the model. In this thesis we evaluate QA-GNN (a hybrid model that uses BERT and ConceptNet) on the CommonSenseQA benchmark and additionally evaluate its explainability using quantitative and qualitative methods. We experiment with an alternative to ConceptNet, 4Lang, replace the ConceptNet graphs with it in the QA-GNN architecture, and compare the systems in terms of performance (accuracy) and explainability (comprehensiveness and sufficiency). We found that this substitution causes a large loss of accuracy and sufficiency but improves comprehensiveness. With a reduced variant of ConceptNet (using only relations of the kinds contained in 4Lang), the system achieves even worse accuracy than the one using 4Lang, but interestingly obtains almost perfect sufficiency. We concluded that ERASER is sub-optimal as an evaluation method for graph explanations such as those provided by QA-GNN, and we identify quantitative explainability evaluation for graph explanations as a promising field of research.

Enabling machines to use common sense is currently one of the major hurdles for Artificial Intelligence, and it motivates an increasingly large number of tasks and benchmarks in Machine Learning, and in NLP specifically. Neural models relying on pre-trained language models (mostly BERT) dominate the leaderboards of most of these benchmarks. However, the nature of these tasks necessitates the use of additional knowledge sources, often in the form of knowledge graphs or semantic graphs, most notably ConceptNet. Joining pre-trained language models with knowledge graphs by updating the graph with the aid of Graph Attention Networks not only offers increased performance, but also opens the possibility of using the updated graphs as explanations of the model. In this thesis we evaluated the explainability of QA-GNN (a hybrid model utilizing BERT and ConceptNet) on the CommonSenseQA benchmark, quantitatively using ERASER and qualitatively by inspecting graph visualizations. We experimented with alternative semantic graphs, replacing ConceptNet graphs with 4Lang graphs in the QA-GNN architecture, and compared these models in terms of performance (accuracy) and explainability (comprehensiveness and sufficiency). We found that this substitution causes substantial losses in accuracy and sufficiency, but gains in comprehensiveness. Using a reduced version of ConceptNet (with relation types akin to those of 4Lang) yields worse accuracy than the model using 4Lang, but interestingly an almost perfect sufficiency score. We conclude that ERASER is sub-optimal as an evaluation method for graph explanations such as the ones generated by QA-GNN, and we identify quantitative evaluation of graph explanations specifically as a promising area for future research.

License:

In Copyright

Appears in Collections:

Thesis