Bauer, J. J., Eiter, T., Higuera Ruiz, N. N., & Oetsch, J. (2023, November 21). Neuro-Symbolic Visual Graph Question Answering with LLMs for Language Parsing [Conference Presentation]. TAASP23: Workshop on Trends and Applications of Answer Set Programming, Potsdam, Germany. https://doi.org/10.34726/5462
E192-03 - Research Unit Knowledge Based Systems
TAASP23: Workshop on Trends and Applications of Answer Set Programming
20-Nov-2023 - 21-Nov-2023
Neurosymbolic; Visual Question Answering
Images containing graph-based structures are a ubiquitous and popular form of data representation that, to the best of our knowledge, has not yet been considered in the domain of Visual Question Answering (VQA). We provide a novel dataset for this task and present a modular neuro-symbolic approach as a first baseline. Our dataset extends CLEGR, an existing dataset for question answering on graphs inspired by metro networks. Notably, the graphs in CLEGR are given in symbolic form, while we consider the more challenging problem of taking images of graphs as input. Our solution combines optical graph recognition for graph parsing, a pre-trained optical character recognition neural network for parsing node labels, and answer-set programming for reasoning. The model achieves an overall average accuracy of 73% on the dataset. While regular expressions suffice to parse the natural language questions, we also study various large language models to obtain a more robust solution that generalises well to variants of questions that are not part of the dataset. Our evaluation provides further evidence of the potential of modular neuro-symbolic systems, in particular those with pre-trained components, to solve complex VQA tasks.