<div class="csl-bib-body">
<div class="csl-entry">Lavrinovics, E., Biswas, R., Hose, K., & Bjerva, J. (2025). <i>MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations</i>. arXiv. https://doi.org/10.48550/ARXIV.2505.14101</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/222486
-
dc.description
Large Language Models (LLMs) have inherent limitations of faithfulness and factuality, commonly referred to as hallucinations. Several benchmarks have been developed that provide a test bed for factuality evaluation within the context of English-centric datasets, relying on supplementary informative context such as web links or text passages while ignoring available structured factual resources. To this end, Knowledge Graphs (KGs) have been identified as a useful aid for hallucination mitigation, as they provide a structured way to represent facts about entities and their relations with minimal linguistic overhead. We address the lack of KG paths and multilinguality in existing hallucination evaluation benchmarks and propose a KG-based multilingual, multihop benchmark called MultiHal, framed for generative text evaluation. As part of our data collection pipeline, we mined 140k KG paths from open-domain KGs, from which we pruned noisy paths, curating a high-quality subset of 25.9k. Our baseline evaluation shows absolute improvements of approximately 0.12 to 0.36 points in semantic similarity score, 0.16 to 0.36 in NLI entailment, and 0.29 to 0.42 in hallucination detection for KG-RAG over vanilla QA across multiple languages and multiple models, demonstrating the potential of KG integration. We anticipate MultiHal will foster future research on graph-based hallucination mitigation and fact-checking tasks.
-
dc.language.iso
en
-
dc.subject
large language models
en
dc.subject
LLMs
en
dc.subject
hallucinations
en
dc.subject
benchmark
en
dc.title
MultiHal: Multilingual Dataset for Knowledge-Graph Grounded Evaluation of LLM Hallucinations
en
dc.type
Preprint
en
dc.type
Preprint
de
dc.identifier.arxiv
2505.14101
-
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.affiliation
Aalborg University, Denmark
-
dc.contributor.affiliation
Aalborg University (Aalborg, DK)
-
tuw.researchTopic.id
I1
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Logic and Computation
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
50
-
tuw.researchTopic.value
50
-
tuw.publication.orgunit
E192-02 - Forschungsbereich Databases and Artificial Intelligence
-
tuw.publication.orgunit
E056-23 - Fachbereich Innovative Combinations and Applications of AI and ML (iCAIML)
-
tuw.publisher.doi
10.48550/ARXIV.2505.14101
-
tuw.author.orcid
0009-0000-1071-8970
-
tuw.author.orcid
0000-0002-7421-3389
-
tuw.author.orcid
0000-0001-7025-8099
-
tuw.author.orcid
0000-0002-9512-0739
-
tuw.publisher.server
arXiv
-
wb.sciencebranch
Computer Science
-
wb.sciencebranch
Mathematics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
1010
-
wb.sciencebranch.value
80
-
wb.sciencebranch.value
20
-
item.openairecristype
http://purl.org/coar/resource_type/c_816b
-
item.fulltext
no Fulltext
-
item.cerifentitytype
Publications
-
item.grantfulltext
none
-
item.openairetype
preprint
-
item.languageiso639-1
en
-
crisitem.author.dept
Aalborg University, Denmark
-
crisitem.author.dept
Aalborg University, Denmark
-
crisitem.author.dept
E192-02 - Forschungsbereich Databases and Artificial Intelligence