Multilingual and crosslingual fact-checked claim retrieval

Pezo, Iva

doi:10.34726/hss.2025.126700

Record link:

https://doi.org/10.34726/hss.2025.126700
http://hdl.handle.net/20.500.12708/216220

Title:

Multilingual and crosslingual fact-checked claim retrieval

Citation:

Pezo, I. (2025). Multilingual and crosslingual fact-checked claim retrieval [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.126700

reposiTUm DOI:

10.34726/hss.2025.126700

CatalogPlus:

AC17562565

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Pezo, Iva

Advisor:

Hanbury, Allan

Co-advisor:

Staudinger, Moritz

Organisational Unit:

E194 - Institut für Information Systems Engineering

Date (published):

2025

Keywords:

Information Retrieval; Natural Language Processing; Fact Checking; Retrieval; Reranking; Large Language Models

Abstract:

With the growing influence of social media, ensuring the accuracy of online information has become increasingly important. Automated fact-checking involves multiple stages, including claim detection, prioritization, retrieval of evidence, veracity prediction, and explanation generation. A crucial yet often overlooked component is retrieving previously fact-checked claims, which helps combat misinformation by matching new claims with existing fact-checks. In this work, we develop a multilingual and crosslingual fact-checked claim retrieval system based on a hybrid retrieval pipeline that combines lexical and dense retrieval models. We systematically evaluate different retrieval and reranking strategies, demonstrating that hybrid ensembles effectively balance efficiency and effectiveness, outperforming individual retrievers. While reranking significantly enhances crosslingual retrieval, its impact in monolingual settings remains limited, highlighting the effectiveness of well-designed ensembling over increasingly complex ranking layers. Additionally, we analyze the impact of preprocessing steps, compare models in terms of retrieval performance, execution time, number of parameters, and memory usage, and conduct an error analysis to identify key limitations. Finally, we discuss potential improvements and future research directions to enhance multilingual fact-check retrieval. Our approach was applied to SemEval-2025 Task 7, where we present results and insights gained from our participation.
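
The abstract does not state how the lexical and dense rankings are combined. As an illustration only, the following minimal sketch fuses two ranked lists with reciprocal rank fusion (RRF), a common ensembling technique that is assumed here and may differ from the method used in the thesis. The function name, the constant k=60, and the fact-check IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    the contributions are summed. k=60 is a conventional default and an
    assumption here, not a value taken from the thesis.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical top-3 results for one input claim.
lexical_ranking = ["fc3", "fc1", "fc7"]  # e.g. from a lexical retriever
dense_ranking = ["fc1", "fc5", "fc3"]    # e.g. from a dense embedding model

fused = reciprocal_rank_fusion([lexical_ranking, dense_ranking])
print(fused)
```

Fact-checks that both retrievers rank highly rise to the top of the fused list, while those found by only one retriever are kept but ranked lower. A reranking model, if used, would then reorder only the first few fused candidates.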

License:

In Copyright

Appears in Collections:

Thesis