A Reproducibility and Generalizability Study of Large Language Models for Query Generation

Staudinger, Moritz; Kusa, Wojciech; Piroi, Florina; Lipani, Aldo; Hanbury, Allan

doi:10.1145/3673791.3698432

DC Field

Value

Language

dc.contributor.author

Staudinger, Moritz

dc.contributor.author

Kusa, Wojciech

dc.contributor.author

Piroi, Florina

dc.contributor.author

Lipani, Aldo

dc.contributor.author

Hanbury, Allan

dc.date.accessioned

2025-01-27T15:28:15Z

dc.date.available

2025-01-27T15:28:15Z

dc.date.issued

2024-12-08

dc.identifier.citation

Staudinger, M., Kusa, W., Piroi, F., Lipani, A., & Hanbury, A. (2024). A Reproducibility and Generalizability Study of Large Language Models for Query Generation. In SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (pp. 186–196). The Association for Computing Machinery. https://doi.org/10.1145/3673791.3698432

dc.identifier.uri

http://hdl.handle.net/20.500.12708/209780

dc.description.abstract

Systematic literature reviews (SLRs) are a cornerstone of academic research, yet they are often labour-intensive and time-consuming due to the detailed literature curation process. The advent of generative AI and large language models (LLMs) promises to revolutionize this process by assisting researchers in several tedious tasks, one of which is generating effective Boolean queries that select the publications to consider for inclusion in a review. This paper presents an extensive study of Boolean query generation using LLMs for systematic reviews, reproducing and extending the work of Wang et al. and Alaniz et al. Our study investigates the replicability and reliability of results achieved using ChatGPT and compares its performance with open-source alternatives such as Mistral and Zephyr to provide a more comprehensive analysis of LLMs for query generation. To this end, we implemented a pipeline that automatically creates a Boolean query for a given review topic using a previously defined LLM, retrieves all documents matching this query from the PubMed database, and then evaluates the results. With this pipeline, we first assess whether the results obtained using ChatGPT for query generation are reproducible and consistent. We then generalize our results by evaluating the efficacy of open-source models in generating Boolean queries. Finally, we conduct a failure analysis to identify and discuss the limitations and shortcomings of using LLMs for Boolean query generation. This examination helps to understand the gaps and potential areas for improvement in the application of LLMs to information retrieval tasks. Our findings highlight the strengths, limitations, and potential of LLMs in the domain of information retrieval and literature review automation. Our code is available online.

dc.language.iso

dc.subject

systematic reviews

dc.subject

Boolean query

dc.subject

LLMs

dc.subject

query generation

dc.title

A Reproducibility and Generalizability Study of Large Language Models for Query Generation

dc.type

Inproceedings

dc.type

Conference contribution

dc.relation.publication

SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

dc.contributor.affiliation

University College London, United Kingdom of Great Britain and Northern Ireland

dc.relation.isbn

979-8-4007-0724-7

dc.description.startpage

186

dc.description.endpage

196

dc.type.category

Full-Paper Contribution

tuw.booktitle

SIGIR-AP 2024: Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

tuw.peerreviewed

true

tuw.relation.publisher

The Association for Computing Machinery

tuw.relation.publisherplace

New York, NY, USA

tuw.researchTopic.id

tuw.researchTopic.name

Information Systems Engineering

tuw.researchTopic.value

100

tuw.publication.orgunit

E194-04 - Forschungsbereich Data Science

tuw.publication.orgunit

E058-06 - Fachbereich Zentrum für Forschungsdatenmanagement

tuw.publisher.doi

10.1145/3673791.3698432

dc.description.numberOfPages

tuw.author.orcid

0000-0002-5164-2690

tuw.author.orcid

0000-0003-4420-4147

tuw.author.orcid

0000-0001-7584-6439

tuw.author.orcid

0000-0002-3643-6493

tuw.author.orcid

0000-0002-7149-5843

tuw.event.name

SIGIR-AP 2024

tuw.event.startdate

2024-12-09

tuw.event.enddate

2024-12-12

tuw.event.online

On Site

tuw.event.type

Event for scientific audience

tuw.event.place

Tokyo

tuw.event.country

tuw.event.presenter

Staudinger, Moritz

wb.sciencebranch

Computer Science

wb.sciencebranch

Economics

wb.sciencebranch.oefos

1020

wb.sciencebranch.oefos

5020

wb.sciencebranch.value

item.openairetype

conference paper

item.cerifentitytype

Publications

item.grantfulltext

none

item.languageiso639-1

item.openairecristype

http://purl.org/coar/resource_type/c_5794

item.fulltext

no Fulltext

crisitem.author.dept

E194-04 - Forschungsbereich Data Science

crisitem.author.dept

E194-04 - Forschungsbereich Data Science

crisitem.author.dept

E058-06 - Fachbereich Zentrum für Forschungsdatenmanagement

crisitem.author.dept

E194-04 - Forschungsbereich Data Science

crisitem.author.dept

E194-04 - Forschungsbereich Data Science

crisitem.author.orcid

0000-0002-5164-2690

crisitem.author.orcid

0000-0003-4420-4147

crisitem.author.orcid

0000-0001-7584-6439

crisitem.author.orcid

0000-0002-7149-5843

crisitem.author.parentorg

E194 - Institut für Information Systems Engineering

crisitem.author.parentorg

E194 - Institut für Information Systems Engineering

crisitem.author.parentorg

E058 - Forschungs-, Technologie- und Innovationssupport

crisitem.author.parentorg

E194 - Institut für Information Systems Engineering

crisitem.author.parentorg

E194 - Institut für Information Systems Engineering

Appears in Collections:

Conference Paper
