<div class="csl-bib-body">
<div class="csl-entry">Althammer, S., Zuccon, G., Hofstätter, S., Verberne, S., & Hanbury, A. (2023). Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection. In Q. Ai, L. Liu, & A. Moffat (Eds.), <i>SIGIR-AP ’23: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region</i> (pp. 139–149). Association for Computing Machinery. https://doi.org/10.1145/3624918.3625333</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/192523
-
dc.description.abstract
Search methods based on Pretrained Language Models (PLM) have demonstrated great effectiveness gains compared to statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a large amount of annotated training data. Annotating data involves a large manual effort and thus is expensive, especially in domain-specific tasks. In this paper we investigate fine-tuning PLM-based rankers under limited training data and budget. We investigate two scenarios: fine-tuning a ranker from scratch, and domain adaptation starting with a ranker already fine-tuned on general data, and continuing fine-tuning on a target dataset. We observe a great variability in effectiveness when fine-tuning on different randomly selected subsets of training data. This suggests that it is possible to achieve effectiveness gains by actively selecting a subset of the training data that has the most positive effect on the rankers. This way, it would be possible to fine-tune effective PLM rankers at a reduced annotation budget. To investigate this, we adapt existing Active Learning (AL) strategies to the task of fine-tuning PLM rankers and investigate their effectiveness, also considering annotation and computational costs. Our extensive analysis shows that AL strategies do not significantly outperform random selection of training subsets in terms of effectiveness. We further find that gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs) and AL strategies underperform random selection when comparing effectiveness given a fixed annotation cost. Our results highlight that "optimal" subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.
en
dc.description.sponsorship
European Commission
-
dc.language.iso
en
-
dc.subject
active learning
en
dc.subject
domain adaptation
en
dc.subject
PLM-based rankers
en
dc.title
Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
The University of Queensland, Australia
-
dc.contributor.affiliation
Cohere, Austria
-
dc.contributor.affiliation
Leiden University, Netherlands
-
dc.relation.isbn
9798400704086
-
dc.description.startpage
139
-
dc.description.endpage
149
-
dc.relation.grantno
860721
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
SIGIR-AP '23: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Association for Computing Machinery
-
tuw.relation.publisherplace
New York
-
tuw.project.title
Domänen-spezifische Systeme für Informationsextraktion und -suche
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publisher.doi
10.1145/3624918.3625333
-
dc.description.numberOfPages
11
-
tuw.author.orcid
0000-0001-9134-3815
-
tuw.author.orcid
0000-0003-0271-5563
-
tuw.author.orcid
0009-0006-1229-2612
-
tuw.author.orcid
0000-0002-9609-9505
-
tuw.author.orcid
0000-0002-7149-5843
-
tuw.event.name
SIGIR-AP '23: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region
en
tuw.event.startdate
26-11-2023
-
tuw.event.enddate
28-11-2023
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Beijing
-
tuw.event.country
CN
-
tuw.event.presenter
Althammer, Sophia
-
wb.sciencebranch
Informatik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.languageiso639-1
en
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.openairetype
conference paper
-
item.cerifentitytype
Publications
-
item.fulltext
no Fulltext
-
item.grantfulltext
none
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.dept
The University of Queensland, Australia
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.dept
Leiden University, Netherlands
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.orcid
0000-0003-0271-5563
-
crisitem.author.orcid
0000-0002-9609-9505
-
crisitem.author.orcid
0000-0002-7149-5843
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering