<div class="csl-bib-body">
<div class="csl-entry">Althammer, S., Hofstätter, S., Sertkan, M., Verberne, S., & Hanbury, A. (2022). PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval. In <i>Advances in Information Retrieval</i> (pp. 19–34). Springer. https://doi.org/10.1007/978-3-030-99736-6_2</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/137002
-
dc.description.abstract
Dense passage retrieval (DPR) models show great effectiveness gains in first-stage retrieval for the web domain. However, the web domain offers large amounts of training data and involves a query-to-passage or query-to-document retrieval task. In this paper, we investigate dense document-to-document retrieval with limited labelled target data for training, in particular legal case retrieval. In order to use DPR models for document-to-document retrieval, we propose a Paragraph Aggregation Retrieval Model (PARM), which liberates DPR models from their limited input length. PARM retrieves documents at the paragraph level: for each query paragraph, relevant documents are retrieved based on their paragraphs. The relevant results per query paragraph are then aggregated into one ranked list for the whole query document. For the aggregation, we propose vector-based aggregation with reciprocal rank fusion (VRRF) weighting, which combines the advantages of rank-based aggregation and topical aggregation based on the dense embeddings. Experimental results show that VRRF outperforms rank-based aggregation strategies for dense document-to-document retrieval with PARM. We compare PARM to document-level retrieval and demonstrate higher retrieval effectiveness of PARM for lexical and dense first-stage retrieval on two different legal case retrieval collections. We investigate how to train the dense retrieval model for PARM on limited target data with labels at the paragraph or the document level. In addition, we analyze the differences between the results retrieved by lexical and dense retrieval with PARM.
en
dc.language.iso
en
-
dc.subject
Dense Retrieval
en
dc.title
PARM: A Paragraph Aggregation Retrieval Model for Dense Document-to-Document Retrieval
-
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.relation.isbn
978-3-030-99735-9
-
dc.description.startpage
19
-
dc.description.endpage
34
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Advances in Information Retrieval
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Springer
-
tuw.researchTopic.id
I4a
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publisher.doi
10.1007/978-3-030-99736-6_2
-
dc.description.numberOfPages
16
-
tuw.author.orcid
0000-0002-7149-5843
-
tuw.event.name
European Conference on Information Retrieval (ECIR 2022)
en
tuw.event.startdate
10-04-2022
-
tuw.event.enddate
14-04-2022
-
tuw.event.online
Hybrid
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Stavanger
-
tuw.event.country
NO
-
tuw.event.presenter
Althammer, Sophia
-
wb.sciencebranch
Informatik
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.value
100
-
item.openairetype
Inproceedings
-
item.openairetype
Konferenzbeitrag
-
item.grantfulltext
none
-
item.cerifentitytype
Publications
-
item.languageiso639-1
en
-
item.openairecristype
http://purl.org/coar/resource_type/c_18cf
-
item.fulltext
no Fulltext
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.dept
E194-04 - Forschungsbereich E-Commerce
-
crisitem.author.orcid
0000-0003-0984-5221
-
crisitem.author.orcid
0000-0002-7149-5843
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering