<div class="csl-bib-body">
<div class="csl-entry">Alexander, D., Kusa, W., & de Vries, A. (2022). ORCAS-I: Queries Annotated with Intent Using Weak Supervision. In <i>SIGIR ’22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</i> (pp. 3057–3066). Association for Computing Machinery. https://doi.org/10.1145/3477495.3531737</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/150350
-
dc.description.abstract
User intent classification is an important task in information retrieval. In this work, we introduce a revised taxonomy of user intent. We take the widely used differentiation between navigational, transactional and informational queries as a starting point, and identify three different sub-classes for the informational queries: instrumental, factual and abstain. The resulting classification of user queries is more fine-grained, reaches a high level of consistency between annotators, and can serve as the basis for an effective automatic classification process. The newly introduced categories help distinguish between types of queries that a retrieval system could act upon, for example by prioritizing different types of results in the ranking. We have used a weak supervision approach based on Snorkel to annotate the ORCAS dataset according to our new user intent taxonomy, utilising established heuristics and keywords to construct rules for the prediction of the intent category. We then present a series of experiments with a variety of machine learning models, using the labels from the weak supervision stage as training data, but find that the results produced by Snorkel are not outperformed by these competing approaches and can be considered state-of-the-art. The advantage of a rule-based approach like Snorkel's is its efficient deployment in an actual system, where intent classification would be executed for every query issued. The resource released with this paper is the ORCAS-I dataset: a labelled version of the ORCAS click-based dataset of Web queries, which provides 18 million connections to 10 million distinct queries. We anticipate the usage of this resource in a scenario where the retrieval system would change its internal workings and search user interface to match the type of information request. 
For example, a navigational query could trigger just a short result list, and for instrumental intent the system could rank tutorials and instructions higher than for other types of queries.
en
dc.description.sponsorship
European Commission
-
dc.language.iso
en
-
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
-
dc.subject
click data
en
dc.subject
weak supervision
en
dc.subject
snorkel
en
dc.subject
web search
en
dc.subject
intent labelling
en
dc.title
ORCAS-I: Queries Annotated with Intent Using Weak Supervision
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.rights.license
Creative Commons Namensnennung 4.0 International
de
dc.rights.license
Creative Commons Attribution 4.0 International
en
dc.contributor.affiliation
Radboud University Nijmegen, Netherlands
-
dc.contributor.affiliation
Radboud University Nijmegen, Netherlands
-
dc.relation.isbn
978-1-4503-8732-3
-
dc.description.startpage
3057
-
dc.description.endpage
3066
-
dc.relation.grantno
860721
-
dc.rights.holder
2022 Copyright held by the owner/author(s)
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Association for Computing Machinery
-
tuw.relation.publisherplace
New York
-
tuw.project.title
Domänen-spezifische Systeme für Informationsextraktion und -suche
-
tuw.researchTopic.id
I4a
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publisher.doi
10.1145/3477495.3531737
-
dc.identifier.libraryid
AC17202457
-
dc.description.numberOfPages
10
-
tuw.author.orcid
0000-0003-4420-4147
-
dc.rights.identifier
CC BY 4.0
de
dc.rights.identifier
CC BY 4.0
en
tuw.event.name
The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval
-
tuw.event.startdate
11-07-2022
-
tuw.event.enddate
15-07-2022
-
tuw.event.online
Hybrid
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Madrid
-
tuw.event.country
ES
-
tuw.event.presenter
Alexander, Daria
-
tuw.event.presenter
Kusa, Wojciech
-
tuw.event.track
Multi Track
-
wb.sciencebranch
Computer Sciences
-
wb.sciencebranch
Economics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
5020
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.openaccessfulltext
Open Access
-
item.openairetype
conference paper
-
item.fulltext
with Fulltext
-
item.mimetype
application/pdf
-
item.languageiso639-1
en
-
item.grantfulltext
open
-
item.cerifentitytype
Publications
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
Radboud University Nijmegen
-
crisitem.author.orcid
0000-0003-4420-4147
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering