<div class="csl-bib-body">
<div class="csl-entry">Mendoza, O., Kusa, W., El-Ebshihy, A. M., Wu, R., Pride, D., Knoth, P., Herrmannova, D., Piroi, F., Pasi, G., & Hanbury, A. (2022). Benchmark for Research Theme Classification of Scholarly Documents. In <i>Proceedings of the Workshop. Third Workshop on Scholarly Document Processing</i> (pp. 253–262). Association for Computational Linguistics. https://doi.org/10.34726/4521</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/187544
-
dc.identifier.uri
https://doi.org/10.34726/4521
-
dc.description.abstract
We present a new gold-standard dataset and a benchmark for the Research Theme Identification task, a sub-task of the Scholarly Knowledge Graph Generation shared task, at the 3rd Workshop on Scholarly Document Processing. The objective of the shared task was to label given research papers with research themes from a total of 36 themes. The benchmark was compiled using data drawn from the largest overall assessment of university research output ever undertaken globally (the Research Excellence Framework - 2014). We provide a performance comparison of a transformer-based ensemble, which obtains multiple predictions for a research paper, given its multiple textual fields (e.g. title, abstract, reference), with traditional machine learning models. The ensemble involves enriching the initial data with additional information from open-access digital libraries and Argumentative Zoning techniques (CITATION). It uses a weighted sum aggregation for the multiple predictions to obtain a final single prediction for the given research paper. Both data and the ensemble are publicly available on https://www.kaggle.com/competitions/sdp2022-scholarly-knowledge-graph-generation/data?select=task1_test_no_label.csv and https://github.com/ProjectDoSSIER/sdp2022, respectively.
en
dc.description.sponsorship
European Commission
-
dc.language.iso
en
-
dc.relation.ispartofseries
International conference on computational linguistics
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Scholarly Document Processing
en
dc.subject
Argumentative Zoning
en
dc.subject
Document Classification
en
dc.subject
Research Theme Identification
en
dc.title
Benchmark for Research Theme Classification of Scholarly Documents
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.rights.license
Urheberrechtsschutz
de
dc.rights.license
In Copyright
en
dc.identifier.doi
10.34726/4521
-
dc.contributor.affiliation
University of Milano-Bicocca, Italy
-
dc.contributor.affiliation
IRIS.ai, Stabekk, Norway
-
dc.contributor.affiliation
The Open University, United Kingdom of Great Britain and Northern Ireland
-
dc.contributor.affiliation
The Open University, United Kingdom of Great Britain and Northern Ireland
-
dc.contributor.affiliation
Elsevier Inc., US
-
dc.contributor.affiliation
University of Milano-Bicocca, Italy
-
dc.relation.issn
2951-2093
-
dc.description.startpage
253
-
dc.description.endpage
262
-
dc.relation.grantno
860721
-
dc.rights.holder
Authors
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of the Workshop. Third Workshop on Scholarly Document Processing
-
tuw.container.volume
29
-
tuw.peerreviewed
true
-
tuw.book.ispartofseries
International conference on computational linguistics
-
tuw.relation.publisher
Association for Computational Linguistics
-
tuw.relation.publisherplace
Gyeongju, Republic of Korea
-
tuw.project.title
Domänen-spezifische Systeme für Informationsextraktion und -suche
-
tuw.researchTopic.id
I4
-
tuw.researchTopic.name
Information Systems Engineering
-
tuw.researchTopic.value
100
-
tuw.linking
https://github.com/ProjectDoSSIER/sdp2022
-
tuw.linking
https://aclanthology.org/2022.sdp-1.31/
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering
-
dc.identifier.libraryid
AC17204477
-
dc.description.numberOfPages
10
-
tuw.author.orcid
0000-0003-4420-4147
-
tuw.author.orcid
0000-0002-7162-7252
-
tuw.author.orcid
0000-0003-1161-7359
-
tuw.author.orcid
0000-0002-2730-1546
-
tuw.author.orcid
0000-0001-7584-6439
-
tuw.author.orcid
0000-0002-6080-8170
-
tuw.author.orcid
0000-0002-7149-5843
-
dc.rights.identifier
Urheberrechtsschutz
de
dc.rights.identifier
In Copyright
en
tuw.event.name
International Conference On Computational Linguistics
en
tuw.event.startdate
12-10-2022
-
tuw.event.enddate
17-10-2022
-
tuw.event.online
Hybrid
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Gyeongju
-
tuw.event.country
KR
-
tuw.event.presenter
Mendoza, Oscar
-
tuw.event.presenter
Kusa, Wojciech
-
tuw.event.presenter
Wu, Ronin
-
tuw.event.presenter
Knoth, Petr
-
tuw.event.track
Multi Track
-
wb.sciencebranch
Informatik
-
wb.sciencebranch
Wirtschaftswissenschaften
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
5020
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
item.openaccessfulltext
Open Access
-
item.openairetype
conference paper
-
item.fulltext
with Fulltext
-
item.mimetype
application/pdf
-
item.languageiso639-1
en
-
item.grantfulltext
open
-
item.cerifentitytype
Publications
-
crisitem.author.dept
University of Milano-Bicocca
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
E194-01 - Forschungsbereich Software Engineering
-
crisitem.author.dept
IRIS.ai, Stabekk, Norway
-
crisitem.author.dept
The Open University
-
crisitem.author.dept
The Open University
-
crisitem.author.dept
Elsevier Inc., US
-
crisitem.author.dept
E058-06 - Fachbereich Zentrum für Forschungsdatenmanagement
-
crisitem.author.dept
University of Milano-Bicocca
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.orcid
0000-0003-4420-4147
-
crisitem.author.orcid
0000-0002-7162-7252
-
crisitem.author.orcid
0000-0003-1161-7359
-
crisitem.author.orcid
0000-0002-2730-1546
-
crisitem.author.orcid
0000-0001-7584-6439
-
crisitem.author.orcid
0000-0002-6080-8170
-
crisitem.author.orcid
0000-0002-7149-5843
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E058 - Forschungs-, Technologie- und Innovationssupport
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering