<div class="csl-bib-body">
<div class="csl-entry">Ningtyas, A. M., El-Ebshihy, A., Herwanto, G. B., Piroi, F., & Hanbury, A. (2022). Leveraging Wikipedia Knowledge for Distant Supervision in Medical Concept Normalization. In <i>Experimental IR Meets Multilinguality, Multimodality, and Interaction</i> (pp. 33–47). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-031-13643-6_3</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/142559
-
dc.description.abstract
Most recent research has approached the Medical Concept Normalization (MCN) task as supervised text classification. However, even when combined, the currently available training datasets for this task (CADEC, PsyTAR, COMETA) cover only a small fraction of the concepts in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT). In this work, we propose a distant supervision approach that broadens the training data coverage of SNOMED-CT concepts by tapping into Wikipedia as a source of informal medical phrases. Based on our observations, components of Wikipedia articles (article summaries, redirect pages, wikilinks data) contain informal medical terms that generalize to those used in social media posts. We extract these components from Wikipedia articles relating to medical information and pair them with the corresponding SNOMED-CT concepts. Our distant supervision approach doubled the concept coverage compared with the public MCN datasets. Our experiments show that the distantly supervised data improved model performance on the three publicly available MCN datasets.
en
dc.language.iso
en
-
dc.relation.ispartofseries
Lecture Notes in Computer Science
-
dc.subject
Wikipedia
en
dc.subject
Distant Supervision
en
dc.subject
Medical Concept Normalization
en
dc.title
Leveraging Wikipedia Knowledge for Distant Supervision in Medical Concept Normalization
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
Universitas Gadjah Mada, Indonesia
-
dc.relation.isbn
978-3-031-13643-6
-
dc.relation.doi
10.1007/978-3-031-13643-6
-
dc.description.startpage
33
-
dc.description.endpage
47
-
dc.type.category
Full-Paper Contribution
-
dc.relation.eissn
1611-3349
-
tuw.booktitle
Experimental IR Meets Multilinguality, Multimodality, and Interaction
-
tuw.container.volume
13390
-
tuw.peerreviewed
true
-
tuw.relation.publisher
Springer Nature Switzerland AG
-
tuw.relation.publisherplace
Cham, Switzerland
-
tuw.researchTopic.id
C5
-
tuw.researchTopic.name
Computer Science Foundations
-
tuw.researchTopic.value
100
-
tuw.publication.orgunit
E194-04 - Forschungsbereich Data Science
-
tuw.publisher.doi
10.1007/978-3-031-13643-6_3
-
dc.description.numberOfPages
15
-
tuw.author.orcid
0000-0001-7584-6439
-
tuw.author.orcid
0000-0002-7149-5843
-
tuw.event.name
13th International Conference of the CLEF Association, CLEF 2022
en
dc.description.sponsorshipexternal
Austrian Agency for International Cooperation in Education and Research (OeAD-GmbH)
-
dc.description.sponsorshipexternal
Indonesian Ministry of Education and Culture (KEMDIKBUD)
-
dc.relation.grantnoexternal
ICM-2019-13880
-
tuw.event.startdate
05-09-2022
-
tuw.event.enddate
08-09-2022
-
tuw.event.online
On Site
-
tuw.event.type
Event for scientific audience
-
tuw.event.place
Bologna
-
tuw.event.country
IT
-
tuw.event.presenter
Ningtyas, Annisa Maulida
-
wb.sciencebranch
Computer Sciences
-
wb.sciencebranch
Economics
-
wb.sciencebranch.oefos
1020
-
wb.sciencebranch.oefos
5020
-
wb.sciencebranch.value
90
-
wb.sciencebranch.value
10
-
item.languageiso639-1
en
-
item.openairetype
conference paper
-
item.grantfulltext
restricted
-
item.fulltext
no Fulltext
-
item.cerifentitytype
Publications
-
item.openairecristype
http://purl.org/coar/resource_type/c_5794
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
E194-01 - Forschungsbereich Software Engineering
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.dept
E058-06 - Fachbereich Zentrum für Forschungsdatenmanagement
-
crisitem.author.dept
E194-04 - Forschungsbereich Data Science
-
crisitem.author.orcid
0000-0001-7584-6439
-
crisitem.author.orcid
0000-0002-7149-5843
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering
-
crisitem.author.parentorg
E058 - Forschungs-, Technologie- und Innovationssupport
-
crisitem.author.parentorg
E194 - Institut für Information Systems Engineering