Ningtyas, A. M., El-Ebshihy, A., Herwanto, G. B., Piroi, F., & Hanbury, A. (2022). Leveraging Wikipedia Knowledge for Distant Supervision in Medical Concept Normalization. In Experimental IR Meets Multilinguality, Multimodality, and Interaction (pp. 33–47). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-031-13643-6_3
13th International Conference of the CLEF Association, CLEF 2022
Event date:
5-Sep-2022 - 8-Sep-2022
Event place:
Bologna, Italy
Number of Pages:
15
Publisher:
Springer Nature Switzerland AG, Cham, Switzerland
Peer reviewed:
Yes
Keywords:
Wikipedia; Distant Supervision; Medical Concept Normalization
Abstract:
The majority of recent research has approached the Medical Concept Normalization (MCN) task as supervised text classification. However, even when combined, the currently available training datasets for this task (CADEC, PsyTAR, COMETA) cover only a small fraction of the concepts contained in the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT). In this work, we propose a distant supervision approach to broaden the training data coverage of the SNOMED-CT concepts by tapping into Wikipedia as a source of informal medical phrases. Based on our observations, components of Wikipedia articles (article summaries, Wikipedia’s redirect pages, wikilinks data) contain informal medical terms that can be generalized to those used in social media posts. We extract the article summaries, Wikipedia’s redirect pages, and wikilinks data from the Wikipedia articles relating to medical information, and pair this data with the corresponding SNOMED-CT concepts. Our distant supervision approach was able to double the concept coverage of the public MCN datasets. Our experiments show that the proposed distant supervision approach improved model performance on the three publicly available MCN datasets.
Project (external):
Austrian Agency for International Cooperation in Education and Research (OeAD-GmbH); Indonesian Ministry of Education and Culture (KEMDIKBUD)