DC Field Value Language
dc.contributor.advisorHanbury, Allan-
dc.contributor.authorSchiegl, Adrian-
dc.date.accessioned2021-06-24T05:44:26Z-
dc.date.issued2021-
dc.date.submitted2021-06-
dc.identifier.citationSchiegl, A. (2021). Disease-Symptom relation extraction from medical text corpora with BERT [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2021.77705-
dc.identifier.urihttps://doi.org/10.34726/hss.2021.77705-
dc.identifier.urihttp://hdl.handle.net/20.500.12708/17874-
dc.descriptionThesis not yet received by the library - data not verified-
dc.descriptionTitle differs from the original; translated by the author-
dc.description.abstractTo this day, vast amounts of medical knowledge are still published in unstructured form, e.g., case reports and clinical notes. The automated extraction of relations between symptoms, diseases and other patient-related information from unstructured sources plays an important role in areas such as Evidence-Based Medicine. For example, effective disease-symptom relation extraction accelerates tasks such as reviewing large amounts of medical literature to learn new disease characteristics. In this work we present a relation extraction model based on BERT and MetaMap that extracts disease-symptom relations from over 20,000 BMJ Case Reports. Case reports are medical publications that contain clinically important information about the course of patients with specific medical conditions. Our model exploits the fact that a case report focuses on a single disease, which is mentioned in the case report title. This lets us represent the problem of relation extraction as a named entity recognition problem, which simplifies the model and the annotation of the training dataset. We evaluate our model using the Disease Symptom Relation Collection (DSR). DSR is a set of graded disease-symptom relations for 20 diseases that was curated by medical doctors. We evaluate our model by measuring the relevance of the disease-symptom relations it extracted from BMJ Case Reports. We measure relevance by calculating the agreement with the ground truth provided by the medical doctors, using the metrics nDCG@k, precision@k and recall@k. Furthermore, we compare the relevance our model achieved with the relevance of two baseline models: a word2vec model and a co-occurrence model trained on 1.5 million PubMed Central articles. Our results show that our approach outperforms the baselines by up to 25% nDCG, 27% precision and 10% recall. The agreement between our model and the ground truth is up to 64% nDCG@5 and 66% precision@5.
Furthermore, our results also show that case reports are a high-quality source of disease-symptom relations. Despite that, we find that they are of limited use due to the small number of openly accessible case reports.en
dc.format68 pages-
dc.languageEnglish-
dc.language.isoen-
dc.subjectNLPen
dc.subjectBiomedical Relation Extractionen
dc.subjectInformation Retrievalen
dc.subjectText Miningen
dc.titleDisease-Symptom relation extraction from medical text corpora with BERTen
dc.title.alternativeKrankheit-Symptom Relationsextraktion aus medizinischen Texten mit BERTde
dc.typeThesisen
dc.typeHochschulschriftde
dc.identifier.doi10.34726/hss.2021.77705-
dc.publisher.placeWien-
tuw.thesisinformationTechnische Universität Wien-
dc.contributor.assistantZlabinger, Markus-
tuw.publication.orgunitE194 - Institut für Information Systems Engineering-
dc.type.qualificationlevelDiploma-
dc.identifier.libraryidAC16235613-
dc.description.numberOfPages68-
dc.thesistypeDiplomarbeitde
dc.thesistypeDiploma Thesisen
item.openairecristypehttp://purl.org/coar/resource_type/c_18cf-
item.openaccessfulltextOpen Access-
item.openairetypeThesis-
item.openairetypeHochschulschrift-
item.fulltextwith Fulltext-
item.languageiso639-1en-
item.grantfulltextopen-
item.cerifentitytypePublications-
Appears in Collections: Thesis

Items in reposiTUm are protected by copyright, with all rights reserved, unless otherwise indicated.