UMLS for information extraction

Kohler, Michael

DC Field

Value

Language

dc.contributor.advisor

Miksch, Silvia

dc.contributor.author

Kohler, Michael

dc.date.accessioned

2020-06-30T22:34:11Z

dc.date.issued

2007

dc.date.submitted

2007-05

dc.identifier.citation

Kohler, M. (2007). UMLS for information extraction [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-18958

dc.identifier.uri

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-18958

dc.identifier.uri

http://hdl.handle.net/20.500.12708/14604

dc.description

Abstract in German

dc.description.abstract

The worldwide flood of information calls for the development of tools that filter and condense information. Information Extraction (IE) is a subfield of Natural Language Processing (NLP) and is used to extract information from texts and populate a database. However, IE systems must be specialised for a particular domain, and only then are they able to process texts from that domain. IE systems are composed of, among other things, terminologies, ontologies, and vocabularies.
The UMLS consists of many dictionaries, thesauri, terminologies, and ontologies, which are represented by the SPECIALIST Lexicon, the Metathesaurus, and the Semantic Network. The UMLS is a gigantic knowledge base that covers a wide range of medical topics. Because of the size of the UMLS, it is difficult to extract information, and correctly assigning phrases to concepts is not easy either.
With the help of MetaMap Transfer (MMTx), this assignment problem can be outsourced. The UMLSint package was developed to simplify access to the UMLS data, to extract the attributes of interest, and to analyse the input data in order to find the relevant concepts in the knowledge base. The input consists of a sentence of a medical text, and UMLSint returns selected data by means of UMLS and MMTx. The MMTx tool is used to create logical units and to generate information about the lexical and morphological structure.
For each logical unit, various pieces of information are returned, such as semantic type, term type, part of speech, Metathesaurus concept ID, and many more.
The subject of this thesis is to give IE systems that process medical texts easier access to the knowledge base, which here is the UMLS.

dc.description.abstract

The enormous growth of the worldwide flood of information makes it more and more important to use effective tools to extract and condense key information. Research in the field of Natural Language Processing (NLP) is ongoing. Information Extraction (IE) is a subfield of NLP and is used to extract information from text to fill a database. However, there are limitations in the use of IE. IE systems need to be specialised for a specific domain and are therefore only able to handle text from that domain. IE systems consist of several components, and one of the important components may be composed of terminologies, ontologies, and vocabularies.
The UMLS combines a huge variety of source vocabularies, terminologies, and ontologies into the SPECIALIST Lexicon, the Metathesaurus, and the Semantic Network. The UMLS is a gigantic knowledge base that covers numerous topics in medicine.
Due to the large size of the UMLS, it is difficult to extract information. Matching concepts to phrases is also not an easy task. With the help of MMTx, the matching problem can be outsourced.
To break down the complex data structure of UMLS and MMTx, a simpler and more easily accessible data structure was introduced, which is part of the UMLSint package. The UMLSint package was developed to simplify access to the UMLS data, to extract the attributes of interest, and to analyse the input data to find the corresponding concepts in the knowledge base. The UMLSint package takes a sentence of medical text as input and returns attributes of interest from the UMLS for the queried phrase. The information consists of factual knowledge from the Metathesaurus and information generated by the MetaMap Transfer (MMTx) tool. The MMTx tool is used to create logical elements and gather information about the lexical and morphological structure.
For each logical element, various information is now accessible, such as semantic type, term type, part-of-speech tag, Metathesaurus concept ID, and many more. This information can be used by both NLP and IE systems for further analysis of the text.
The subject of this thesis is to give IE systems that process medical text easier access to the knowledge base named Unified Medical Language System (UMLS).

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Information Extraktion

dc.subject

Interface

dc.subject

Unified Medical Language System

dc.subject

UMLS

dc.subject

Natural Language Processing

dc.subject

NLP

dc.subject

UMLSint

dc.subject

UMLS interface

dc.subject

Information Extraction

dc.title

UMLS for information extraction

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Michael Kohler

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Kaiser, Katharina

tuw.publication.orgunit

E188 - Institut für Softwaretechnik und Interaktive Systeme

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC05034616

dc.description.numberOfPages

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-18958

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

tuw.assistant.staffStatus

staff

tuw.advisor.orcid

0000-0003-4427-5703

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(1.92 MB)

In Copyright
