Automatische Identifizierung und Annotierung von Negation in Texten

Klein, Alexandra

DC Field

Value

Language

dc.contributor.advisor

Trost, Harald

dc.contributor.author

Klein, Alexandra

dc.date.accessioned

2020-06-30T02:45:01Z

dc.date.issued

2013

dc.date.submitted

2013-05

dc.identifier.citation

Klein, A. (2013). Automatische Identifizierung und Annotierung von Negation in Texten [Dissertation, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-59771

dc.identifier.uri

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-59771

dc.identifier.uri

http://hdl.handle.net/20.500.12708/9979

dc.description

Alternative title as translated by the author

dc.description

Abstract in English

dc.description.abstract

In the development of methods for information extraction, information retrieval, and text mining, negation, i.e. the possibility that terms or relations occurring in a text are negated, has so far received little attention. Objects and relations between objects can in many cases be identified and categorized correctly in a domain-dependent way and on the basis of background knowledge. In information extraction, which mostly involves filling templates, larger contexts in texts are also found, provided they correspond to the given templates. Almost all of these approaches, however, rest on a simplified view of information: terms, concepts, relations, and contexts are regarded as either present or absent in texts. If they are present, they are processed further; if they are absent, the text or text passage is ignored. That an important fact can be expressed through negation is usually left out of consideration. As a consequence, all occurring objects and relations are processed in the same way, even if some of them appear negated in the texts. As a result, information is presented or processed further that is in fact not correct.
The few approaches that take negation into account, largely in the medical domain and in sentiment analysis, have so far mostly concentrated, because of their specific application, on an extremely restricted view of the meaning and hence the occurrence of negation. Sentences or clauses containing negated constituents are filtered out during processing. As a rule, no attempt is made to determine the type and scope of the negation more precisely and to annotate them in the text. Precisely these two aspects, however, have traditionally occupied linguistics and the philosophy of language, so that a wide range of theoretical and empirical findings on the meaning and realization of negation is available; these have not yet been used in machine processing but can provide cues for the automatic identification of negation. This thesis describes a module for the automatic identification and annotation of negation and its contexts and functions in texts.
Including negated information is intended to contribute to a more differentiated view of text content, which can be expected to yield better results in applications such as information retrieval, information extraction, and text mining. German-language newspaper texts are chosen as the sample domain. Starting from an application in which freely formulated natural-language utterances can be used to search a corpus for matching newspaper texts, an empirical study was first carried out in which negation, its contexts, and its functions were annotated manually.
Based on this analysis, a classification of negation and its functions was developed. This classification in turn serves as the basis for a module for identifying and classifying negation.
The results of the automatic analysis were evaluated on two corpora. A high proportion of the negations that occur are correctly recognized and classified, even in texts that were only preprocessed automatically beforehand and not edited manually.

dc.description.abstract

Negation, i.e. the possibility that terms or relations occurring in text may be negated, is usually not considered when developing methods and techniques for Information Extraction, Information Retrieval and Text Mining. In many cases, objects and relations between objects can be identified and classified correctly, given a specific domain and background knowledge. In Information Extraction, which is often about filling templates, larger contexts may also be found, as long as they correspond to the specified templates. However, most of these approaches rely on a simplified view of information: terms, concepts, relations and templates are treated as either present or not present in texts. If they are present, they are used for further processing; if they are not present, the text or part of the text is ignored. Developers usually overlook the fact that important facts may be expressed as negated statements. Thus, all objects and relations are treated in the same way, even if some of them occur in the context of a negation. As a result, incorrect information may be extracted and passed on to further processing. A few approaches have started to consider negation, mostly in the medical domain and in the field of sentiment analysis. They tend to concentrate on a specific application, with a limited view of negation and its functions. Sentences or parts of sentences which contain negation are excluded from analysis. Usually, no attempt is made to classify and annotate the type and scope of negation. These two aspects have traditionally received much attention in linguistics and the philosophy of language. The automatic analysis of negation might benefit from theoretical research and empirical evidence concerning the meaning and realization of negation, but so far these findings have not been incorporated into systems.
This dissertation describes a module for identifying and annotating negation and its textual contexts and functions. The aim is to examine textual content more closely by including an analysis of negation, which should lead to better results in applications such as Information Retrieval, Information Extraction and Text Mining. German newspaper texts serve as a sample domain. Starting from an application where newspaper texts are retrieved from a corpus based on natural-language queries, an empirical corpus study was carried out in which negation, its contexts and its functions were manually annotated. Based on this analysis, a classification of negation and its functions was derived. This classification serves as a foundation for a module which automatically identifies and classifies negation. The results of the automatic analysis were evaluated on two corpora. It turns out that a high proportion of negations can be correctly identified and classified, even in texts which have only been preprocessed automatically, without any manual annotation.

dc.language

German

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Negation

dc.subject

Computerlinguistik

dc.subject

Informationsextraktion

dc.subject

Text Mining

dc.subject

Negation

dc.subject

Computational Linguistics

dc.subject

Information Extraction

dc.subject

Text Mining

dc.title

Automatische Identifizierung und Annotierung von Negation in Texten

dc.title.alternative

Automatic identification and annotation of negation in texts

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Alexandra Klein

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

dc.contributor.assistant

Rauber, Andreas

tuw.publication.orgunit

E188 - Institute of Software Technology and Interactive Systems, Information and Software Engineering Group ; Öst. Forschungsinstitut für Artificial Intelligence ; Zentrum für med. Statistik, Informatik und Intelligente Systeme, Medizinische Universität Wien

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC10774684

dc.description.numberOfPages

126

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-59771

dc.thesistype

Dissertation

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.assistant.orcid

0000-0002-9272-6225

item.languageiso639-1

item.openairetype

doctoral thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(497.79 kB)

In Copyright
