Patent claim decomposition for improved information extraction

Parapatics, Peter

DC Field

Value

Language

dc.contributor.advisor

Rauber, Andreas

dc.contributor.author

Parapatics, Peter

dc.date.accessioned

2020-06-30T09:39:14Z

dc.date.issued

2009

dc.date.submitted

2009-12

dc.identifier.citation

Parapatics, P. (2009). Patent claim decomposition for improved information extraction [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-33260

dc.identifier.uri

https://resolver.obvsg.at/urn:nbn:at:at-ubtuw:1-33260

dc.identifier.uri

http://hdl.handle.net/20.500.12708/11776

dc.description

Abstract in German

dc.description.abstract

This thesis describes a rule-based method for decomposing English-language patent claims into smaller parts. The aim is to provide a basis for further text analysis steps and to simplify the application of existing information extraction algorithms, which are only of limited suitability for patent claims because of the claims' complicated linguistic structure. Because patent claims must be drafted according to very precise syntactic and semantic requirements, they contain a number of recurring grammatical patterns that can be found and extracted by means of linguistic analysis. The extracted parts are arranged in a tree structure, and an algorithm is presented that reorganizes these parts and displays them graphically in order to improve the readability of the patent claims. The evaluation of the method shows that the length and complexity of patent claims can be greatly reduced by applying the developed rules, and that this eases the application of existing information extraction tools.

dc.description.abstract

Natural language processing algorithms and information extraction methods have proven to be valuable tools that support humans in structuring, aggregating and managing large amounts of textual information in several domains. Patent claims are subject to a number of rigid constraints and are therefore forced into predictable structures, yet they are written in a highly domain-specific, almost artificial language on which common information extraction and retrieval methods tend to perform poorly. This work presents a rule-based approach for decomposing patent claims into smaller parts to provide a basis for further analysis. As claims are drafted according to very precise syntactic and semantic rules, they contain a high number of recurring grammatical patterns. A set of rules based on linguistic analysis is used to identify and extract these patterns. The extracted claim parts are organized in a tree structure in order to retain the information on how they are related to each other. An algorithm is proposed for automatically reorganizing and then visualizing this tree structure to improve the readability of claims. The evaluation of the method shows that rule-based patent claim decomposition is feasible and provides promising results in terms of reducing the length and complexity of patent claims. It also shows that the decomposition method can be used to ease the application and raise the performance of existing information extraction tools.

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

Patentansprüche

dc.subject

Information Extraction

dc.subject

Natural Language Processing

dc.subject

Patente

dc.subject

regelbasierter Ansatz

dc.subject

information extraction

dc.subject

patent claims

dc.subject

natural language processing

dc.subject

patents

dc.subject

claim decomposition

dc.title

Patent claim decomposition for improved information extraction

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Peter Parapatics

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E188 - Institut für Softwaretechnik und Interaktive Systeme

dc.type.qualificationlevel

Diploma

dc.identifier.libraryid

AC07806370

dc.description.numberOfPages

120

dc.identifier.urn

urn:nbn:at:at-ubtuw:1-33260

dc.thesistype

Diplomarbeit

dc.thesistype

Diploma Thesis

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.orcid

0000-0002-9272-6225

item.languageiso639-1

item.openairetype

master thesis

item.grantfulltext

open

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.mimetype

application/pdf

item.openairecristype

http://purl.org/coar/resource_type/c_bdcc

item.openaccessfulltext

Open Access

crisitem.author.dept

TU Wien

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(3.52 MB)

In Copyright
