<div class="csl-bib-body">
<div class="csl-entry">Iszak, Z. (2025). <i>Open Information Extraction from German legal text</i> [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.126169</div>
</div>
-
dc.identifier.uri
https://doi.org/10.34726/hss.2025.126169
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/213079
-
dc.description
Thesis not yet received by the library - data not verified
-
dc.description
Title differs, following the author's own translation
-
dc.description.abstract
Historically, information has been stored in written form as text. Text is a largely unstructured form of information: it suits human readers, but its lack of structure makes it inefficient for machines to extract information from. This thesis is based on a business case that aims to evaluate the solutions of law students and provide feedback on errors. Evaluation guidelines are provided by domain experts who know how to assess the attempts and which information is relevant. A rule-based Open Information Extraction (OIE) system is designed to extract information segments from the student attempts. OIE derives structured information from unstructured text without being restricted to predefined relation types. Rule-based OIE combines a set of rules with a matching algorithm and provides explainability, enabling the transparent decision-making that is critical to both the legal domain and the business case at hand. A strictly defined set of rules yields explainable information extraction on the target domain but limits the model's ability to generalize when the vocabulary or phrasing changes. The scientific aim is to investigate the generalization capability of Universal Dependencies (UD) graph-based rule systems over legal texts belonging to diverse legal cases.
In this study, a set of rules is created together with a matching algorithm; this combination works perfectly on the target legal case, achieving both a recall and a precision of 1. Starting from this highly case-specific combination, generalization steps are taken to find the optimal combination of rules and matching algorithm, one that performs well not only on legal texts belonging to the target case but also on those belonging to different legal cases. As part of the generalization steps, both the matching algorithm and the initial set of rules are adjusted. The generalization ability is evaluated both qualitatively and quantitatively.
The specified domain poses challenges unique to the legal field. The implications of these challenges are discussed, and potential solutions are proposed as directions for future work.
en
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Information Extraction
en
dc.subject
Rule-based
en
dc.subject
Open Information Extraction
en
dc.subject
Legal domain
en
dc.subject
German text
en
dc.subject
Generalization
en
dc.subject
Ruleset
en
dc.title
Open Information Extraction from German legal text
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2025.126169
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Zsombor Iszak
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering