Iszak, Z. (2025). Open Information Extraction from German legal text [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.126169
E194 - Institut für Information Systems Engineering
-
Date (published):
2025
-
Number of Pages:
66
-
Keywords:
Information Extraction; Rule-based; Open Information Extraction; Legal domain; German text; Generalization; Ruleset
Abstract:
Historically, information has been stored in written form as text. Text is an unstructured form of information: well suited to human readers, but inefficient for machines to extract information from. The thesis is based on a business case that aims to evaluate the solutions written by law students and to provide feedback on their errors. Evaluation guidelines are provided by domain experts, who know how to assess the attempts and which information is relevant. A rule-based Open Information Extraction (OIE) system is designed to extract information segments from the student attempts. OIE derives structured information from unstructured text without being restricted to predefined relation types. Rule-based OIE combines a set of rules with a matching algorithm and provides explainability, enabling the transparent decision making that is critical to both the legal domain and the business case at hand. A strictly defined set of rules yields explainable information extraction on the target domain but limits the model's ability to generalize when the vocabulary or phrasing changes. The scientific aim is to investigate the generalization capability of Universal Dependencies (UD) graph-based rule systems over legal texts belonging to diverse legal cases. In this study, a set of rules is created and combined with a matching algorithm such that the system achieves both a recall and a precision of 1 on the target legal case. Starting from this highly case-specific combination, generalization steps are taken to find the combination of rules and matching algorithm that performs well not only on legal texts belonging to the target case but also on those belonging to different legal cases. As part of these steps, both the matching algorithm and the initial set of rules are adjusted. The generalization ability is evaluated both qualitatively and quantitatively.
The chosen domain poses challenges unique to the legal field. Their implications are discussed, and potential solutions are proposed as directions for future work.
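As a rough illustration of the ideas summarized in the abstract, the following minimal Python sketch matches a single rule against a dependency graph and scores the result with precision and recall. It is not the thesis's ruleset or matching algorithm: the `Edge` type, the `match_rule` and `precision_recall` functions, the example sentences, and the hand-written UD-style edges are all hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    head: str  # governing token
    rel: str   # dependency relation label (UD-style)
    dep: str   # dependent token


# Hypothetical, hand-written edges for "Der Käufer zahlt den Kaufpreis."
ACTIVE = [
    Edge("zahlt", "nsubj", "Käufer"),
    Edge("zahlt", "obj", "Kaufpreis"),
    Edge("Käufer", "det", "Der"),
    Edge("Kaufpreis", "det", "den"),
]

# Hypothetical edges for the passive rephrasing
# "Der Kaufpreis wird vom Käufer gezahlt."
PASSIVE = [
    Edge("gezahlt", "nsubj:pass", "Kaufpreis"),
    Edge("gezahlt", "aux:pass", "wird"),
    Edge("gezahlt", "obl", "Käufer"),
]


def match_rule(edges, subj_rel="nsubj", obj_rel="obj"):
    """Return (subject, predicate, object) triples for each head token
    that has both a subj_rel and an obj_rel dependent (exact label match)."""
    subjects, objects = {}, {}
    for e in edges:
        if e.rel == subj_rel:
            subjects.setdefault(e.head, []).append(e.dep)
        elif e.rel == obj_rel:
            objects.setdefault(e.head, []).append(e.dep)
    return {
        (s, head, o)
        for head, subs in subjects.items()
        for s in subs
        for o in objects.get(head, [])
    }


def precision_recall(predicted, gold):
    """Precision and recall of predicted triples against gold triples."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


gold_active = {("Käufer", "zahlt", "Kaufpreis")}
gold_passive = {("Käufer", "gezahlt", "Kaufpreis")}

print(precision_recall(match_rule(ACTIVE), gold_active))
print(precision_recall(match_rule(PASSIVE), gold_passive))
```

In this toy setup the rule recovers the expected triple from the active sentence but extracts nothing from the passive rephrasing, which mirrors the trade-off described above: a strict rule is transparent and exact on the phrasing it was written for, yet it fails once the phrasing changes.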
Additional information:
Thesis not yet received at the library - data not verified. Title differs from the original (author's own translation).