Rule-based open information extraction from German legal domain

Iszak, Zsombor

doi:10.34726/hss.2025.126169

Record link:

https://doi.org/10.34726/hss.2025.126169
http://hdl.handle.net/20.500.12708/213079

Title:

Rule-based open information extraction from German legal domain

Citation:

Iszak, Z. (2025). Rule-based open information extraction from German legal domain [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.126169

reposiTUm DOI:

10.34726/hss.2025.126169

CatalogPlus:

AC17462153

Publication Type:

Thesis - Diplomarbeit

Language:

English

Authors:

Iszak, Zsombor

Advisor:

Recski, Gábor

Organisational Unit:

E194 - Institut für Information Systems Engineering

Date (published):

2025

Number of Pages:

Keywords:

Information Extraction; Rule-based; Open Information Extraction; Legal domain; German text; Generalization; Ruleset

Abstract:

Historically, information has been stored in written form as text. Text is an unstructured form of information: it suits human readers but is inefficient for machines to extract information from. The thesis is based on a business case that aims to evaluate law students' solutions and provide feedback on errors. Evaluation guidelines are provided by domain experts who know how to evaluate the attempts and what information is relevant. A rule-based Open Information Extraction (OIE) system is designed to extract information segments from the student attempts. OIE derives structured information from unstructured text without being restricted to predefined relation types. Rule-based OIE combines a set of rules with a matching algorithm and provides explainability, enabling transparent decision making that is critical to both the legal domain and the business case at hand. A strictly defined set of rules leads to explainable information extraction on the target domain but limits the ability of the model to generalize when the vocabulary or phrasing changes. The scientific aim is to investigate the generalization capability of Universal Dependencies (UD) graph-based rule systems over legal texts belonging to diverse legal cases.
In this study, a set of rules is created together with a matching algorithm; this combination works perfectly on the target legal case, achieving both a recall and a precision of 1. Starting from this highly case-specific combination, generalization steps are taken to find the combination of rules and matching algorithm that performs well not only on legal texts belonging to the target case but also on those belonging to different legal cases. As part of the generalization steps, both the matching algorithm and the initial set of rules are adjusted. The generalization ability is evaluated both qualitatively and quantitatively.
The specified domain poses challenges unique to the legal field. Their implications are discussed, and potential solutions are proposed as future work.
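As a rough illustration of the idea summarized in the abstract, the sketch below applies one hand-written rule to a toy dependency graph and scores the result with precision and recall. Everything in it is an assumption made for illustration: the example sentences, the simplified UD-style edge lists, the single subject-verb-object rule, and the names `extract_triples` and `precision_recall` are hypothetical and are not taken from the thesis.

```python
# Toy sketch only: sentences, edges, rule, and function names are hypothetical.

# Simplified dependency edges (head, relation, dependent) with UD-style labels.
# Active voice: "Der Käufer zahlt den Preis" (The buyer pays the price).
ACTIVE_EDGES = [
    ("zahlt", "nsubj", "Käufer"),
    ("zahlt", "obj", "Preis"),
    ("Käufer", "det", "Der"),
    ("Preis", "det", "den"),
]

# Passive voice: "Der Preis wird vom Käufer gezahlt" (same fact, different phrasing).
PASSIVE_EDGES = [
    ("gezahlt", "nsubj:pass", "Preis"),
    ("gezahlt", "aux:pass", "wird"),
    ("gezahlt", "obl:agent", "Käufer"),
]


def extract_triples(edges):
    """Rule: a head with both an nsubj and an obj dependent yields (subject, head, object)."""
    subjects, objects = {}, {}
    for head, relation, dependent in edges:
        if relation == "nsubj":
            subjects.setdefault(head, []).append(dependent)
        elif relation == "obj":
            objects.setdefault(head, []).append(dependent)
    return {
        (subject, head, obj)
        for head in subjects
        for subject in subjects[head]
        for obj in objects.get(head, [])
    }


def precision_recall(predicted, gold):
    """Compare predicted and gold triple sets; empty denominators count as 0.0."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


active_predicted = extract_triples(ACTIVE_EDGES)
passive_predicted = extract_triples(PASSIVE_EDGES)

active_scores = precision_recall(active_predicted, {("Käufer", "zahlt", "Preis")})
passive_scores = precision_recall(passive_predicted, {("Käufer", "gezahlt", "Preis")})
```

In this toy setting the rule recovers the triple from the active sentence but extracts nothing from the passive one, which is the kind of phrasing sensitivity that motivates generalizing a strictly defined ruleset.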

License:

In Copyright

Appears in Collections:

Thesis