Csakvari, T. R. (2025). Large Language Model-based framework for Open Information Extraction [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.131626
-
dc.identifier.uri
https://doi.org/10.34726/hss.2025.131626
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/220264
-
dc.description
Thesis not yet received at the library - data not verified
-
dc.description
Differing title according to the author's translation
-
dc.description.abstract
The growth of unstructured digital text demands effective knowledge extraction methods. While traditional Information Extraction is limited by rigid schemas, Open Information Extraction (OIE) provides needed flexibility. Large Language Models (LLMs) show promise for OIE, but their application to both OIE and semantic triplet matching remains underexplored.

This thesis introduces and evaluates a novel, modular LLM-based framework designed for OIE, subsequent semantic triplet matching, and text comparison, with validation performed on a German legal education dataset of student responses. The framework employs LLMs to first extract (subject, relation, object) triplets from the German legal texts. These extracted candidate triplets are then semantically compared against predefined target triplets (representing key legal contents) using an LLM-based triplet matching process. The system's performance was quantitatively and qualitatively evaluated on the dataset of student answers to a specific legal case, comparing LLM-based triplet matching outputs against human-annotated ground truth. Several state-of-the-art LLMs (including the GPT-4 series, Llama, and DeepSeek) were benchmarked, alongside alternative methods such as end-to-end LLM evaluation, rule-based OIE, and string-based triplet matching for comparison.

Results demonstrate the framework's considerable proficiency, with the top-performing configuration (GPT-4.1-mini for both OIE and triplet matching) achieving 80.0% accuracy and a Matthews Correlation Coefficient (MCC) of 0.589. This modular LLM-OIE plus LLM-matching approach generally outperformed holistic end-to-end LLM methods and simpler rule-based or string-matching techniques, highlighting the value of structured intermediate representations.

This research validates the utility of LLMs for OIE and semantic comparison in a specialized, non-English domain. The developed open-source, modular framework serves as a practical tool and contributes to understanding LLM capabilities and limitations in structured knowledge extraction, offering a foundation for advanced automated assessment and information retrieval systems.
en
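For illustration only, the two-stage pipeline summarized in the abstract (LLM-based OIE followed by LLM-based triplet matching, scored by accuracy and the Matthews Correlation Coefficient) could be sketched in Python roughly as below. The prompts, the helper names extract_triplets, matches_target, and evaluate, and the injected llm callable are assumptions made for this sketch, not the thesis implementation.

# Minimal sketch of a modular OIE + triplet-matching pipeline, assuming an
# injected llm callable that maps a prompt string to a completion string.
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)

def extract_triplets(text: str, llm: Callable[[str], str]) -> List[Triplet]:
    """Ask an LLM to emit one 'subject | relation | object' triplet per line."""
    prompt = (
        "Extract (subject, relation, object) triplets from the text below.\n"
        "Return one triplet per line as: subject | relation | object\n\n" + text
    )
    triplets: List[Triplet] = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triplets.append((parts[0], parts[1], parts[2]))
    return triplets

def matches_target(candidates: List[Triplet], target: Triplet,
                   llm: Callable[[str], str]) -> bool:
    """Ask an LLM whether any candidate triplet semantically covers the target."""
    listing = "\n".join(" | ".join(c) for c in candidates)
    prompt = (
        f"Target triplet: {' | '.join(target)}\n"
        f"Candidate triplets:\n{listing}\n"
        "Does any candidate express the same fact as the target? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")

def evaluate(predictions: List[bool], ground_truth: List[bool]) -> Tuple[float, float]:
    """Accuracy and Matthews Correlation Coefficient over binary match decisions."""
    tp = sum(p and g for p, g in zip(predictions, ground_truth))
    tn = sum((not p) and (not g) for p, g in zip(predictions, ground_truth))
    fp = sum(p and (not g) for p, g in zip(predictions, ground_truth))
    fn = sum((not p) and g for p, g in zip(predictions, ground_truth))
    accuracy = (tp + tn) / len(predictions)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0
    return accuracy, mcc

With a real model behind the llm callable, each student answer would yield one boolean decision per predefined target triplet; comparing those decisions against human annotations gives the kind of accuracy and MCC figures reported in the abstract.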
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Open Information Extraction
en
dc.subject
Semantic Triplet Matching
en
dc.subject
Large Language Models
en
dc.subject
Knowledge Extraction
en
dc.subject
Legal Texts
en
dc.subject
Automated Assessment
en
dc.subject
Text Comparison
en
dc.subject
Information Retrieval
en
dc.subject
Natural Language Processing
en
dc.subject
Evaluation Framework
en
dc.title
Large Language Model-based framework for Open Information Extraction
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2025.131626
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Tamas Robert Csakvari
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering