Csakvari, T. R. (2025). Large Language Model-based framework for Open Information Extraction [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.131626
-
dc.identifier.uri
https://doi.org/10.34726/hss.2025.131626
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/220264
-
dc.description
Thesis not yet received at the library - data not verified
-
dc.description
Differing title according to the author's translation
-
dc.description.abstract
The growth of unstructured digital text demands effective knowledge extraction methods. While traditional Information Extraction is limited by rigid schemas, Open Information Extraction (OIE) provides needed flexibility. Large Language Models (LLMs) show promise for OIE, but their application to both OIE and semantic triplet matching remains underexplored.

This thesis introduces and evaluates a novel, modular LLM-based framework designed for OIE, subsequent semantic triplet matching, and text comparison, with validation performed on a German legal education dataset of student responses. The framework employs LLMs to first extract (subject, relation, object) triplets from the German legal texts. These extracted candidate triplets are then semantically compared against predefined target triplets (representing key legal contents) using an LLM-based triplet matching process. The system's performance was quantitatively and qualitatively evaluated on the dataset of student answers to a specific legal case, comparing LLM-based triplet matching outputs against human-annotated ground truth. Several state-of-the-art LLMs (including the GPT-4 series, Llama, and DeepSeek) were benchmarked, alongside alternative methods such as end-to-end LLM evaluation, rule-based OIE, and string-based triplet matching for comparison.

Results demonstrate the framework's considerable proficiency, with the top-performing configuration (GPT-4.1-mini for both OIE and triplet matching) achieving 80.0% accuracy and a Matthews Correlation Coefficient (MCC) of 0.589. This modular LLM-OIE plus LLM-matching approach generally outperformed holistic end-to-end LLM methods and simpler rule-based or string-matching techniques, highlighting the value of structured intermediate representations.

This research validates the utility of LLMs for OIE and semantic comparison in a specialized, non-English domain. The developed open-source, modular framework serves as a practical tool and contributes to understanding LLM capabilities and limitations in structured knowledge extraction, offering a foundation for advanced automated assessment and information retrieval systems.
en
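For illustration only, the two-stage pipeline summarized in the abstract (LLM-based OIE followed by LLM-based triplet matching, scored by accuracy and the Matthews Correlation Coefficient) could be sketched in Python roughly as below. The prompts, the helper names extract_triplets, matches_target, and evaluate, and the injected llm callable are assumptions made for this sketch, not the thesis implementation.

# Minimal sketch of a modular OIE + triplet-matching pipeline, assuming an
# injected llm callable that maps a prompt string to a completion string.
from typing import Callable, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, relation, object)

def extract_triplets(text: str, llm: Callable[[str], str]) -> List[Triplet]:
    """Ask an LLM to emit one 'subject | relation | object' triplet per line."""
    prompt = (
        "Extract (subject, relation, object) triplets from the text below.\n"
        "Return one triplet per line as: subject | relation | object\n\n" + text
    )
    triplets: List[Triplet] = []
    for line in llm(prompt).splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3:
            triplets.append((parts[0], parts[1], parts[2]))
    return triplets

def matches_target(candidates: List[Triplet], target: Triplet,
                   llm: Callable[[str], str]) -> bool:
    """Ask an LLM whether any candidate triplet semantically covers the target."""
    listing = "\n".join(" | ".join(c) for c in candidates)
    prompt = (
        f"Target triplet: {' | '.join(target)}\n"
        f"Candidate triplets:\n{listing}\n"
        "Does any candidate express the same fact as the target? Answer yes or no."
    )
    return llm(prompt).strip().lower().startswith("yes")

def evaluate(predictions: List[bool], ground_truth: List[bool]) -> Tuple[float, float]:
    """Accuracy and Matthews Correlation Coefficient over binary match decisions."""
    tp = sum(p and g for p, g in zip(predictions, ground_truth))
    tn = sum((not p) and (not g) for p, g in zip(predictions, ground_truth))
    fp = sum(p and (not g) for p, g in zip(predictions, ground_truth))
    fn = sum((not p) and g for p, g in zip(predictions, ground_truth))
    accuracy = (tp + tn) / len(predictions)
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else 0.0
    return accuracy, mcc

With a real model behind the llm callable, each student answer would yield one boolean decision per predefined target triplet; comparing those decisions against human annotations gives the kind of accuracy and MCC figures reported in the abstract.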
dc.language
English
-
dc.language.iso
en
-
dc.rights.uri
http://rightsstatements.org/vocab/InC/1.0/
-
dc.subject
Open Information Extraction
en
dc.subject
Semantic Triplet Matching
en
dc.subject
Large Language Models
en
dc.subject
Knowledge Extraction
en
dc.subject
Legal Texts
en
dc.subject
Automated Assessment
en
dc.subject
Text Comparison
en
dc.subject
Information Retrieval
en
dc.subject
Natural Language Processing
en
dc.subject
Evaluation Framework
en
dc.title
Large Language Model-based framework for Open Information Extraction
en
dc.type
Thesis
en
dc.type
Hochschulschrift
de
dc.rights.license
In Copyright
en
dc.rights.license
Urheberrechtsschutz
de
dc.identifier.doi
10.34726/hss.2025.131626
-
dc.contributor.affiliation
TU Wien, Österreich
-
dc.rights.holder
Tamas Robert Csakvari
-
dc.publisher.place
Wien
-
tuw.version
vor
-
tuw.thesisinformation
Technische Universität Wien
-
tuw.publication.orgunit
E194 - Institut für Information Systems Engineering