The Knowledge Graph Divide - connecting machine learning, databases, and the semantic web

Pavlović, Aleksandar

doi:10.34726/hss.2025.128683

DC Field

Value

Language

dc.contributor.advisor

Sallinger, Emanuel

dc.contributor.author

Pavlović, Aleksandar

dc.date.accessioned

2025-04-17T11:09:09Z

dc.date.issued

2025

dc.date.submitted

2025-01

dc.identifier.citation

Pavlović, A. (2025). The Knowledge Graph Divide - connecting machine learning, databases, and the semantic web [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.128683

dc.identifier.uri

https://doi.org/10.34726/hss.2025.128683

dc.identifier.uri

http://hdl.handle.net/20.500.12708/214156

dc.description.abstract

Over the past decade, Knowledge Graphs (KGs) have received enormous interest from industry and academia. However, three key research communities study KGs, namely the Machine Learning (ML), Database (DB), and Semantic Web (SW) communities, and major gaps remain between them. This dissertation is about bridging these divides:
Reasoning Divide. KGs are inherently incomplete. Therefore, the ML community has proposed Knowledge Graph Embedding Models (KGEs), which achieve promising results for predicting missing links. In the DB and SW fields, key data properties are typically represented via logical rules. However, current KGEs cannot capture vital rules, i.e., they cannot infer missing links while adhering to such rules. In particular, capturing (i) general composition and (ii) composition and hierarchy rules jointly are crucial open problems. To bridge this divide, we introduce the ExpressivE model, which embeds pairs of entities as points and relations as hyper-parallelograms in the virtual triple space ℝ^{2d}. This design allows ExpressivE to capture a rich set of logical rules while offering an intuitive and consistent geometric interpretation of its embeddings and the rules they capture.
Scalability Divide. Moreover, the SW and DB communities provide massive KGs, calling for efficient KGEs. However, most contemporary ML-based KGEs require high-dimensional embeddings or complex embedding spaces for competitive prediction results, drastically raising their space and time requirements. Thus, developing efficient KGEs is another central open problem dividing the SW, DB, and ML fields. Facing this challenge, we propose SpeedE, a Euclidean KGE that (i) has strong inference capabilities, (ii) is competitive with state-of-the-art KGEs, significantly outperforming them on the YAGO3-10 and WN18RR benchmarks, and (iii) dramatically increases efficiency, needing on WN18RR only a fifth of the training time and a fourth of the parameters of the best-performing model (ExpressivE) to reach the same link prediction performance.
Data Management Divide. Finally, the DB and SW communities have driven classical KG research. However, there remains a divide between approaches from these two fields. For instance, while languages such as SQL or Datalog are widely used in the DB area, a vastly different set of languages, such as SPARQL and OWL, is used in the SW area. This mismatch makes blending KGs from both communities a complex endeavor, rendering the interoperability between DB and SW technologies a pressing open challenge. Thus, we present the SparqLog system, a uniform and consistent KG management framework meeting essential requirements from the SW and DB fields.
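The geometric idea the abstract attributes to ExpressivE (entity pairs as points, relations as hyper-parallelograms in a 2d-dimensional space) can be illustrated with a small membership test. This is only a minimal sketch under stated assumptions, not the dissertation's implementation: the function name and the parameter names (centers `c_h`, `c_t`, slopes `s_h`, `s_t`, widths `w_h`, `w_t`) are hypothetical, and the per-dimension pair of band constraints is one plausible way to describe such a region.

```python
from typing import Sequence

Vector = Sequence[float]


def in_relation_region(
    e_h: Vector, e_t: Vector,
    c_h: Vector, c_t: Vector,
    s_h: Vector, s_t: Vector,
    w_h: Vector, w_t: Vector,
) -> bool:
    """Check whether the pair point (e_h, e_t) lies inside a relation region.

    All names are illustrative assumptions. In each dimension i, two bands in
    the (e_h[i], e_t[i]) plane are intersected, which yields a parallelogram;
    taken over all d dimensions this gives a region in a 2d-dimensional space.
    """
    for i in range(len(e_h)):
        # Band constraining the head coordinate relative to the tail coordinate.
        head_ok = abs(e_h[i] - c_h[i] - s_t[i] * e_t[i]) <= w_h[i]
        # Band constraining the tail coordinate relative to the head coordinate.
        tail_ok = abs(e_t[i] - c_t[i] - s_h[i] * e_h[i]) <= w_t[i]
        if not (head_ok and tail_ok):
            return False
    return True


# Hypothetical relation parameters for d = 2 (made-up values, not trained).
c_h, c_t = [0.0, 0.0], [0.0, 0.0]
s_h, s_t = [0.0, 0.0], [0.5, 0.5]
w_h, w_t = [1.0, 1.0], [1.0, 1.0]

inside = in_relation_region([0.5, 0.0], [1.0, 0.0], c_h, c_t, s_h, s_t, w_h, w_t)
outside = in_relation_region([2.0, 0.0], [0.0, 0.0], c_h, c_t, s_h, s_t, w_h, w_t)
```

Under this reading, a triple would be predicted as true when its entity pair falls inside the relation's region, and rules such as hierarchy would correspond to one region being contained in another; how the dissertation actually scores and trains such regions is not covered by this sketch.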

dc.language

English

dc.language.iso

dc.rights.uri

http://rightsstatements.org/vocab/InC/1.0/

dc.subject

machine learning

dc.subject

artificial intelligence

dc.subject

knowledge graphs

dc.subject

graph embeddings

dc.subject

databases

dc.subject

semantic web

dc.subject

query answering

dc.subject

efficiency

dc.subject

scalability

dc.subject

geometric interpretation

dc.title

The Knowledge Graph Divide - connecting machine learning, databases, and the semantic web

dc.title.alternative

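Der Knowledge Graph Divide - Überwinden der Hürden zwischen Machine Learning, Databases und dem Semantic Web [The Knowledge Graph Divide - Overcoming the Barriers between Machine Learning, Databases, and the Semantic Web]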

dc.type

Thesis

dc.type

Hochschulschrift

dc.rights.license

In Copyright

dc.rights.license

Urheberrechtsschutz

dc.identifier.doi

10.34726/hss.2025.128683

dc.contributor.affiliation

TU Wien, Österreich

dc.rights.holder

Aleksandar Pavlović

dc.publisher.place

Wien

tuw.version

vor

tuw.thesisinformation

Technische Universität Wien

tuw.publication.orgunit

E192 - Institut für Logic and Computation

dc.type.qualificationlevel

Doctoral

dc.identifier.libraryid

AC17493329

dc.description.numberOfPages

189

dc.thesistype

Dissertation

dc.thesistype

Dissertation

tuw.author.orcid

0000-0001-6887-9515

dc.rights.identifier

In Copyright

dc.rights.identifier

Urheberrechtsschutz

tuw.advisor.staffStatus

staff

item.openairecristype

http://purl.org/coar/resource_type/c_db06

item.fulltext

with Fulltext

item.openaccessfulltext

Open Access

item.mimetype

application/pdf

item.languageiso639-1

item.grantfulltext

open

item.openairetype

doctoral thesis

item.cerifentitytype

Publications

crisitem.author.dept

E192-02 - Forschungsbereich Databases and Artificial Intelligence

crisitem.author.orcid

0000-0001-6887-9515

crisitem.author.parentorg

E192 - Institut für Logic and Computation

Appears in Collections:

Thesis

Fulltext (Version of Record (published version))

Adobe PDF

(3.16 MB)

In Copyright

Page view(s)

218

checked on Apr 17, 2025

Download(s)

274

checked on Apr 17, 2025
