Querying knowledge graphs at web scale

Azzam, Amr

doi:10.34726/hss.2023.117440

Record citation link:

https://doi.org/10.34726/hss.2023.117440
http://hdl.handle.net/20.500.12708/192871

Title:

Querying knowledge graphs at web scale

Citation:

Azzam, A. (2023). Querying knowledge graphs at web scale [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2023.117440

reposiTUm-DOI:

10.34726/hss.2023.117440

CatalogPlus:

AC17053178

Publication type:

Thesis - Dissertation

Language:

English

Authors:

Azzam, Amr

Supervisor:

Polleres, Axel

Organizational unit:

E192 - Institut für Logic and Computation

Date (published):

2023

Extent (pages):

227

Keywords:

Linked Data; Querying; Availability; Scalability; SPARQL; Linked Data Fragments; Query engines; Decentralized query processing

Abstract:

While Linked Data (LD) provides standards for publishing (RDF) and querying (SPARQL) Knowledge Graphs (KGs) on the Web, serving, accessing, and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. To this end, Linked Data Fragments (LDF) have introduced a foundational framework that has sparked research exploring a spectrum of potential Web querying interfaces between server-side query processing via SPARQL endpoints and client-side query processing of data dumps. Current proposals in between typically suffer from an imbalanced load on either the client or the server. In this thesis, we present a novel approach that shares the load between servers and clients, while significantly reducing data transfer volume, by combining server-side query processing with shipping compressed KG partitions. Next, we present the first work that combines client-side and server-side query optimization techniques in a truly dynamic fashion: a cost model delegates the load between servers and clients, based on current server workload and client capabilities, by combining client-side processing of shipped partitions with efficient server-side processing of star-shaped sub-queries. Thereafter, we investigate alternative interfaces able to ship partitions of KGs from the server to the client, aiming to reduce server-resource consumption. To this end, we align formal definitions and notations of the original LDF framework to uniformly present partition-based LDF approaches. Our thesis is a step towards a better-balanced share of the query processing load between clients and servers by shipping graph partitions that are driven by the structure of RDF graphs and group entities described with the same sets of properties and classes.
Throughout the thesis, we empirically evaluate our approach on real-world and synthetic RDF KGs, using both pre-existing benchmarks for highly concurrent query execution and a novel query workload benchmark inspired by query logs of existing SPARQL endpoints. Our experiments show that our proposed work significantly outperforms state-of-the-art solutions in terms of average total query execution time per client, while at the same time decreasing network traffic and increasing server-side availability, towards more cost-effective and balanced hosting of open and decentralized KGs.

Further information:

Bibliography: pages 203-227

License:

Protected by copyright

Appears in collections:

Thesis