Lissandrini, M., Rabbani, K., & Hose, K. (2024). Mining Validating Shapes for Large Knowledge Graphs via Dynamic Reservoir Sampling. In M. Atzori, P. Ciaccia, M. Ceci, F. Mandreoli, D. Malerba, & M. Sanguinetti (Eds.), Proceedings of the 32nd Symposium on Advanced Database Systems (SEBD 2024) (pp. 25–34). https://doi.org/10.34726/8213
E192-02 - Forschungsbereich Databases and Artificial Intelligence
Published in:
Proceedings of the 32nd Symposium on Advanced Database Systems (SEBD 2024)
Volume:
3741
Date (published):
2024
Event name:
Symposium on Advanced Database Systems 2024 (SEBD 2024)
Event date:
23-Jun-2024 - 26-Jun-2024
Event place:
Villasimius, Italy
Number of Pages:
10
Peer reviewed:
Yes
Keywords:
Knowledge Graphs; Data Mining; Data Quality
Abstract:
Knowledge Graphs (KGs) are databases that model knowledge from heterogeneous domains using the graph data model. Shape constraint languages have been adopted in KGs to ensure their data quality.
They encode the equivalent of a schema in the Resource Description Framework (RDF). Unfortunately, few KGs are accompanied by a corresponding set of validating shapes. When validating shapes are missing, the solution is to extract them from the graph via mining techniques. Current shape extraction methods are often incomplete, not scalable, and generate spurious shapes. Thus, in this discussion paper, we present our recent contribution: a novel Quality Shapes Extraction (QSE) method for large graphs.
QSE computes confidence and support for shape constraints via a novel Dynamic Reservoir Sampling method, enabling the identification of informative and reliable shapes. QSE is the first method (validated on WikiData and DBpedia) to extract a complete set of shapes from large real-world KGs.
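As a rough illustration of the sampling idea mentioned in the abstract, the sketch below uses classic fixed-size reservoir sampling (Algorithm R), not the paper's Dynamic Reservoir Sampling variant, whose details are not given in this record. The function names, the representation of an entity as a set of property names, and the definitions of support (count of sampled entities having a property) and confidence (that count divided by the sample size) are assumptions made for this example only.

```python
import random
from typing import Iterable, List, Optional, Set, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Classic Algorithm R: keep a uniform sample of at most k items
    from a stream of unknown length, using O(k) memory."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def support_and_confidence(entities: List[Set[str]],
                           prop: str) -> Tuple[int, float]:
    """Hypothetical helper: support is the number of sampled entities
    that have `prop`; confidence is that number over the sample size."""
    support = sum(1 for props in entities if prop in props)
    confidence = support / len(entities) if entities else 0.0
    return support, confidence


# Toy data: each entity of some class is modeled as its set of properties.
entities = [{"name", "age"}, {"name"}, {"name", "email"}, {"name", "age"}]
sample = reservoir_sample(entities, k=3, rng=random.Random(0))
support, confidence = support_and_confidence(sample, "name")
```

A candidate constraint whose estimated support or confidence falls below a chosen threshold could then be discarded as likely spurious; the threshold itself would be a tuning choice that this record does not specify.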