Schwarzinger, T., Thoma, M., Preindl, T., Kjäer, M., Just, V. P., & Steindl, G. (2025). RDF fusion: an extensible SPARQL engine for hybrid data models. IEEE Access, 13, 184297–184311. https://doi.org/10.1109/ACCESS.2025.3623639
Analytics; Hybrid Data Model; Internet of Things; Semantic Web; SPARQL
en
Abstract:
The Internet of Things (IoT) generates vast streams of sensor data that often require enrichment with background knowledge about the system and domain. Although such data can be represented as graphs, purely graph-based models struggle with the temporal aspects of sensor observations, motivating hybrid approaches that integrate graphs with time series data. This creates a need for query engines that can handle both types of data within a single system. In the Semantic Web community, this drives demand for SPARQL engines that are flexible enough to support time series data and efficient for analytical workloads. Existing engines fall short: the few that focus on extensibility are row-based and perform poorly in time series analytics, while columnar systems could offer better analytical performance but lack the necessary extensibility. To address this gap, we present RDF Fusion, an extensible SPARQL engine built on Apache DataFusion, a modular columnar engine optimized for analytical workloads. RDF Fusion uses specialized encodings to represent the dynamic nature of RDF terms within the statically typed data model of DataFusion. These encodings enable efficient SPARQL query execution while preserving the extensibility to experiment with custom operators, optimizations, and hybrid time series support. Our evaluation shows that RDF Fusion complies with SPARQL 1.1 and provides competitive performance in analytical workloads. As an open-source system, it offers a solid foundation for research on hybrid data models in the IoT.
Project title:
Datenökosysteme für die Energiewende: 905128 (FFG - Österr. Forschungsförderungsgesellschaft mbH)