Grasmann, L., Pichler, R., & Selzer, A. (2023). Integration of Skyline Queries into Spark SQL. In F. Geerts & B. Vandevoort (Eds.), Proceedings 26th International Conference on Extending Database Technology (EDBT 2023) (pp. 337–350). OpenProceedings.org. https://doi.org/10.48786/edbt.2023.27
Skyline queries are frequently used in data analytics and multicriteria decision support applications to filter relevant information from large amounts of data. Apache Spark is a popular framework for processing large, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple, easy-to-use syntax for expressing skyline queries. Moreover, our empirical results show that this integrated solution by far outperforms a solution based on rewriting into standard SQL.
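To make the notion of a skyline query and its rewriting into standard SQL concrete, the following is a minimal sketch, not the implementation described in the paper. It assumes a hypothetical `hotels` table with two dimensions, `price` and `distance`, both to be minimized, and uses SQLite in place of Spark SQL so that it runs without a cluster. The table, data, and function names are illustrative assumptions. The skyline consists of all rows that are not dominated by any other row, and the plain-SQL version expresses this with a correlated `NOT EXISTS` subquery.

```python
import sqlite3

# Hypothetical example data: (name, price, distance); lower is better on both.
HOTELS = [
    ("A", 50, 8.0),
    ("B", 80, 2.0),
    ("C", 60, 9.0),   # dominated by A (cheaper and closer)
    ("D", 120, 1.0),
    ("E", 90, 3.0),   # dominated by B (cheaper and closer)
]


def dominates(p, q):
    """True if p is at least as good as q in every dimension
    and strictly better in at least one (all dimensions minimized)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(rows):
    """Naive quadratic skyline: keep the rows that no other row dominates."""
    return sorted(
        r[0] for r in rows
        if not any(dominates(o[1:], r[1:]) for o in rows)
    )


# The same query rewritten into plain SQL. Every skyline dimension appears
# twice in the subquery, which is what makes the rewriting tedious.
REWRITTEN_SQL = """
SELECT h.name FROM hotels AS h
WHERE NOT EXISTS (
    SELECT 1 FROM hotels AS o
    WHERE o.price <= h.price AND o.distance <= h.distance
      AND (o.price < h.price OR o.distance < h.distance)
)
ORDER BY h.name
"""


def skyline_sql(rows):
    """Evaluate the rewritten query on an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hotels (name TEXT, price INTEGER, distance REAL)")
    conn.executemany("INSERT INTO hotels VALUES (?, ?, ?)", rows)
    result = [name for (name,) in conn.execute(REWRITTEN_SQL)]
    conn.close()
    return result
```

Both `skyline(HOTELS)` and `skyline_sql(HOTELS)` return the same set of non-dominated hotels. The subquery grows with each additional skyline dimension, which illustrates why a dedicated skyline syntax is more convenient than the rewriting.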
Project title:
Scalable Reasoning in Knowledge Graphs: VRG18-013 (WWTF Wiener Wissenschafts-, Forschungs- und Technologiefonds)
HyperTrac: hypergraph Decompositions and Tractability: P30930-N35 (FWF - Österr. Wissenschaftsfonds)