Chakarov, T. (2026). Performance and Scalability Analysis of Dask Applications on Large Scale Systems [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.131836
This thesis evaluates the scalability and performance characteristics of the Dask framework for large-scale analytical workloads across two contrasting computing environments: a local Kubernetes-based deployment and a high-performance computing (HPC) cluster managed by SLURM. Although Dask is increasingly used as a flexible alternative to frameworks such as Apache Spark, its behavior under compute-intensive workloads remains poorly understood. To address this gap, we conduct extensive strong-scaling and weak-scaling experiments using TPC-H-like datasets at varying scale factors, analyze Dask’s adaptive autoscaling mechanism, and integrate DuckDB as an embedded execution engine within Dask workers.

The results show that the local Kubernetes deployment achieves limited scalability due to shared-resource contention, with performance saturating after only a few workers. In contrast, the HPC system maintains high parallel efficiency across many nodes, particularly when running one worker per node. This demonstrates that distributed memory bandwidth and reduced spill-to-disk activity are essential for scaling shuffle-intensive queries. Weak scaling on the HPC system is more stable than on the local cluster, although absolute runtimes are higher because distributed execution introduces significant network and I/O overheads.

Adaptive scaling performs poorly in both environments. On Kubernetes, the autoscaler behaves unpredictably, frequently terminating workers too aggressively and destabilizing long-running queries. On the HPC system, adaptive scaling consistently underperforms fixed-size clusters because the ramp-up phase forces memory-bound workloads to execute with insufficient resources.

Finally, experiments combining Dask with DuckDB reveal substantial performance improvements. Distributed DuckDB consistently outperforms Dask DataFrame execution, often by a factor of three to four, due to more efficient local processing, reduced I/O per worker, and optimized query execution.
Even single-worker DuckDB is faster than Dask-only execution, emphasizing the importance of leveraging specialized query engines within distributed data-processing frameworks.

Overall, this thesis provides a detailed empirical analysis of Dask’s behavior under large-scale analytical workloads and offers practical recommendations for configuring Dask on HPC systems, understanding its scaling limits, and integrating complementary technologies such as DuckDB to improve performance.
Additional information:
Thesis not yet received at the library - data not verified. Differing title as translated by the author.