Chakarov, T. (2026). Performance and Scalability Analysis of Dask Applications on Large Scale Systems [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.131836
This thesis evaluates the scalability and performance characteristics of the Dask framework for large-scale analytical workloads across two contrasting computing environments: a local Kubernetes-based deployment and a high-performance computing (HPC) cluster managed by SLURM. Although Dask is increasingly used as a flexible alternative to frameworks such as Apache Spark, its behavior under compute-intensive workloads remains poorly understood. To address this gap, we conduct extensive strong-scaling and weak-scaling experiments using TPC-H-like datasets at varying scale factors, analyze Dask’s adaptive autoscaling mechanism, and integrate DuckDB as an embedded execution engine within Dask workers.

The results show that the local Kubernetes deployment achieves limited scalability due to shared-resource contention, with performance saturating after only a few workers. In contrast, the HPC system maintains high parallel efficiency across many nodes, particularly when running one worker per node. This demonstrates that distributed memory bandwidth and reduced spill-to-disk activity are essential for scaling shuffle-intensive queries. Weak scaling on the HPC system is more stable than on the local cluster, although absolute runtimes are higher because distributed execution introduces significant network and I/O overheads.

Adaptive scaling performs poorly in both environments. On Kubernetes, the autoscaler behaves unpredictably, frequently terminating workers too aggressively and destabilizing long-running queries. On the HPC system, adaptive scaling consistently underperforms fixed-size clusters because the ramp-up phase forces memory-bound workloads to execute with insufficient resources.

Finally, experiments combining Dask with DuckDB reveal substantial performance improvements. Distributed DuckDB consistently outperforms Dask DataFrame execution, often by a factor of three to four, due to more efficient local processing, reduced I/O per worker, and optimized query execution.
Even single-worker DuckDB is faster than Dask-only execution, emphasizing the importance of leveraging specialized query engines within distributed data-processing frameworks.

Overall, this thesis provides a detailed empirical analysis of Dask’s behavior under large-scale analytical workloads and offers practical recommendations for configuring Dask on HPC systems, understanding its scaling limits, and integrating complementary technologies such as DuckDB to improve performance.
Additional information:
Thesis not yet received at the library - data not verified. Differing title as translated by the author.