Mond, F. (2024). A distributed forest of octrees datastructure for particle Wigner simulations [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2024.113934
Wigner simulator; Mesh management; Parallelization; data structures; MPI
en
Abstract:
This thesis explores the performance characteristics and potential bottlenecks of using the p4est library for mesh management in a Monte Carlo particle Wigner simulator. The goal is to develop a benchmark library that covers all features needed for such a simulation while serving as a minimal working example of these features, so that bottlenecks can be located. The benchmark library is then run on the VSC5, with the runtime of every part of the benchmark recorded as the performance measurement. The results show that p4est is a strong candidate for the mesh library in the next iteration of ViennaWD. The benchmark focuses on the generation and deletion of particles, the communication between processors, and the connection between particles and cells in the mesh, the last of which is the most compute-intensive part of the benchmark. Several design choices are made in the benchmarking library regarding data storage, communication strategy, and particle generation events. A two-dimensional array stores all particle data, either as an Array of Structs (AoS), in which each struct represents a single particle, or as a Struct of Arrays (SoA), in which each array holds one variable for all particles. Communication is implemented both as a one-shot method and as a round-based method. Each step of the benchmark is then timed to determine its performance characteristics for varying numbers of particles, mesh elements, and processes, extending to multiple nodes of the VSC cluster for large simulations. Performance scaling with increasing processor count is found to be linear when the load is balanced. Scaling with increasing particle count is found to be roughly linear, although a communication bottleneck limits the quality of the data for this benchmark.
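The two particle storage layouts named above can be illustrated with a small sketch. This is not the thesis's implementation: the field names (`x`, `y`, `k`, `weight`), the helper functions, and the swap-remove deletion strategy are assumptions chosen only to show how the same particle data is addressed in an AoS layout versus an SoA layout.

```python
from array import array

# Hypothetical particle variables; the actual field set used in the
# thesis is not stated in the abstract.
FIELDS = ("x", "y", "k", "weight")
NUM_FIELDS = len(FIELDS)


def make_aos(particles):
    """Array of Structs: all variables of one particle are adjacent."""
    flat = array("d")
    for p in particles:
        flat.extend(p[name] for name in FIELDS)
    return flat


def make_soa(particles):
    """Struct of Arrays: one contiguous array per variable."""
    return {name: array("d", (p[name] for p in particles)) for name in FIELDS}


def aos_get(flat, i, name):
    # Particle i starts at offset i * NUM_FIELDS in the flat buffer.
    return flat[i * NUM_FIELDS + FIELDS.index(name)]


def soa_get(soa, i, name):
    return soa[name][i]


def aos_count(flat):
    return len(flat) // NUM_FIELDS


def aos_delete(flat, i):
    """Swap-remove (assumed strategy): move the last particle into slot i."""
    last = aos_count(flat) - 1
    if i != last:
        flat[i * NUM_FIELDS:(i + 1) * NUM_FIELDS] = \
            flat[last * NUM_FIELDS:(last + 1) * NUM_FIELDS]
    del flat[last * NUM_FIELDS:]


def soa_delete(soa, i):
    """Swap-remove (assumed strategy), applied to every variable array."""
    for values in soa.values():
        values[i] = values[-1]
        values.pop()


particles = [
    {"x": 0.1, "y": 0.2, "k": 1.0, "weight": 1.0},
    {"x": 0.5, "y": 0.6, "k": -1.0, "weight": -1.0},
    {"x": 0.9, "y": 0.3, "k": 2.0, "weight": 1.0},
]
aos = make_aos(particles)
soa = make_soa(particles)
```

In the AoS layout a whole particle can be copied as one contiguous slice, which is convenient when packing particles for communication. In the SoA layout a loop over one variable touches a single contiguous array. Which layout performs better for a given benchmark step is an empirical question that the sketch does not answer.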
Performance scaling with an increasing number of mesh cells is found to be extremely strong: runtime decreases by two orders of magnitude as the quadrant count increases by eight orders of magnitude. The limiting factors are found to be the communication throughput of the VSC and general performance under a strongly unbalanced load, with a specific communication bottleneck appearing once the communication load rises above a certain threshold. The particle storage of the benchmarking library is found to have no significant memory overhead, while the memory footprint of the quadrant storage managed by p4est is found to scale linearly with the number of quadrants. Some further optimizations, such as compiler flags or different MPI libraries, were not explored in this thesis. Overall, p4est is found to be a strong candidate for a mesh management library in the context of a Monte Carlo particle Wigner simulation, supporting the particle-cell relationships that the simulation requires in a highly scalable way.
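The particle-cell relationship mentioned above can be illustrated with a minimal sketch. It does not use the p4est API and is not the method of the thesis. It assumes a uniformly refined two-dimensional mesh on the unit square, particle positions inside [0, 1], and cells numbered along a Morton (z-order) curve, the ordering p4est uses for its quadrants. All function names are hypothetical.

```python
from collections import defaultdict


def interleave_bits(ix, iy, level):
    """Morton (z-order) index of the cell with integer coordinates (ix, iy).

    Bits of ix go to even positions and bits of iy to odd positions.
    """
    code = 0
    for b in range(level):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def cell_of(x, y, level):
    """Morton index of the cell containing (x, y) on a uniform mesh.

    Assumes the unit square is split into 2**level cells per side and
    that 0 <= x, y <= 1; points on the upper boundary are clamped into
    the last cell.
    """
    n = 1 << level
    ix = min(int(x * n), n - 1)
    iy = min(int(y * n), n - 1)
    return interleave_bits(ix, iy, level)


def bin_particles(positions, level):
    """Map each occupied cell index to the list of particle ids inside it."""
    cells = defaultdict(list)
    for pid, (x, y) in enumerate(positions):
        cells[cell_of(x, y, level)].append(pid)
    return dict(cells)


positions = [(0.1, 0.1), (0.9, 0.1), (0.2, 0.2), (0.6, 0.7)]
cells = bin_particles(positions, level=1)
```

On an adaptively refined, distributed forest the lookup is more involved, since cells differ in size and may live on another process. The uniform case only shows the kind of mapping the benchmark has to maintain.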