North, S. (2025). Analysis of the GPU Acceleration Potential of the FFT-Based Pressure Solver in the PALM-4U Model System [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.133062
Due to climate change, the frequency and severity of extreme weather events are increasing, which endangers human livelihoods and key infrastructure. Decision support tools can guide the development of climate-resilient cities by providing information on the potential effectiveness of specific measures during the planning process. In urban environments, decision support tools that incorporate accurate microclimate models are particularly effective. PALM-4U, a state-of-the-art, scientifically validated microclimate model, could offer this functionality; however, it remains largely inaccessible outside the scientific community because it is optimised to run on HPC clusters. With the rise of high-performance GPUs, a shift towards single workstations is possible. This study investigates the potential performance increase of the PALM-4U pressure solver from GPU acceleration combined with a change in target architecture. The performance increase is measured using three parameters: speed-up, validity (via NMSE, R, and FB), and memory efficiency. The effect on the runtime of the full simulation is also measured, and possible bottlenecks are identified. Finally, the full model is analysed to assess the overall feasibility of GPU optimisation, providing insights to guide future development. The pressure solver transforms the 3D Poisson equation using the Fast Fourier Transform and solves the resulting 1D system via the Thomas algorithm. The code structure is optimised, CUDA-optimised kernels are implemented, and the cuFFT library is integrated. In addition, a mixed-precision approach is tested to evaluate its impact on performance and accuracy. The single core GPU implementation achieves a speed-up of up to 65.5 times in single precision and up to 49.3 times in double precision for large domain sizes. The stability of the system remains unaffected by the mixed-precision approach, and no significant variation is observed between FP32 and FP64 runs.
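The solver structure named above (a Fourier transform of the 3D Poisson equation followed by a 1D solve with the Thomas algorithm) can be illustrated with a minimal NumPy sketch. This is not the PALM-4U or CUDA code from the thesis. It assumes a second-order finite-difference Laplacian, periodic boundaries in x and y, and homogeneous Dirichlet boundaries in z, chosen here only so that every 1D system is non-singular. The function names `thomas`, `solve_poisson`, and `laplacian` are hypothetical.

```python
import numpy as np

def thomas(a, b, c, d):
    """Thomas algorithm along axis 0, vectorised over the horizontal modes.
    a, c: constant sub-/super-diagonal; b: diagonal, shape (ny, nx);
    d: right-hand side, shape (nz, ny, nx)."""
    nz = d.shape[0]
    cp = np.empty_like(d)
    dp = np.empty_like(d)
    cp[0] = c / b
    dp[0] = d[0] / b
    for k in range(1, nz):                 # forward elimination
        denom = b - a * cp[k - 1]
        cp[k] = c / denom
        dp[k] = (d[k] - a * dp[k - 1]) / denom
    x = np.empty_like(d)
    x[-1] = dp[-1]
    for k in range(nz - 2, -1, -1):        # back substitution
        x[k] = dp[k] - cp[k] * x[k + 1]
    return x

def solve_poisson(f, dx, dy, dz):
    """Solve laplacian(p) = f for p; f has shape (nz, ny, nx)."""
    nz, ny, nx = f.shape
    f_hat = np.fft.fft2(f, axes=(1, 2))    # transform the periodic directions
    # Eigenvalues of the periodic second-difference operator
    lam_x = (2 * np.cos(2 * np.pi * np.fft.fftfreq(nx)) - 2) / dx**2
    lam_y = (2 * np.cos(2 * np.pi * np.fft.fftfreq(ny)) - 2) / dy**2
    off = 1.0 / dz**2
    diag = lam_y[:, None] + lam_x[None, :] - 2 * off
    p_hat = thomas(off, diag, off, f_hat)  # one tridiagonal system per mode
    return np.fft.ifft2(p_hat, axes=(1, 2)).real

def laplacian(p, dx, dy, dz):
    """Matching discrete Laplacian, used only to check the solver."""
    lap = (np.roll(p, 1, 2) - 2 * p + np.roll(p, -1, 2)) / dx**2
    lap += (np.roll(p, 1, 1) - 2 * p + np.roll(p, -1, 1)) / dy**2
    pz = np.pad(p, ((1, 1), (0, 0), (0, 0)))  # zero ghost layers in z
    lap += (pz[2:] - 2 * p + pz[:-2]) / dz**2
    return lap
```

Applying `laplacian` to a known field and passing the result to `solve_poisson` should recover that field up to round-off. Each horizontal mode yields an independent tridiagonal system, which is the kind of parallelism a GPU implementation can exploit.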
After 45 × 10³ simulation steps, NMSE (0.02), FB (-0.017), and R (0.96) demonstrate stable and accurate performance that is consistent across precisions. Additionally, the memory requirement is reduced by up to 68% compared to the baseline CPU solver. The optimisations lead to a runtime reduction of the full model by 15%, demonstrating the potential for accessible, scientifically validated microclimate models.
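The three validity metrics can be computed as in the following sketch. The abstract does not state the exact definitions, so this assumes the common air-quality model evaluation forms: NMSE as the mean squared difference divided by the product of the two means, FB as the difference of the means divided by their average (reference minus test), and R as the Pearson correlation coefficient. The function names are hypothetical.

```python
import numpy as np

def nmse(ref, test):
    """Normalised mean square error (assumed definition)."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    return np.mean((ref - test) ** 2) / (np.mean(ref) * np.mean(test))

def fractional_bias(ref, test):
    """Fractional bias, reference minus test (assumed sign convention)."""
    ref, test = np.asarray(ref, float), np.asarray(test, float)
    return 2 * (np.mean(ref) - np.mean(test)) / (np.mean(ref) + np.mean(test))

def pearson_r(ref, test):
    """Pearson correlation coefficient between the flattened fields."""
    return np.corrcoef(np.ravel(ref), np.ravel(test))[0, 1]
```

Under these definitions, identical fields give NMSE = 0, FB = 0, and R = 1, so the reported values can be read as small deviations from a perfect match.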
en
Additional information:
Thesis not yet received by the library; data not verified. Title differs from the original, as translated by the author.