Pölzl, P. (2026). Study of Performance and Portability of a Scientific Code on Long Vector Architectures [Diploma Thesis, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2026.130548
High Performance Computing; RISCV; Scientific Computing
en
Abstract:
Long-vector architectures are currently experiencing a resurgence in high-performance computing (HPC), as they promise massive parallelism and portable code when combined with auto-vectorization. The European processor accelerators (EPAC) prototype, developed by the European Processor Initiative (EPI), combines this principle with the open-source RISC-V "V" (RVV) extension. While the hardware is actively developed, most HPC software targets heterogeneous host-device platforms and is poorly prepared for long-vector hardware. Current literature specifically lacks full-scale optimization studies of applications dominated by spherical harmonic transforms or similar spectral methods. Therefore, the vectorization challenges and cross-platform performance portability of these workloads remain unclear. To address this gap, this thesis formalizes a reusable workflow based on the software development vehicle (SDV) methodology to systematically optimize the spherical harmonic transforms inside XSHELLS for the EPAC prototype, explicitly comparing the trade-offs between compiler auto-vectorization and architecture-specific vector intrinsics. Performance evaluations reveal that refactoring code to assist auto-vectorization yields overall speedups of up to 1.91×. Explicit vectorization overcomes severe compiler limitations in nested loops and delivers overall gains of up to 2.49×. Although these code adaptations translate effectively to the NEC SX-Aurora (yielding gains up to 20.59× over the scalar baseline), they produce significant loop overhead, degrading performance to 0.63× and 0.87× of the auto-vectorized baseline on the Intel Sapphire Rapids and NVIDIA Grace CPUs, respectively. Ultimately, this research produces an SDV-based optimization blueprint for the EPAC platform and identifies three generalizable code patterns critical for vectorization efficiency. The results demonstrate that maximizing hardware utilization on current long-vector architectures requires manual intrinsic vectorization, as compiler support for nested multi-dimensional loops remains a critical bottleneck.
en
Additional information:
Thesis not yet received by the library - data not verified. Variant title as translated by the author.