Improving colocated MPI application performance via process mapping in HPC systems : leveraging hierarchical process-to-core mappings and communicator-centric profiling

Vardas, Ioannis

doi:10.34726/hss.2025.136022

Record link:

https://doi.org/10.34726/hss.2025.136022
http://hdl.handle.net/20.500.12708/221427

Title:

Improving colocated MPI application performance via process mapping in HPC systems : leveraging hierarchical process-to-core mappings and communicator-centric profiling

Citation:

Vardas, I. (2025). Improving colocated MPI application performance via process mapping in HPC systems : leveraging hierarchical process-to-core mappings and communicator-centric profiling [Dissertation, Technische Universität Wien]. reposiTUm. https://doi.org/10.34726/hss.2025.136022

reposiTUm DOI:

10.34726/hss.2025.136022

CatalogPlus:

AC17714376

Publication Type:

Thesis - Dissertation

Language:

English

Authors:

Vardas, Ioannis

Advisor:

Träff, Jesper Larsson

Organisational Unit:

E191 - Institut für Computer Engineering

Date (published):

2025

Number of Pages:

185

Keywords:

HPC; Parallel Computing; MPI; Profiling; Performance Analysis; Process Mapping; Colocation; Co-Scheduling

Abstract:

With the rapid growth of data-intensive applications in scientific simulations and artificial intelligence, the demand for high-performance computing (HPC) has increased considerably. Modern HPC systems have evolved into complex architectures, characterized by deep memory hierarchies and numerous computing cores with non-uniform memory access times. While these architectures offer extreme computing power, they make optimizing parallel applications significantly harder. Contention for shared resources such as caches or main memory can degrade the performance of parallel applications. In addition, HPC scheduler allocation policies, which are designed to minimize the number of nodes used, can inadvertently increase competition for shared resources among homogeneous processes, hurting overall performance. A critical challenge in optimizing parallel applications is therefore assigning processes to computing cores so as to avoid resource conflicts and maximize performance. This dissertation addresses this challenge through two approaches.

The first approach investigates the communication structure of MPI applications to identify performance bottlenecks and optimize process mapping. The Message Passing Interface (MPI) remains the de facto standard for programming parallel applications in HPC. MPI communicators group processes for communication, and understanding the resulting communication patterns is essential for performance optimization. We developed the profiling tool mpisee, which, unlike existing tools, reports detailed communication information per communicator, revealing inefficiencies in MPI collectives and guiding better algorithm selection.

The second approach examines how multiple parallel applications can efficiently share a common pool of compute nodes on HPC systems. We developed mapping strategies, paired with colocation, that place processes from different applications onto the cores of each node in the shared allocation so that shared resources are used efficiently. Our results show that these strategies often improve runtime compared to isolated execution by making the colocated execution of multiple parallel applications more efficient.

License:

In Copyright

Appears in Collections:

Thesis