Vardas, I., Hunold, S., Ajanohoun, J. I., & Träff, J. L. (2022). mpisee: MPI Profiling for Communication and Communicator Structure. In 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022) (pp. 520–529). IEEE. https://doi.org/10.1109/IPDPSW55747.2022.00092
27th Workshop on High-level Parallel Programming Models and Supportive Environments (HIPS 2022) in conjunction with IEEE IPDPS 2022
Event date:
30-May-2022 - 3-Jun-2022
-
Event place:
Lyon, France
-
Number of Pages:
10
-
Publisher:
IEEE
-
Peer reviewed:
Yes
-
Keywords:
MPI Profiling
Abstract:
Cumulative performance profiling is a fast and lightweight method for gaining summary information about where and how communication time in parallel MPI applications is spent. MPI provides mechanisms for implementing such profilers that can be transparently used with applications. Existing profilers typically profile on a process basis and record the frequency, total time, and volume of MPI operations per process. This can lead to grossly misleading cumulative information for applications that make use of MPI features for partitioning the processes into different communicators. We present a novel MPI profiler, mpisee, for communicator-centric profiling that separates and records collective and point-to-point communication information per communicator in the application. We discuss the implementation of mpisee, which makes significant use of the MPI attribute mechanism. We evaluate our tool by measuring its overhead and profiling a number of standard applications. Our measurements with thirteen MPI applications show that the overhead of mpisee is less than 3%. Moreover, using mpisee, we investigate in detail two particular MPI applications, SPLATT and GROMACS, to obtain information on the various MPI operations for the different communicators of these applications. Such information is not available from other state-of-the-art profilers. We use the communicator-centric information to improve the performance of SPLATT, resulting in a significant runtime decrease when run with 1024 processes.
Project title:
Algorithm Engineering für Prozess Mapping: P31763-N31 (Fonds zur Förderung der wissenschaftlichen Forschung (FWF))
Offline- und Online-Autotuning von Parallelen Programmen: P33884-N (Fonds zur Förderung der wissenschaftlichen Forschung (FWF))
-
Research Areas:
Computer Engineering and Software-Intensive Systems: 90%
Computer Science Foundations: 10%