Exploring Scalability in C++ Parallel STL Implementations

Laso Rodriguez, Ruben; Krupitza, Diego; Hunold, Sascha

doi:10.1145/3673038.3673065

Record link:

http://hdl.handle.net/20.500.12708/204481

Title:

Exploring Scalability in C++ Parallel STL Implementations

Citation:

Laso Rodriguez, R., Krupitza, D., & Hunold, S. (2024). Exploring Scalability in C++ Parallel STL Implementations. In ICPP ’24: Proceedings of the 53rd International Conference on Parallel Processing (pp. 284–293). ACM. https://doi.org/10.1145/3673038.3673065

CatalogPlus:

AC17364268

Publisher DOI:

10.1145/3673038.3673065

Publication Type:

Inproceedings - Full-Paper Contribution

Language:

English

Authors:

Laso Rodriguez, Ruben
Krupitza, Diego
Hunold, Sascha

Organisational Unit:

E191-04 - Forschungsbereich Parallel Computing

Published in:

ICPP '24: Proceedings of the 53rd International Conference on Parallel Processing

ISBN:

9798400717932

Date (published):

2024

Event name:

53rd International Conference on Parallel Processing (ICPP 2024)

Event date:

12-Aug-2024 - 15-Aug-2024

Event place:

Gotland, Sweden

Number of Pages:

10

Publisher:

ACM, New York, NY, United States

Peer reviewed:

Yes

Keywords:

C++; CUDA; OpenMP; Performance Portability; Standard Template Library; Threading Building Blocks

Abstract:

Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of the parallel algorithms, a systematic, quantitative performance comparison is essential for choosing the appropriate implementation for a particular hardware configuration. In this work, we introduce a specialized set of micro-benchmarks to assess the scalability of the parallel algorithms in the STL. By selecting different backends, our micro-benchmarks can be used on multi-core systems and GPUs. Using the suite, in a case study on AMD and Intel CPUs and NVIDIA GPUs, we were able to identify substantial performance disparities among different implementations, including GCC+TBB, GCC+HPX, Intel's compiler with TBB, and NVIDIA's compiler with OpenMP and CUDA.

Project title:

Offline- und Online-Autotuning von Parallelen Programmen: P 33884-N (FWF - Österr. Wissenschaftsfonds)

Research Areas:

Computer Engineering and Software-Intensive Systems: 90%
Computer Science Foundations: 10%

Science Branch:

1020 - Computer Sciences: 100%

License:

CC BY-NC-SA 4.0