<div class="csl-bib-body">
<div class="csl-entry">Hunold, S., & Steiner, S. (2023). OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning. In <i>Proceedings of PMBS 2022: performance modeling, benchmarking and simulation of high performance computer systems</i> (pp. 123–128). IEEE. https://doi.org/10.1109/PMBS56514.2022.00016</div>
</div>
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/188027
-
dc.description.abstract
Collective communication operations, such as Broadcast or Reduce, are fundamental cornerstones in many high-performance applications. Most collective operations can be implemented using different algorithms, each of which has advantages and disadvantages. For that reason, MPI libraries typically implement a selection logic that attempts to make good algorithmic choices for specific problem instances. It has been shown in the literature that the hard-coded algorithm selection logic found in MPI libraries can be improved by tuning the collectives in a separate, offline micro-benchmarking run. In the present paper, we take a fundamentally different approach to improving the algorithm selection for MPI collectives. We integrate the probing of different algorithms directly into the MPI library. Whenever an MPI application is started with a given process configuration, i.e., the number of nodes and the processes per node, the tuner, instead of the default selection logic, selects the next algorithm to complete an issued MPI collective call. The tuner records the runtime of this MPI call for a subset of processes. With the recorded performance data, the tuner is able to build a performance model that allows selecting an efficient algorithm for a given collective problem. Subsequently recorded performance results are then used to update the performance model, where the probabilities for selecting an algorithm are adapted by the tuner, such that slow algorithms get a smaller chance of being selected. We show in a case study, using the ECP proxy application miniAMR, that our approach can effectively tune the performance of Allreduce.
en
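The selection scheme described in the abstract (record runtimes per algorithm, update the model incrementally, and shrink the selection probability of slow algorithms) can be sketched as follows. This is a minimal illustrative sketch, not the actual OMPICollTune implementation: the class name, the inverse-runtime weighting, and the incremental-mean update are assumptions chosen to mirror the abstract's description.

```python
import random


class OnlineCollectiveTuner:
    """Hypothetical sketch of probability-weighted algorithm selection:
    each candidate algorithm keeps a running mean runtime, and selection
    probabilities shrink for slower algorithms (as the abstract describes)."""

    def __init__(self, algorithms):
        self.algorithms = list(algorithms)
        self.mean_runtime = {a: 0.0 for a in self.algorithms}
        self.count = {a: 0 for a in self.algorithms}

    def select(self):
        # Probe every algorithm at least once before weighting.
        untried = [a for a in self.algorithms if self.count[a] == 0]
        if untried:
            return untried[0]
        # Weight inversely to estimated runtime, so slow algorithms
        # get a smaller chance of being selected on the next call.
        weights = [1.0 / self.mean_runtime[a] for a in self.algorithms]
        return random.choices(self.algorithms, weights=weights, k=1)[0]

    def record(self, algorithm, runtime):
        # Incremental (online) mean update with each new measurement,
        # so no offline micro-benchmarking run is needed.
        n = self.count[algorithm] + 1
        old = self.mean_runtime[algorithm]
        self.mean_runtime[algorithm] = old + (runtime - old) / n
        self.count[algorithm] = n
```

In use, the tuner would first probe each candidate once, then keep refining its estimates from the runtimes it records for subsequent collective calls.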
dc.description.sponsorship
FWF Fonds zur Förderung der wissenschaftlichen Forschung (FWF)
-
dc.language.iso
en
-
dc.subject
algorithm selection
en
dc.subject
autotuning
en
dc.subject
collective communication
en
dc.subject
HPC
en
dc.subject
machine learning
en
dc.subject
MPI
en
dc.title
OMPICollTune: Autotuning MPI Collectives by Incremental Online Learning
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.relation.isbn
978-1-6654-5185-7
-
dc.relation.doi
10.1109/PMBS56514.2022
-
dc.description.startpage
123
-
dc.description.endpage
128
-
dc.relation.grantno
P33884-N
-
dcterms.dateSubmitted
2023-01-30
-
dc.type.category
Full-Paper Contribution
-
tuw.booktitle
Proceedings of PMBS 2022: performance modeling, benchmarking and simulation of high performance computer systems
-
tuw.peerreviewed
true
-
tuw.relation.publisher
IEEE
-
tuw.project.title
Offline- und Online-Autotuning von Parallelen Programmen
-
tuw.researchTopic.id
I2
-
tuw.researchTopic.id
C5
-
tuw.researchTopic.name
Computer Engineering and Software-Intensive Systems
-
tuw.researchTopic.name
Computer Science Foundations
-
tuw.researchTopic.value
90
-
tuw.researchTopic.value
10
-
tuw.publication.orgunit
E191-04 - Forschungsbereich Parallel Computing
-
tuw.publication.orgunit
E191 - Institut für Computer Engineering
-
tuw.publisher.doi
10.1109/PMBS56514.2022.00016
-
dc.description.numberOfPages
6
-
tuw.event.name
13th IEEE International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS 2022)