Salimi Beni, M., Laso, R., Cosenza, B., Benkner, S., & Hunold, S. (2025). Exploring NCCL Tuning Strategies for Distributed Deep Learning. In 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (pp. 59–62). IEEE. https://doi.org/10.1109/IPDPSW66978.2025.00015
-
dc.identifier.uri
http://hdl.handle.net/20.500.12708/222800
-
dc.description.abstract
The communication overhead in distributed deep learning caused by the synchronization of model parameters across multiple devices can significantly impact training time. Although powerful GPU-GPU communication libraries, such as NCCL, are available, their default configurations have not been effectively adapted to varying hardware and workloads, which can result in lower performance. In this paper, we explore the tuning potential of NCCL and present an approach to tuning its parameters for distributed deep learning workloads. We identify efficient parameter configurations through an optimization process that explores the solution space defined by performance-related NCCL parameters. Experimental results on the Leonardo supercomputer, utilizing up to 64 GPUs, show significant performance improvements across micro-benchmarks and three deep learning models. For ncclAllReduce and ncclAllGather, we improved the bandwidth by factors of 112× and 36× in micro-benchmarks, respectively. The tuned NCCL parameter configurations reduced the training time of the models by up to 12.5%.
en
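To illustrate the kind of parameter tuning the abstract describes, the following minimal sketch sets a few performance-related NCCL environment variables before a PyTorch NCCL process group is created and then runs a single all-reduce. The specific variables and values shown are illustrative assumptions only; they are not the paper's optimization process or its tuned configurations.

import os

# NCCL reads its performance-related parameters when the communicator is
# initialized, so the environment must be set before init_process_group().
# These values are assumptions for illustration, not the paper's tuned settings.
os.environ.setdefault("NCCL_ALGO", "Ring")        # collective algorithm (assumed)
os.environ.setdefault("NCCL_PROTO", "Simple")     # wire protocol (assumed)
os.environ.setdefault("NCCL_MIN_NCHANNELS", "4")  # lower bound on channels (assumed)
os.environ.setdefault("NCCL_DEBUG", "INFO")       # log NCCL's initialization choices

import torch
import torch.distributed as dist

def main() -> None:
    # Expected to be launched via torchrun, which provides LOCAL_RANK/RANK/WORLD_SIZE.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")

    # A single all-reduce over a 1M-element tensor (ncclAllReduce under the hood).
    payload = torch.ones(1 << 20, device="cuda")
    dist.all_reduce(payload)
    torch.cuda.synchronize()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Run, for example, with torchrun --nproc_per_node=4 on a multi-GPU node; with NCCL_DEBUG=INFO, NCCL prints initialization details such as the channels and transports it selected, which helps when comparing candidate configurations.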
dc.description.sponsorship
FWF - Österr. Wissenschaftsfonds
-
dc.language.iso
en
-
dc.relation.ispartofseries
IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)
-
dc.subject
Collective Communications
en
dc.subject
Deep Learning
en
dc.subject
Multi-GPU
en
dc.subject
NCCL
en
dc.subject
Parameter Tuning
en
dc.title
Exploring NCCL Tuning Strategies for Distributed Deep Learning
en
dc.type
Inproceedings
en
dc.type
Konferenzbeitrag
de
dc.contributor.affiliation
University of Salerno, Italy
-
dc.contributor.affiliation
University of Vienna, Austria
-
dc.relation.isbn
979-8-3315-2643-6
-
dc.relation.doi
10.1109/IPDPSW66978.2025
-
dc.relation.issn
2639-3867
-
dc.description.startpage
59
-
dc.description.endpage
62
-
dc.relation.grantno
P 33884-N
-
dc.type.category
Full-Paper Contribution
-
dc.relation.eissn
2995-066X
-
tuw.booktitle
2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
-
tuw.peerreviewed
true
-
tuw.relation.publisher
IEEE
-
tuw.project.title
Offline- und Online-Autotuning von Parallelen Programmen
-
tuw.researchTopic.id
C4
-
tuw.researchTopic.id
C5
-
tuw.researchTopic.id
C3
-
tuw.researchTopic.name
Mathematical and Algorithmic Foundations
-
tuw.researchTopic.name
Computer Science Foundations
-
tuw.researchTopic.name
Computational System Design
-
tuw.researchTopic.value
25
-
tuw.researchTopic.value
25
-
tuw.researchTopic.value
50
-
tuw.publication.orgunit
E191-04 - Forschungsbereich Parallel Computing
-
tuw.publisher.doi
10.1109/IPDPSW66978.2025.00015
-
dc.description.numberOfPages
4
-
tuw.author.orcid
0000-0002-8634-7712
-
tuw.author.orcid
0000-0003-2574-4025
-
tuw.author.orcid
0000-0002-8869-6705
-
tuw.author.orcid
0000-0002-6520-2047
-
tuw.author.orcid
0000-0002-5280-3855
-
tuw.event.name
2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)