Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction

Träff, Jesper Larsson

doi:10.48550/arXiv.2407.18004

DC Field

Value

Language

dc.contributor.author

Träff, Jesper Larsson

dc.date.accessioned

2025-09-25T08:10:23Z

dc.date.available

2025-09-25T08:10:23Z

dc.date.issued

2024-07-26

dc.identifier.citation

<div class="csl-bib-body"> <div class="csl-entry">Träff, J. L. (2024). <i>Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction</i>. arXiv. https://doi.org/10.34726/10820</div> </div>

dc.identifier.uri

http://hdl.handle.net/20.500.12708/219400

dc.identifier.uri

https://doi.org/10.34726/10820

dc.description.abstract

We give optimally fast O(log p) time (per processor) algorithms for computing round-optimal broadcast schedules for message-passing parallel computing systems. This affirmatively answers difficult questions posed in a SPAA 2022 BA and a CLUSTER 2022 paper. We observe that the computed schedules and circulant communication graph can likewise be used for reduction, all-broadcast and all-reduction as well, leading to new, round-optimal algorithms for these problems. These observations affirmatively answer open questions posed in a CLUSTER 2023 paper. The problem is to broadcast n indivisible blocks of data from a given root processor to all other processors in a (subgraph of a) fully connected network of p processors with fully bidirectional, one-ported communication capabilities. In this model, n − 1 + ⌈log₂ p⌉ communication rounds are required. Our new algorithms compute for each processor in the network receive and send schedules each of size ⌈log₂ p⌉ that determine uniquely in O(1) time for each communication round the new block that the processor will receive, and the already received block it has to send. Schedule computations are done independently per processor without communication. The broadcast communication subgraph is an easily computable, directed, ⌈log₂ p⌉-regular circulant graph also used elsewhere. We show how the schedule computations can be done in optimal time and space of O(log p), improving significantly over previous results of O(p log² p) and O(log³ p), respectively. The schedule computation and broadcast algorithms are simple to implement, but correctness and complexity are not obvious. The schedules are used for new implementations of the MPI (Message-Passing Interface) collectives MPI_Bcast, MPI_Allgatherv, MPI_Reduce and MPI_Reduce_scatter. Preliminary experimental results are given. Carefully engineered and extensively evaluated implementations will be presented elsewhere.
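
As a small illustration of the round count quoted in the abstract above, the following Python sketch evaluates n − 1 + ⌈log₂ p⌉ for a few values of n and p. It is not taken from the paper: the function name broadcast_rounds is hypothetical, and the sketch only computes the number of rounds, not the send and receive schedules themselves.

```python
def broadcast_rounds(n: int, p: int) -> int:
    """Round count n - 1 + ceil(log2(p)) quoted in the abstract.

    n is the number of blocks, p the number of processors.
    The function name is hypothetical and not from the paper.
    """
    if n < 1 or p < 2:
        raise ValueError("expects n >= 1 blocks and p >= 2 processors")
    # For p >= 2, (p - 1).bit_length() equals ceil(log2(p)),
    # computed in exact integer arithmetic (no floating point).
    return n - 1 + (p - 1).bit_length()


if __name__ == "__main__":
    for n, p in [(1, 8), (4, 8), (10, 17)]:
        print(f"n={n:2d} p={p:2d} rounds={broadcast_rounds(n, p)}")
```

For a single block (n = 1) the count reduces to ⌈log₂ p⌉, and each additional block adds one round.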

dc.language.iso

dc.rights.uri

https://creativecommons.org/licenses/by/4.0/

dc.subject

MPI

dc.subject

MPI (Message-Passing Interface)

dc.subject

Distributed Computing

dc.subject

Parallel Computing

dc.subject

Cluster Computing

dc.title

Optimal Broadcast Schedules in Logarithmic Time with Applications to Broadcast, All-Broadcast, Reduction and All-Reduction

dc.type

Preprint

dc.rights.license

Creative Commons Attribution 4.0 International

dc.rights.license

Creative Commons Namensnennung 4.0 International

dc.identifier.doi

10.34726/10820

tuw.researchTopic.id

tuw.researchTopic.name

Logic and Computation

tuw.researchTopic.name

Computer Engineering and Software-Intensive Systems

tuw.researchTopic.name

Computer Science Foundations

tuw.researchTopic.value

tuw.publication.orgunit

E191-04 - Forschungsbereich Parallel Computing

tuw.publisher.doi

10.48550/arXiv.2407.18004

dc.identifier.libraryid

AC17647030

tuw.author.orcid

0000-0002-4864-9226

dc.rights.identifier

CC BY 4.0

tuw.publisher.server

arXiv

wb.sciencebranch

Informatik

wb.sciencebranch.oefos

1020

wb.sciencebranch.value

100

item.openaccessfulltext

Open Access

item.openairecristype

http://purl.org/coar/resource_type/c_816b

item.mimetype

application/pdf

item.fulltext

with Fulltext

item.cerifentitytype

Publications

item.grantfulltext

open

item.openairetype

preprint

item.languageiso639-1

crisitem.author.dept

E191-04 - Forschungsbereich Parallel Computing

crisitem.author.orcid

0000-0002-4864-9226

crisitem.author.parentorg

E191 - Institut für Computer Engineering

Appears in Collections:

Preprint

Fulltext (Submitted Version)

Adobe PDF

(471.53 kB)

CC BY 4.0
