Träff, J. L. (2025). Communication Round and Computation Efficient Exclusive Prefix-Sums Algorithms (for MPI_Exscan). arXiv. https://doi.org/10.34726/10821
Parallel scan primitives compute element-wise inclusive or exclusive prefix sums of input vectors contributed by p consecutively ranked processors under an associative, binary operator ⊕. In message-passing systems with bounded, one-ported communication capabilities, at least ⌈log2 p⌉ or ⌈log2(p − 1)⌉ communication rounds are required to perform the scans. While there are well-known, simple algorithms for the inclusive scan that solve the problem in ⌈log2 p⌉ communication rounds with ⌈log2 p⌉ applications of ⊕ (which could be expensive), the exclusive scan appears more difficult. Conventionally, the problem is solved with either ⌈log2(p − 1)⌉ + 1 communication rounds (e.g., by shifting the input vectors), or in ⌈log2 p⌉ communication rounds with 2⌈log2 p⌉ − 1 applications of ⊕ (by a modified inclusive scan algorithm). We give a new, simple algorithm that computes the exclusive prefix sums in q = ⌈log2(p − 1) + log2 4 3 ⌉ simultaneous send-receive communication rounds with q − 1 applications of ⊕. We compare the three algorithms implemented in MPI against the MPI library native MPI Exscan primitive on a small, 36-node cluster with a state-of-the-art MPI library, indicating possible and worthwhile improvements to standard implementations. The algorithms assume input vectors to be small so that performance is dominated by the num- ber of communication rounds. For large input vectors, other (pipelined, fixed-degree tree) algorithms must be used.
en
Research Areas:
Logic and Computation: 20% Computer Engineering and Software-Intensive Systems: 50% Computer Science Foundations: 30%