IEEE Transactions on Parallel and Distributed Systems
Hardware and Architecture; Computational Theory and Mathematics; Signal Processing
We study the complexity of finding communication trees with the lowest possible completion time for rooted, irregular gather and scatter collective communication operations in fully connected, $k$-ported communication networks under a linear-time transmission cost model. Consecutively numbered processors specify data blocks of possibly different sizes to be collected at (gather) or distributed from (scatter) a given root processor, where they are stored in processor order. We distinguish between ordered and non-ordered communication trees depending on whether segments of blocks are maintained in processor order. We show that ordered communication trees with the lowest completion time under one-ported communication can be found in polynomial time by giving simple but costly dynamic programming algorithms. In contrast, we show that constructing completion-time optimal, non-ordered communication trees is NP-hard. We have implemented the dynamic programming algorithms for homogeneous networks to evaluate the quality of different types of communication trees, in particular to analyze a recent, distributed, problem-adaptive tree construction algorithm. Model experiments show that this algorithm is close to optimal for a selection of block size and root processor distributions. A concrete implementation for specially structured problems shows that optimal, non-binomial trees may offer further practical advantage.
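To make the kind of dynamic program mentioned above concrete, here is a minimal sketch for the ordered, one-ported gather case. It is an illustration under stated assumptions, not the algorithm from the paper: it assumes that every subtree gathers a consecutive range of processors, that a message carrying `m` units of data costs `alpha + beta * m` (the parameter names `alpha` and `beta` and the function name are hypothetical), and that a transfer starts only once both the sender and the receiver have finished their earlier communication.

```python
from functools import lru_cache
from itertools import accumulate


def optimal_ordered_gather_time(sizes, root, alpha=1.0, beta=1.0):
    """Lowest gather completion time over ordered trees (illustrative model).

    sizes[i] is the block size of processor i; root is the gathering processor.
    Assumed cost of sending m data units: alpha + beta * m (linear cost model).
    """
    p = len(sizes)
    prefix = [0, *accumulate(sizes)]  # prefix[i] = total size of blocks 0..i-1

    def segment_size(i, j):
        # Total size of the blocks of processors i..j (inclusive).
        return prefix[j + 1] - prefix[i]

    @lru_cache(maxsize=None)
    def gather_time(i, j, r):
        # Time until processor r holds all blocks of the range i..j.
        if i == j:
            return 0.0
        best = float("inf")
        # The last message r receives carries a whole adjacent range:
        # split i..j at k into the part containing r and the other part.
        for k in range(i, j):
            if r <= k:
                own = gather_time(i, k, r)
                other = best_any_root(k + 1, j)
                data = segment_size(k + 1, j)
            else:
                own = gather_time(k + 1, j, r)
                other = best_any_root(i, k)
                data = segment_size(i, k)
            # One-ported: the transfer starts when both sides are ready.
            best = min(best, max(own, other) + alpha + beta * data)
        return best

    @lru_cache(maxsize=None)
    def best_any_root(i, j):
        # Best time for the range i..j when its local root may be chosen freely.
        return min(gather_time(i, j, r) for r in range(i, j + 1))

    return gather_time(0, p - 1, root)


if __name__ == "__main__":
    print(optimal_ordered_gather_time([1, 1, 1, 1], root=0))
```

The sketch has on the order of $p^3$ states with $O(p)$ work each, which illustrates why such algorithms are polynomial but costly; recovering the tree itself would additionally require recording the minimizing split and local root for each state.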
Computer Engineering and Software-Intensive Systems: 90%; Computer Science Foundations: 10%