@article{ad82892a8a8a4e4a9f042d3418370e7c,
title = "Optimal Broadcast in All-Port Wormhole-Routed Hypercubes",
abstract = "We give an optimal algorithm that broadcasts on an n- dimensional hypercube in Θ(n/ log2 (n +1)) routing steps with wormhole, e-cube routing and all-port communication. Previously, the best algorithm of McKinley and Trefftz requires [n/2] routing steps. We also give routing algorithms that achieve tight time bounds for n ≤ 7.",
author = "Ho, {Ching Tien} and Kao, {Ming Yang}",
note = "Funding Information: I. INTRODUCTION Broadcasting is a fundamental communication primitive used in almost all parallel algorithms. As made clear by several excellent surveys [4], [7], efficient broadcast algorithms are necessary for achieving the high performance required in numerous numerical algebra algorithms [5], [ 121, including matrix-vector multiplication, matrix-matrix multiplication [13], Gaussian elimination [I], [ 161, LU-factorization [6], and Householder transformations [3]. This paper focuses on the problem of broadcasting in hypercubes. Current technology has made it both technically and economically feasible to build hypercubes with thousands of nodes. In fact, several multiprocessors with hypercube or hypercube-like topology are now commercially available. Three leading examples are the CM-2 by Thinking Machines, the iPSCI860 by Intel, and the ,CUBE 2 by XCUBE. This topology is widely accepted in the parallel computing community, because it has a logarithmic diameter and regular structure and offers high communication bandwidths. Moreover, in many applications the hypercube topology allows program structures to be mapped into it with adjacency preserved. While store-and-forward routing was used in the first generation hypercubes, most newer generation multiprocessors configured as hypercubes use wormhole-like routing. For example, the Intel iPSCB60 hypercube uses a circuit-switched routing. Both the Delta system by Intel [ 1 I] and the MIT J-machine by Dally [2] use wormhole routing. Another orthogonal issue of routing is how to select routing paths. Most hypercube machines use e-cube routing [21], i.e., route through the dimensions of the correct bits in an ascending order. E- cube routing has attained this popularity because it is a simple and effective way of guaranteeing freedom from deadlock without using complicated virtual channels. Efficient broadcast algorithms based on n edge-disjoint spanning trees on an n-cube have been given [8], [IO], [14]. McKinley and Trefftz [I51 gave a broadcast algorithm of [n/21 routing steps in an n-cube with wormhole, e-cube routing and all-port communication. Recently, multicast algorithms on hypercubes under the same model were given in [20]. Efficient broadcast algorithms on two-and higher-dimensional tori were given in [18], [I91 under the similar model, namely all-port and wormhole routing, except that their routings were not restricted to the dimension-ordered routing. In Section 111, we give an algorithm that can broadcast a message in O(n/log,(n+ 1))r outing steps in the same model as considered by McKinley and Trefftz [ 151. We show that our time bound is optimal to Manuscript received July 22, 1993; revised January 11, 1994. This work was supported in part by NSF Grant CCR-9101385C.-T. C.-T. Ho is with IBM Research Division, Almaden Research Center, San Jose, CA 95120 USA (e-mail: ho@almaden.ibm.com). M.-Y. Kao is with the Computer Science Department, Duke University, Durham, NC 27706 USA (e-mail: kao@cs.duke.edu). IEEE Log Number 9406342.",
year = "1995",
month = feb,
doi = "10.1109/71.342134",
language = "English (US)",
volume = "6",
pages = "200--204",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "2",
}