An Efficient Fault-Tolerant Routing Methodology for Direct Interconnection Networks
| Content Provider | Semantic Scholar |
|---|---|
| Author | Gómez, María Engracia; Robles, Antonio; Nordbotten, Nils Agne; Marin, J. F. D.; Skeie, Tor; Flich, Jose; López, Pedro; Lysne, Olav |
| Copyright Year | 2004 |
| Abstract | Massively parallel computing systems are now being built with thousands of nodes, and this huge number of nodes significantly increases the probability of failure. It is therefore critical to keep these systems running even in the presence of failures. The interconnection network plays a key role in the performance these systems achieve, since failures in the interconnection network may isolate a large fraction of the machine, including many healthy nodes. In this paper we present a methodology to design fault-tolerant routing algorithms for regular direct interconnection networks. It supports fully adaptive routing, does not degrade performance in the absence of faults, and tolerates a reasonably large number of faults without significantly degrading performance. The methodology is mainly based on selecting an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, at this node, without being ejected, they are adaptively forwarded to their destination. To allow deadlock-free minimal adaptive routing, the methodology requires only one additional virtual channel (for a total of three), even for tori. Evaluation results for a 4×4×4 torus network show that the methodology is 5-fault tolerant. Moreover, for up to 14 link failures, the percentage of fault combinations supported is higher than 91.3%. Additionally, network throughput degrades by less than 10% when three random link faults are injected, without disabling any node. In contrast, a mechanism similar to the one proposed in BlueGene/L, which disables some network planes, would degrade network throughput by 79%. |
| Starting Page | 283 |
| Ending Page | 288 |
| Page Count | 6 |
| File Format | PDF, HTM/HTML |
| Alternate Webpage(s) | http://www.disca.upv.es/jflich/papers/jorpar04a.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
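The abstract's core idea, choosing an intermediate node only for source-destination pairs whose minimal paths are blocked by faults, can be illustrated with a small sketch on a 4×4×4 torus. This is not the authors' selection rule. It assumes a simplified criterion: a leg is considered usable if at least one fault-free minimal path exists, and among usable intermediate nodes the one with the fewest total hops is chosen. The function names (`select_intermediate`, `faulty_distance`, `link`) are hypothetical. The sketch also ignores virtual channels and deadlock freedom entirely.

```python
from collections import deque
from itertools import product

K = 4     # radix per dimension (the abstract evaluates a 4x4x4 torus)
DIMS = 3  # number of dimensions


def neighbors(n):
    """Yield the 2*DIMS torus neighbors of node n (a tuple of coordinates)."""
    for d in range(DIMS):
        for step in (-1, 1):
            m = list(n)
            m[d] = (m[d] + step) % K
            yield tuple(m)


def link(a, b):
    """Hypothetical key for an undirected (bidirectional) link."""
    return frozenset((a, b))


def torus_distance(a, b):
    """Minimal hop count between a and b in a fault-free torus."""
    return sum(min((x - y) % K, (y - x) % K) for x, y in zip(a, b))


def faulty_distance(src, faults):
    """BFS hop distances from src using only healthy links."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        n = queue.popleft()
        for m in neighbors(n):
            if m not in dist and link(n, m) not in faults:
                dist[m] = dist[n] + 1
                queue.append(m)
    return dist


def has_minimal_path(a, b, faults):
    """True if some path avoiding faults is as short as the fault-free minimum."""
    return faulty_distance(a, faults).get(b) == torus_distance(a, b)


def select_intermediate(src, dst, faults):
    """Return None if src->dst has a fault-free minimal path; otherwise return
    the intermediate node (assumed criterion) with the fewest total hops."""
    from_src = faulty_distance(src, faults)
    if from_src.get(dst) == torus_distance(src, dst):
        return None
    from_dst = faulty_distance(dst, faults)  # links are bidirectional
    candidates = [
        n for n in product(range(K), repeat=DIMS)
        if n not in (src, dst)
        and from_src.get(n) == torus_distance(src, n)
        and from_dst.get(n) == torus_distance(n, dst)
    ]
    if not candidates:
        raise ValueError("no single intermediate node avoids the faults")
    # Fewest total hops first; ties broken by coordinates for determinism.
    return min(candidates,
               key=lambda n: (torus_distance(src, n) + torus_distance(n, dst), n))
```

For example, if only the link between `(0, 0, 0)` and `(1, 0, 0)` has failed, the single minimal path between those nodes is blocked, and `select_intermediate((0, 0, 0), (1, 0, 0), {link((0, 0, 0), (1, 0, 0))})` returns `(0, 0, 1)`, giving a 3-hop route through a detour node. Using BFS distances rather than enumerating paths keeps the check simple: a fault-free path is minimal exactly when its length equals the fault-free torus distance.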