Similar Documents
Byzantine Anomaly Testing for Charm++: Providing Fault Tolerance and Survivability for Charm++ Empowered Clusters (2006).
| Content Provider | CiteSeerX |
|---|---|
| Author | Mogilevsky, Dmitry; Koenig, Gregory A.; Yurcik, William |
| Abstract | Recent shifts in high-performance computing have increased the use of clusters built around cheap commodity processors. A typical cluster consists of individual nodes, each containing one or several processors, connected by a high-bandwidth, low-latency interconnect. Clusters offer many benefits for computation, but they also have drawbacks, including a tendency toward a low Mean Time To Failure (MTTF) caused by the sheer number of components involved. A number of fault-tolerance techniques have recently been proposed and developed to mitigate the inherent unreliability of clusters. These techniques, however, do not address the detection of non-obvious faults, particularly Byzantine faults. At present, effectively detecting Byzantine faults is an open problem. We describe the operation of ByzwATCh, a module for the run-time detection of Byzantine hardware errors within the Charm++ parallel programming framework. |
| File Format | |
| Publisher Date | 2006-01-01 |
| Access Restriction | Open |
| Subject Keyword | Byzantine Anomaly Testing; Charm Empowered Cluster; Fault Tolerance; Byzantine Fault; Many Benefit; Open Problem; Individual Node; Typical Cluster; Non-obvious Fault; Several Processor; Cheap Commodity Processor; Charm Parallel; High-performance Computing; Low-latency Interconnect; Inherent Unreliability; Low Mean Time; Sheer Number; Fault-tolerance Technique; Run-time Detecting Byzantine Hardware Error |
| Content Type | Text |
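The abstract attributes the low MTTF of clusters to the sheer number of components. One common way to make that concrete, which is an assumption and not something stated in this record, is to treat node failures as independent and exponentially distributed with identical rates, and to count the cluster as failed when any single node fails. Under that model the node failure rates add, so the cluster MTTF is roughly the node MTTF divided by the node count N. The sketch below illustrates this arithmetic only. The function name `cluster_mttf` and the example figures are hypothetical and are not taken from ByzwATCh or Charm++.

```python
def cluster_mttf(node_mttf_hours: float, num_nodes: int) -> float:
    """Estimate the MTTF of a cluster that fails when any one node fails.

    Hypothetical model: node failures are independent and exponentially
    distributed with identical rates, so the system failure rate is the
    sum of the per-node rates (lambda_sys = N / MTTF_node).
    """
    if node_mttf_hours <= 0 or num_nodes <= 0:
        raise ValueError("node MTTF and node count must be positive")
    node_failure_rate = 1.0 / node_mttf_hours      # failures per hour, per node
    system_failure_rate = num_nodes * node_failure_rate
    return 1.0 / system_failure_rate               # expected hours to first failure


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 nodes, each with a 5-year MTTF.
    hours_per_year = 24 * 365
    mttf = cluster_mttf(5 * hours_per_year, 1000)
    print(f"cluster MTTF ~ {mttf:.1f} hours")      # prints "cluster MTTF ~ 43.8 hours"
```

Even with fairly reliable individual nodes, the estimate drops from years to under two days once a thousand of them must all stay up. Real failure distributions are rarely this clean, so the figure is a rough order-of-magnitude guide.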