Checkpointing Shared Memory Programs at the Application-Level (2004)
| Content Provider | CiteSeerX |
|---|---|
| Author | Bronevetsky, Greg; Marques, Daniel; Pingali, Keshav; Szwed, Peter; Schulz, Martin |
| Description | Trends in high-performance computing are making it necessary for long-running applications to tolerate hardware faults. The most commonly used approach is checkpoint and restart (CPR): the state of the computation is saved periodically to disk, and when a failure occurs, the computation is restarted from the last saved state. At present, it is the programmer's responsibility to instrument applications for CPR. |
| File Format | |
| Language | English |
| Publisher Date | 2004-01-01 |
| Publisher Institution | European Workshop on OpenMP |
| Access Restriction | Open |
| Subject Keyword | Hardware Fault; Shared Memory Program; High-performance Computing; Long-running Application; Last Saved State; Used Approach |
| Content Type | Text |
| Resource Type | Article |