Design of fault tolerant Pwrake workflow system supported by Gfarm file system
| Content Provider | ACM Digital Library |
|---|---|
| Author | Tatebe, Osamu; Tanaka, Masahiro |
| Abstract | We have been developing a lightweight workflow system called Pwrake to execute data-intensive many-task workflows using the high-performance parallel I/O of the Gfarm file system. This paper discusses the design of the fault tolerance mechanism implemented in Pwrake. To avoid aborting a workflow when a worker node fails, Pwrake detects a node failure based on the result of a task retry. To avoid losing files when a worker node fails, we use the automatic file replication of the Gfarm file system. To resume an interrupted workflow correctly, we introduce a Pwrake option to rename or remove the output file of a failed task. In our experiment, we confirmed that the overhead of Gfarm automatic file replication on workflow execution time is less than 10%, and that the workflow continues and returns correct results even after an artificially induced failure in a worker node. |
| Starting Page | 7 |
| Ending Page | 12 |
| Page Count | 6 |
| ISBN | 9781509052127 |
| DOI | 10.1109/MTAGS.2016.7 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2016-11-13 |
| Access Restriction | Subscribed |
| Subject Keyword | Fault tolerance; Many-task computing; Scientific workflow system; Distributed file system |
| Content Type | Text |
| Resource Type | Article |