NDLI: Software fault tolerance in distributed systems using controlled re-execution

Please wait, while we are loading the content...

Software fault tolerance in distributed systems using controlled re-execution

Content Provider	Semantic Scholar
Author	Tarafdar, Ashis Garg, Vijay K.
Copyright Year	2000
Abstract	Distributed applications are particularly vulnerable to synchronization faults. An important approach to tolerating synchronization faults is rollback recovery, which involves restoring a previous state and re-executing. Existing rollback recovery methods depend on chance and cannot guarantee that synchronization faults do not recur during re-execution. We propose a new rollback recovery method, controlled re-execution, based on selectively adding synchronizations during re-execution to ensure that synchronization faults do not recur. The controlled re-execution method gives rise to three interesting questions: How do we determine the synchronizations that ensure a safe re-execution? How do we monitor an application to detect faulty global conditions? How well does controlled re-execution perform in practice? The first part of the dissertation addresses the predicate control problem which takes a computation and a global property and adds synchronizations to the computation to maintain the property. We design efficient algorithms to solve the problem for many useful predicates, including disjunctive predicates and various types of mutual exclusion predicates. These predicates correspond to commonly encountered synchronization faults such as races. The second part of the dissertation investigates the predicate detection problem which involves determining whether a global property occurs in a computation. We address the problem for the useful class of conjunctive predicates and in the context of an extended model of computation that allows improved predicate detection over the conventional model. We show that, in general, the problem is NP-Complete. However, an efficient solution is demonstrated for the useful cases of receive-ordered and send-ordered computations. Further, this solution can be used to achieve an improved, though exponential, solution for the general problem. The third part of the dissertation involves an experimental study of the controlled re-execution method. We evaluate the controlled re-execution method in tolerating race faults and find that the extra tracing costs imposed are within tolerable limits and that it greatly enhanced the likelihood of recovery. We conclude that controlled re-execution is an effective and desirable method for tolerating races in long-running non-interactive distributed applications.
File Format	PDF HTM / HTML
Alternate Webpage(s)	http://www.cs.utexas.edu/ftp/techreports/tr00-38.pdf
Language	English
Access Restriction	Open
Content Type	Text
Resource Type	Article

Central Library (ISO-9001:2015 Certified)
Indian Institute of Technology Kharagpur
Kharagpur, West Bengal, India | PIN - 721302

See location in the Map
03222 282435
Mail: support@ndl.gov.in

Sl.	Authority	Responsibilities	Communication Details
1	Ministry of Education (GoI), Department of Higher Education	Sanctioning Authority	https://www.education.gov.in/ict-initiatives
2	Indian Institute of Technology Kharagpur	Host Institute of the Project: The host institute of the project is responsible for providing infrastructure support and hosting the project	https://www.iitkgp.ac.in
3	National Digital Library of India Office, Indian Institute of Technology Kharagpur	The administrative and infrastructural headquarters of the project	Dr. B. Sutradhar bsutra@ndl.gov.in
4	Project PI / Joint PI	Principal Investigator and Joint Principal Investigators of the project	Dr. B. Sutradhar bsutra@ndl.gov.in Prof. Saswat Chakrabarti will be added soon
5	Website/Portal (Helpdesk)	Queries regarding NDLI and its services	support@ndl.gov.in
6	Contents and Copyright Issues	Queries related to content curation and copyright issues	content@ndl.gov.in
7	National Digital Library of India Club (NDLI Club)	Queries related to NDLI Club formation, support, user awareness program, seminar/symposium, collaboration, social media, promotion, and outreach	clubsupport@ndl.gov.in
8	Digital Preservation Centre (DPC)	Assistance with digitizing and archiving copyright-free printed books	dpc@ndl.gov.in
9	IDR Setup or Support	Queries related to establishment and support of Institutional Digital Repository (IDR) and IDR workshops	idr@ndl.gov.in