Report Number: CSL-TR-76-126
Institution: Stanford University, Computer Systems Laboratory
Title: The optimal placement of dynamic recovery checkpoints in recoverable computer systems
Author: Warren-Angelucci, Wayne
Date: December 1976
Abstract: Reliability is an important concern of any computer system. No matter how carefully designed and constructed, computer systems fail. The rapid and systematic restoration of service after an error or malfunction is always a major design and operational goal. In order to overcome the effects of a failure, recovery must be performed to go from the failed state to an operational state. This thesis describes a recovery method which guarantees that a computer system, its associated data bases and communication transactions will be restored to an operational and consistent state within a given time and cost bound after the occurrence of a system failure. This thesis considers the optimization of a specific software strategy, the rollback and recovery strategy, within the framework of a graph model of program flow which encompasses communication interfaces and data base transactions. Algorithms are developed which optimize the placement of dynamic recovery checkpoints. Presented is a method for statically pre-computing a set of optimal decision parameters for the associated program model, and a run-time technique for dynamically determining the optimal placement of program recovery checkpoints.