BIB-VERSION:: CS-TR-v2.0
ID:: STAN//CSL-TR-98-756
ENTRY:: August 13, 1998
ORGANIZATION:: Stanford University, Computer Systems Laboratory
TITLE:: Hardware-assisted Algorithms for Checkpoints
TYPE:: Technical Report
AUTHOR:: Sunada, Dwight
AUTHOR:: Glasco, David
AUTHOR:: Flynn, Michael
DATE:: July 1998
PAGES:: 42
ABSTRACT:: We can classify the algorithms for establishing checkpoints on distributed-shared-memory multiprocessors (DSMMs) into 3 broad classes: tightly synchronized method (TSM), loosely synchronized method (LSM), and unsynchronized method (USM). TSM-type algorithms force the immediate establishment of a checkpoint whenever a dependency between 2 processors arises. LSM-type algorithms instead record this dependency and, hence, do not require the immediate establishment of a checkpoint when a dependency arises; when a processor chooses to establish a checkpoint, it queries the dependency records to determine which other processors must also establish a checkpoint. USM-type algorithms allow a processor to establish a checkpoint without regard to any other processor. Within this framework, we developed 4 hardware-based algorithms: distributed recoverable shared memory (DRSM), DRSM for communication checkpoints (DRSM-C), DRSM with a hybrid method (DRSM-H), and DRSM with logs (DRSM-L). DRSM-C is a TSM-type algorithm, and DRSM and DRSM-H are LSM-type algorithms. DRSM-L is a USM-type algorithm and is the first of its kind for a tightly-coupled DSMM where hardware in the form of a directory maintains cache coherence. We find that DRSM has the best performance in terms of minimizing the impact of establishing checkpoints (or logs) on the running applications, but DRSM, along with DRSM-C, has the most expensive hardware requirements. DRSM-L has the second-best performance but the least expensive hardware requirements. We conclude that DRSM-L is the best algorithm in terms of cost and performance.
NOTES:: [Adminitrivia V1/Prg/19971027]
END:: STAN//CSL-TR-98-756
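
The LSM idea in the abstract (record each dependency, then query the records when a processor chooses to checkpoint) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the report's hardware design: the class `DependencyTracker` and its method names are hypothetical, dependencies are assumed to be symmetric and transitive, and the records of a group are assumed to be cleared once that group has checkpointed.

```python
from collections import defaultdict


class DependencyTracker:
    """Hypothetical software model of LSM-style dependency records."""

    def __init__(self):
        # Assumption: dependencies are symmetric, so each one is stored
        # in both directions (processor id -> set of processor ids).
        self._deps = defaultdict(set)

    def record_dependency(self, a, b):
        """Record that processors a and b have become dependent."""
        self._deps[a].add(b)
        self._deps[b].add(a)

    def checkpoint_group(self, initiator):
        """Return every processor that must checkpoint with the initiator.

        Assumption: dependencies are followed transitively, so the group
        is the connected component that contains the initiator.
        """
        group = {initiator}
        stack = [initiator]
        while stack:
            p = stack.pop()
            for q in self._deps.get(p, ()):
                if q not in group:
                    group.add(q)
                    stack.append(q)
        return group

    def establish_checkpoint(self, initiator):
        """Checkpoint the whole group and clear its dependency records."""
        group = self.checkpoint_group(initiator)
        for p in group:
            self._deps.pop(p, None)
        return group


tracker = DependencyTracker()
tracker.record_dependency(0, 1)  # no checkpoint is forced here (unlike TSM)
tracker.record_dependency(1, 2)
tracker.record_dependency(3, 4)
print(sorted(tracker.establish_checkpoint(0)))  # [0, 1, 2]
```

In this model, a TSM-type scheme would correspond to calling `establish_checkpoint` inside `record_dependency`, and a USM-type scheme would checkpoint only the initiator without consulting any records.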