CS245  Winter 1999

Assignment 6
due in class on Thursday February 25


This problem deals with the recovery mechanism of a hypothetical DBMS.  In this database
system, permanent storage (e.g., disk space) is divided into fixed size blocks.  The
size of a block is the same as a main memory page.  Blocks are identified with a block
number.  Similarly, pages in memory are identified by a page numbe p.  Blocks and pages
are subdivided into fixed sized records.  Assume that there are R records per block (and
per page). 

In this system, transactions are given unique identification numbers as they arrive from
the users.  When transaction Ti (that is, the transaction with identification number =
Ti) wishes to update record r in block b in the database, Ti reads in the block b into a
main memory buffer p.(Unless, of course, the block was already in main memory). 

(Ti would also request a lock for b, but you can ignore locks and concurrency control in
this assignment.) 

The modification to record r is then performed in main memory.  The modified page is
*not* immediately written back onto block b.  Instead, the 4-tuple (p, b, r, v) is added
to list L( Ti ), where v is the new value for record r.  (Page p is not released by Ti
.)  The changes to the database are written out to permanent storage only when Ti
completes.  At that time, all the pages indicated in L( Ti ) are written to their
corresponding blocks and the main memory pages are released (i.e., they can be used by
some oth er transaction). 

Now lets study the procedure for writing out the pages when Ti completes.  The following
procedure does *not* work: 

Procedure Simple-Write ( Ti, L( Ti ) ); 
    For all (p, b, r, v) in  L( Ti ) do Output( p -> b );

Why isn't this procedure good?  Consider what happens if the system crashes in the
middle of this procedure.  Some pages of Ti will have been written out, while other may
not have been.  In other words, the database is left in an inconsistent state, and there
is no way to recover. 

The solution is of course to use a log.  This system uses a peculiar type of log that is
fixed size.  The log (in this system) consists of permanent storage blocks a(1) , a(2) ,
... , a(n).  A bit array in main memory, MAP( j ), indicates which of these blocks are
in use (i.e., MAP( j ) = 1 if a(j) is in use.)  Before writing out the pages of
transaction Ti, the system writes a recovery block into the log.  This block records all
the changes to be done by Ti.  Here is the procedure for writing out the pages of
transaction Ti:  (P( j ) is the j th record in page j; P( j ) < - [a, b, c, d] means
that the four values a, b, c and d are stored in the j th record of P.) 

Procedure Recoverable-Write ( Ti , L( Ti ) );
    << Construct recovery page in main memory >>
      px <- a free page in main memory
      px( 1 ) <- Ti ;   k <- 0;
      For all (p, b, r, v) in L( Ti ) do
          Begin px(k + 3) <- [p, b, r, v];   k <-  k + 1  End;
      px( 2 ) <- k;
    << Now write recovery page to log. >>
      k <-  position of first 0 bit in MAP.
    << Note:  Assume that MAP always has some 0 bit. >>
      MAP( k ) <- 1
      Output( px -> a(k) )
    << Finally, write out pages of Ti >>
      Do Simple-Write (Ti , L( Ti ));
    << Log record is not needed any longer >>
      px <- all zeros;  Output( px -> a(k) );
      MAP( k ) <- 0;   Release page px ;

Procedure Recoverable-Write assumes that a single block is large enough to contain all
of the recovery information for one transaction.  Let us say that this is an acceptable
assumption for this problem.  Also, it only protects against simple crashes, as
discussed in class.  These crashes do not affect the permanent storage but anything in
main memory is lost.  Also, let us assume that a write operation to permanent store
(e.g., Output( p -> b ) ) always completes correctly.  That is, if the system crashes
during a write to disk, either the write is not performed at all or the write is
completed successfully. 

After this ``brief'' introduction, we can state the problem:  Write a procedure which
recovers a consistent database from a crash.  That is, the DBMS has halted at some
undetermined point and the contents of main memory (including MAP) have been lost. 
(Permanent storage is not affected).  Your procedure is to read the log and finish the
writes that were in progress at the time of the crash. 

It is important that your procedure be able to recover from crashes itself.  In other
words, if your procedure is interrupted and the contents of main memory lost once again,
it should be possible to recover a consistent database by simply restarting your
procedure at the beginning. 

Write your procedure in the style illustrated above.  Be sure to write plenty of
comments to let me know how and why it works.  Write any additional assumptions that you
make.  (By the way, the command Input( b -> p ) reads block b into page p in main
memory.)  You may change procedure Recoverable-Write if you absolutely need to. 

(Extra Credit:  What can be done if the recovery information for some
transactions is larger than a block?)


Consider the following transaction log from the start of the run of a database
system that is capable of running undo/redo logging with checkpointing:

1)  <START T1>
2)  <T1, A, 50, 25>
3)  <T1, B, 250, 25>
4)  <START T2>
5)  <T1, A, 75, 50>
6)  <T2, C, 35, 25>
7)  <COMMIT T1>
8)  <START T3>
9)  <T3, E, 55, 25>
10) <T2, D, 45, 25>
11) <START CKPT (T2,T3)>
12) <T2, C, 65, 35>
13) <COMMIT T2>
14) <START T4>
15) <T4, F, 100, 25>
16) <COMMIT T3>
17) <END CKPT>
18) <T4, F, 150, 100>
19) <COMMIT T4>

Assume the log entrys are in the format <Tid, Variable, New value, Old value>
What is the value of the data items A, B, C, D, E, and F on disk after recovery:

(i) if the system crashes just before line 10 is written to disk?
(ii) if the system crashes just before line 13 is written to disk?
(iii) if the system crashes just before line 14 is written to disk?
(iv) if the system crashes just before line 19 is written to disk?
(v) if the system crashes just after line 19 is written to disk?