HW #4

CS245 PROBLEM SET #4
Due by 5:30pm, Tuesday, August 10, 1999

Problem 1: Undo and Redo Logging (20 points)

In the table below you are given the contents of the log and the state of (a part of) the database after several system crashes. For each case, indicate what type of logging might have been used.

case A B Log

1 6 0 [Start T],[T,A,5],[Commit T]

2 6 5 [Start T],[T,A,6],[T,B,6],[Commit T]

3 6 6 [Start T],[T,A,6],[T,B,6],[Commit T]

4 5 5 [Start T],[T,A,6],[T,B,6]

5 5 5 [Start T],[T,A,6],[T,B,6],[Abort T]

case	A	B	Log
1	6	0	[Start T],[T,A,5],[Commit T]
2	6	5	[Start T],[T,A,6],[T,B,6],[Commit T]
3	6	6	[Start T],[T,A,6],[T,B,6],[Commit T]
4	5	5	[Start T],[T,A,6],[T,B,6]
5	5	5	[Start T],[T,A,6],[T,B,6],[Abort T]

Your answer, for each case, should be one of the following:

Redo
Undo
Either
Neither

Briefly, explain you reasoning.

Problem 2: Undo/Redo Logging (20 points)

Consider the following transaction log from the start of the run of a database system that is capable of running undo/redo logging with checkpointing:

1)  <START T1>
2)  <T1, A, 50, 25>
3)  <T1, B, 250, 25>
4)  <START T2>
5)  <T1, A, 75, 50>
6)  <T2, C, 55, 25>
7)  <COMMIT T1>
8)  <START T3>
9)  <T3, E, 65, 25>
10) <T2, D, 35, 25>
11) <START CKPT (T2,T3)>
12) <T2, C, 45, 55>
13) <COMMIT T2>
14) <START T4>
15) <T4, F, 120, 25>
16) <COMMIT T3>
17) <END CKPT>
18) <T4, F, 150, 120>
19) <COMMIT T4>

Assume the log entrys are in the format <Tid, Variable, New value, Old value> What is the value of the data items A, B, C, D, E, and F on disk after recovery:

1.    if the system crashes just before line 10 is written to disk?
2.    if the system crashes just before line 13 is written to disk?
3.    if the system crashes just before line 14 is written to disk?
4.    if the system crashes just before line 19 is written to disk?
5.    if the system crashes just after line 19 is written to disk?

Problem 3: Logging and Recovery (40 points)

This problem deals with the recovery mechanism of a hypothetical DBMS. In this database system, permanent storage (e.g., disk space) is divided into fixed size blocks. The size of a block is the same as a main memory page. Blocks are identified with a block number. Similarly, pages in memory are identified by a page number. Blocks and pages are subdivided into fixed sized records. Assume that there are R records per block (and per page).

In this system, transactions are given unique identification numbers as they arrive from the users. When transaction Ti (that is, the transaction with identification number = Ti) wishes to update record r in block b in the database, Ti reads in the block b into a main memory buffer p. (Unless, of course, the block was already in main memory). (Ti would also request a lock for b, but you can ignore locks and concurrency control in this assignment.) The modification to record r is then performed in main memory. The modified page is *not* immediately written back onto block b. Instead, the 4-tuple (p, b, r, v) is added to list L( Ti ), where v is the new value for record r. (Page p is not released by Ti .) The changes to the database are written out to permanent storage only when Ti completes. At that time, all the pages indicated in L( Ti ) are written to their corresponding blocks and the main memory pages are released (i.e., they can be used by some other transaction).

Now lets study the procedure for writing out the pages when Ti completes. The following procedure does *not* work:

Procedure Simple-Write ( Ti, L( Ti ) );
  Begin
    For all (p, b, r, v) in  L( Ti ) do Output( p -> b );
  End;

Why isn't this procedure good? Consider what happens if the system crashes in the middle of this procedure. Some pages of Ti will have been written out, while other may not have been. In other words, the database is left in an inconsistent state, and there is no way to recover.

The solution is of course to use a log. This system uses a peculiar type of log that is fixed size. The log (in this system) consists of permanent storage blocks a(1) , a(2) , ... , a(n).

A bit array in main memory, MAP( j ), indicates which of these blocks are in use (i.e., MAP( j ) = 1 if a(j) is in use.) Before writing out the pages of transaction Ti, the system writes a recovery block into the log. This block records all the changes to be done by Ti. Here is the procedure for writing out the pages of transaction Ti: (P( j ) is the j th record in page P; P( j ) <--- [a, b, c, d] means that the four values a, b, c and d are stored in the j th record of P.)

Procedure Recoverable-Write ( Ti , L( Ti ) );
  Begin
    << Construct recovery page in main memory >>
      px <- a free page in main memory
      px( 1 ) <- Ti ;   k <- 0;
      For all (p, b, r, v) in L( Ti ) do
          Begin px(k + 3) <- [p, b, r, v];   k <-  k + 1  End;
      px( 2 ) <- k;
      << Now write recovery page to log. >>
      k <-  position of first 0 bit in MAP.
    << Note:  Assume that MAP always has some 0 bit. >>
      MAP( k ) <- 1
      Output( px -> a(k) )
    << Finally, write out pages of Ti >>
      Do Simple-Write (Ti , L( Ti ));
    << Log record is not needed any longer >>
      px <- all zeros;  Output( px -> a(k) );
      MAP( k ) <- 0;   Release page px ;
  End;

Procedure Recoverable-Write assumes that a single block is large enough to contain all of the recovery information for one transaction. Let us say that this is an acceptable assumption for this problem. Also, it only protects against simple crashes, as discussed in class. These crashes do not affect the permanent storage but anything in main memory is lost. Also, let us assume that a write operation to permanent store (e.g., Output( p -> b ) ) always completes correctly. That is, if the system crashes during a write to disk, either the write is not performed at all or the write is completed successfully.

After this ``brief'' introduction, we can state the problem: Write a procedure which recovers a consistent database from a crash. That is, the DBMS has halted at some undetermined point and the contents of main memory (including MAP) have been lost. (Permanent storage is not affected). Your procedure is to read the log and finish the writes that were in progress at the time of the crash.

It is important that your procedure be able to recover from crashes itself. In other words, if your procedure is interrupted and the contents of main memory lost once again, it should be possible to recover a consistent database by simply restarting your procedure at the beginning.

Write your procedure in the style illustrated above. Be sure to write plenty of comments to let me know how and why it works. Clarity is as important as correctness, if I can't understand your code, I can't verify it's doing the right thing. Write any additional assumptions that you make. (By the way, the command Input( b -> p ) reads block b into page p in main memory.) You may change procedure Recoverable-Write if you absolutely need to.