Due by 5pm, Thursday, February 26, 1998.
4 problems, equal credit. 100 pts.

Problem 1: Pipelining vs. Materialization

In this problem we consider a very specific situation in which we may have an opportunity to pipeline the result of one operation into another. There are three relations R, S, and T, whose sizes are 2000, 2000, and 1000 blocks, respectively (the number of tuples is irrelevant for this problem). We wish to compute (R JOIN S) JOIN T, and we are going to use a hash join in both cases. There are 101 buffers available; i.e., we can keep only 101 blocks in main memory at any time. We estimate that the result of the first join, R JOIN S, occupies B blocks, where B is unknown. We wish to consider two strategies:
  1. No pipelining:
    1. Hash R using all 100 buffers (i.e., 100 buckets), and store the result on disk.
    2. Hash S using the 100 buffers and store the result on disk.
    3. Read corresponding pairs of buckets, join them, and store the result on disk.
    4. Hash R JOIN S using the 100 buffers and store the result on disk; this result will require B blocks.
    5. Hash T using the 100 buffers, and store the result on disk.
    6. Read corresponding pairs of buckets from R JOIN S and from T, and output the result.
  2. With pipelining of the result of R JOIN S:
    1. Hash R using all 100 buffers (i.e., 100 buckets), and store the result on disk.
    2. Hash S using the 100 buffers and store the result on disk.
    3. Read corresponding pairs of buckets, join them, and immediately hash the resulting output tuples into buckets for R JOIN S (you need to figure out how many buckets can be used, assuming all buckets of all hash tables occupy exactly the expected number of blocks). These buckets are eventually stored on disk.
    4. Hash T into the same number of buckets as R JOIN S and store the result on disk.
    5. Read corresponding pairs of buckets from R JOIN S and from T, and output the result.

In each case, the hash joins are performed by repeatedly bringing into main memory two whole buckets, one from each relation. Note that there is a method with similar performance that does not require two buckets to fit in main memory (as in HW #4 prob. 1), but for this problem you can ignore it.

Your task is to compare these two methods as a function of B. First, assuming that each method is feasible, i.e., the average size of two buckets (one from each argument) does not consume more buffers than are available, how many disk I/O's are performed by each method, exclusive of the writing of the final result, whose size we do not know? Second, in order to stay within the limit of 100 buffers, there is an implied limit on how large B can be. What is that limit for each of the methods? What is your conclusion about the values of B that make pipelining preferable?
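As background for the counting, recall the textbook cost of a two-pass hash join: each relation is read and written once while partitioning, then each bucket pair is read once to join, for 3(B(R) + B(S)) I/Os before the result is written. The sketch below is just this standard formula applied to the sizes in the problem statement; it is not the full answer, which must also account for the second join and for B.

```python
def two_pass_hash_join_io(b_r, b_s):
    """I/O cost of a two-pass hash join of relations of b_r and b_s blocks,
    excluding the cost of writing the join result itself."""
    partition = 2 * (b_r + b_s)   # read and write both relations while hashing
    join = b_r + b_s              # read corresponding bucket pairs back in
    return partition + join

# Sizes from the problem statement: R and S are 2000 blocks each.
print(two_pass_hash_join_io(2000, 2000))  # 12000 I/Os for R JOIN S, result not yet written
```

The two strategies differ only in how the B blocks of R JOIN S flow into the second join, so their costs differ by terms involving B.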

Problem 2: Query Optimization

Consider the following SQL query about relations R(A,B), S(B,C,D), and T(E,F).

	select A, sum(D) 
	from R natural join S 
	where exists 
	  (select * from T where D > E) 
	and exists 
	  (select * from T where A < F)
	group by A;
Give the intuitively best query plan (represented as a tree, as in the class notes) for the above query. Briefly explain your reasoning.

Note that in this problem we are not concerned with the specific methods used to implement the algebraic operators (join, selection, etc.).

Hint: You can change the exists conditions to something simpler.
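To see the idea behind the hint on a toy example (this is a general identity, not the answer to this problem): for a nonempty table T, a condition of the form "exists (select * from T where d > T.E)" holds exactly when d exceeds the minimum value of E in T. The table and values below are made up purely for illustration.

```python
# Toy illustration: for nonempty T,
#   EXISTS (SELECT * FROM T WHERE d > T.E)   is equivalent to   d > MIN(T.E)
T_E = [4, 9, 7]          # the E column of a made-up T

def exists_form(d):
    # "some row of T has E smaller than d"
    return any(d > e for e in T_E)

def aggregate_form(d):
    # "d is larger than the smallest E in T"
    return d > min(T_E)

for d in range(-5, 15):
    assert exists_form(d) == aggregate_form(d)
print("equivalent on all tested d")
```

Note the equivalence needs T nonempty: on an empty T the exists form is false while min is undefined.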

Problem 3: Undo and Redo Logging

In the table below you are given the contents of the log and the state of (a part of) the database after several system crashes. For each case, indicate what type of logging might have been used.

  Database state    Log contents
  A=6, B=0          [Start T], [T,A,5], [Commit T]
  A=6, B=6          [Start T], [T,A,6], [T,B,6], [Commit T]
  A=6, B=5          [Start T], [T,A,6], [T,B,6], [Commit T]
  A=5, B=5          [Start T], [T,A,6], [T,B,6]
  A=5, B=5          [Start T], [T,A,6], [T,B,6], [Abort T]

Your answer, for each case, should be one of the following: undo logging, redo logging, either, or neither. Briefly explain your reasoning.
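As a reminder of the textbook semantics (this sketch restates the standard definitions, not the answers): in an undo log, [T, A, v] records the OLD value of A, and recovery rolls back writes of transactions with no Commit record; in a redo log, [T, A, v] records the NEW value, and recovery repeats writes of transactions that did commit. The dict-based "database" and the example logs below are invented for illustration.

```python
# Standard textbook recovery rules, for reference when reasoning about the cases.

def undo_recover(db, log):
    """Undo logging: [T, A, v] holds the OLD value of A.
    Scan backwards, restoring old values of unfinished transactions."""
    finished = {t for op, t, *_ in log if op in ("Commit", "Abort")}
    for op, t, *rest in reversed(log):
        if op == "Write" and t not in finished:
            attr, old = rest
            db[attr] = old
    return db

def redo_recover(db, log):
    """Redo logging: [T, A, v] holds the NEW value of A.
    Scan forwards, repeating writes of committed transactions."""
    committed = {t for op, t, *_ in log if op == "Commit"}
    for op, t, *rest in log:
        if op == "Write" and t in committed:
            attr, new = rest
            db[attr] = new
    return db

# Undo example: [T, A, 1] holds old value 1 and T never committed.
print(undo_recover({"A": 2}, [("Start", "T"), ("Write", "T", "A", 1)]))
# {'A': 1} -- the uncommitted write is rolled back

# Redo example: [T, A, 2] holds new value 2 and T committed.
print(redo_recover({"A": 1}, [("Start", "T"), ("Write", "T", "A", 2), ("Commit", "T")]))
# {'A': 2} -- the committed write is repeated
```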

Problem 4: Logging and Recovery

This problem deals with the recovery mechanism of a hypothetical DBMS. In this database system, permanent storage (e.g., disk space) is divided into fixed-size blocks. The size of a block is the same as a main memory page. Blocks are identified by a block number. Similarly, pages in memory are identified by a page number. Blocks and pages are subdivided into fixed-size records. Assume that there are R records per block (and per page).

In this system, transactions are given unique identification numbers as they arrive from the users. When transaction Ti (that is, the transaction with identification number Ti) wishes to update record r in block b in the database, Ti reads block b into a main memory page p (unless, of course, the block is already in main memory). (Ti would also request a lock for b, but you can ignore locks and concurrency control in this assignment.) The modification to record r is then performed in main memory. The modified page is *not* immediately written back onto block b. Instead, the 4-tuple (p, b, r, v) is added to the list L( Ti ), where v is the new value for record r. (Page p is not released by Ti.) The changes to the database are written out to permanent storage only when Ti completes. At that time, all the pages indicated in L( Ti ) are written to their corresponding blocks and the main memory pages are released (i.e., they can be used by some other transaction).

Now let's study the procedure for writing out the pages when Ti completes. The following procedure does *not* work:

Procedure Simple-Write ( Ti, L( Ti ) );
    For all (p, b, r, v) in  L( Ti ) do Output( p -> b );
Why isn't this procedure good? Consider what happens if the system crashes in the middle of this procedure. Some pages of Ti will have been written out, while others may not have been. In other words, the database is left in an inconsistent state, and there is no way to recover.
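The failure mode can be made concrete with a toy simulation (the dict-based "disk" and the crash point are invented for illustration; real crashes, of course, can happen anywhere):

```python
# Toy model of Simple-Write: write each block of L(Ti) in turn, and
# optionally "crash" partway through, leaving a mix of old and new blocks.

def simple_write(disk, writes, crash_after=None):
    for i, (block, value) in enumerate(writes):
        if crash_after is not None and i >= crash_after:
            return disk                  # simulated crash mid-procedure
        disk[block] = value              # Output( p -> b )
    return disk

disk = {"b1": "old", "b2": "old"}
writes = [("b1", "new"), ("b2", "new")]  # L(Ti), simplified to (block, value)
print(simple_write(dict(disk), writes, crash_after=1))
# {'b1': 'new', 'b2': 'old'} -- Ti is half applied, and no record says so
```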

The solution is of course to use a log. This system uses a peculiar type of log that is fixed size. The log (in this system) consists of permanent storage blocks a(1) , a(2) , ... , a(n).

A bit array in main memory, MAP( j ), indicates which of these blocks are in use (i.e., MAP( j ) = 1 if a(j) is in use.) Before writing out the pages of transaction Ti, the system writes a recovery block into the log. This block records all the changes to be done by Ti. Here is the procedure for writing out the pages of transaction Ti: (P( j ) is the j th record in page P; P( j ) <--- [a, b, c, d] means that the four values a, b, c and d are stored in the j th record of P.)

Procedure Recoverable-Write ( Ti , L( Ti ) );
    << Construct recovery page in main memory >>
      px <- a free page in main memory
      px( 1 ) <- Ti ;   k <- 0;
      For all (p, b, r, v) in L( Ti ) do
          Begin px(k + 3) <- [p, b, r, v];   k <-  k + 1  End;
      px( 2 ) <- k;
      << Now write recovery page to log. >>
      k <-  position of first 0 bit in MAP.
    << Note:  Assume that MAP always has some 0 bit. >>
      MAP( k ) <- 1
      Output( px -> a(k) )
    << Finally, write out pages of Ti >>
      Do Simple-Write (Ti , L( Ti ));
    << Log record is not needed any longer >>
      px <- all zeros;  Output( px -> a(k) );
      MAP( k ) <- 0;   Release page px ;
Procedure Recoverable-Write assumes that a single block is large enough to contain all of the recovery information for one transaction. Let us say that this is an acceptable assumption for this problem. Also, it only protects against simple crashes, as discussed in class. These crashes do not affect the permanent storage but anything in main memory is lost. Also, let us assume that a write operation to permanent store (e.g., Output( p -> b ) ) always completes correctly. That is, if the system crashes during a write to disk, either the write is not performed at all or the write is completed successfully.
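The recovery-page layout that Recoverable-Write constructs can be mirrored in a short sketch (the list-based page and the slot count are invented stand-ins for a real fixed-size page; index 0 is left unused so that indices match the 1-based pseudocode):

```python
# Layout built by Recoverable-Write:
#   px(1) = Ti,  px(2) = number of entries,
#   px(3) .. px(k+2) = the (p, b, r, v) tuples from L(Ti).

def build_recovery_page(ti, l_ti, slots=16):
    px = [None] * (slots + 1)    # px[0] unused; 1-based like the pseudocode
    px[1] = ti
    k = 0
    for (p, b, r, v) in l_ti:
        px[k + 3] = (p, b, r, v)
        k += 1
    px[2] = k                    # entry count, read back by recovery
    return px

page = build_recovery_page(42, [("p7", "b3", 0, "x"), ("p9", "b8", 2, "y")])
print(page[1], page[2], page[3])
# 42 2 ('p7', 'b3', 0, 'x')
```

A recovery procedure can scan the log blocks a(1)..a(n), treat an all-zero block as unused, and use slot 2 to know how many entries to replay; since replaying is idempotent, a crash during recovery is harmless.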

After this ``brief'' introduction, we can state the problem:

Write a procedure which recovers a consistent database from a crash. That is, the DBMS has halted at some undetermined point and the contents of main memory (including MAP) have been lost. (Permanent storage is not affected). Your procedure is to read the log and finish the writes that were in progress at the time of the crash.

It is important that your procedure be able to recover from crashes itself. In other words, if your procedure is interrupted and the contents of main memory lost once again, it should be possible to recover a consistent database by simply restarting your procedure at the beginning.

Write your procedure in the style illustrated above. Be sure to write plenty of comments to let me know how and why it works. Clarity is as important as correctness: if I can't understand your code, I can't verify that it's doing the right thing. Write down any additional assumptions that you make. (By the way, the command Input( b -> p ) reads block b into page p in main memory.) You may change procedure Recoverable-Write if you absolutely need to.

Last modified: Wed Feb 18 15:50:24 PST 1998