CS245A PROBLEM SET #5
Due by 5pm, Thursday, February 26, 1998.
4 problems, equal credit. 100 pts.
In each case, the hash joins are performed by repeatedly bringing into memory two whole buckets, one for each of the two relations. Note that there is a method with similar performance that doesn't require two buckets to fit in main memory (as in HW #4 prob. 1), but for this problem you can ignore it.
Your task is to compare these two methods as a function of B. First, assuming that a method is feasible (i.e., the average size of two buckets, one from each argument, does not consume more buffers than are available), how many disk I/O's does each method perform, exclusive of writing the final result, whose size we do not know? Second, in order to stay within the limit of 100 buffers, there is an implied limit on how large B can be. What is that limit for each of the methods? What do you conclude about the values of B for which pipelining is preferable?
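For concreteness, the feasibility condition and the I/O accounting for a standard two-pass hash join of this kind can be sketched as follows. The relation sizes and bucket count below are made-up placeholders, not the assignment's actual figures:

```python
def two_pass_hash_join_io(b_r, b_s, num_buckets, memory_buffers):
    """Disk I/O's for a hash join that re-reads whole bucket pairs,
    excluding the write of the final result.

    Partition phase: read each relation and write its buckets back
    to disk -> 2 * (b_r + b_s) I/O's.
    Join phase: read one bucket of each relation back into memory,
    pair by pair -> b_r + b_s I/O's.
    """
    # Average bucket sizes in blocks, assuming a uniform hash.
    avg_r = b_r / num_buckets
    avg_s = b_s / num_buckets
    # Feasible only if one bucket from each argument fits in memory.
    if avg_r + avg_s > memory_buffers:
        return None  # too few buckets: a bucket pair overflows memory
    return 3 * (b_r + b_s)

# Placeholder sizes: two 1000-block relations, B = 50 buckets, 100 buffers.
print(two_pass_hash_join_io(1000, 1000, 50, 100))  # 6000
```

Note how shrinking B makes each bucket larger, which is exactly what produces the implied limit on B asked about above.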
select A, sum(D)
from R natural join S
     (select * from T where D > E)
     (select * from T where A < F)
group by A;
Give the intuitively best query plan (represented as a tree, as in the class notes) for the above query. Briefly explain your reasoning.
Note that in this problem we are not concerned with the specific methods used to implement the algebraic operators (join, selection, etc.).
Hint: You can change the EXISTS conditions to something simpler.
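As a generic illustration of the hint (not this problem's query), an EXISTS condition that correlates on equality behaves like a semijoin, which is usually simpler for a planner to handle. A small sketch with made-up rows and a hypothetical attribute a:

```python
# Made-up rows; the attribute name "a" is hypothetical.
R = [{"a": 1}, {"a": 2}, {"a": 3}]
S = [{"a": 2}, {"a": 3}, {"a": 4}]

# SELECT * FROM R WHERE EXISTS (SELECT * FROM S WHERE S.a = R.a)
with_exists = [r for r in R if any(s["a"] == r["a"] for s in S)]

# The simpler, equivalent form: a semijoin of R with S on a.
s_keys = {s["a"] for s in S}
semijoin = [r for r in R if r["a"] in s_keys]

assert with_exists == semijoin
print(with_exists)  # [{'a': 2}, {'a': 3}]
```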
| 6 | 0 | [Start T],[T,A,5],[Commit T] |
| 6 | 6 | [Start T],[T,A,6],[T,B,6],[Commit T] |
| 6 | 5 | [Start T],[T,A,6],[T,B,6],[Commit T] |
| 5 | 5 | [Start T],[T,A,6],[T,B,6],[Abort T] |
Your answer, for each case, should be one of the following:
Briefly explain your reasoning.
In this system, transactions are given unique identification numbers as they arrive from the users. When transaction Ti (that is, the transaction with identification number Ti) wishes to update record r in block b in the database, Ti reads block b into a main-memory buffer p (unless, of course, the block is already in main memory). (Ti would also request a lock for b, but you can ignore locks and concurrency control in this assignment.)

The modification to record r is then performed in main memory. The modified page is *not* immediately written back onto block b. Instead, the 4-tuple (p, b, r, v) is added to the list L( Ti ), where v is the new value for record r. (Page p is not released by Ti.) The changes to the database are written out to permanent storage only when Ti completes. At that time, all the pages indicated in L( Ti ) are written to their corresponding blocks, and the main-memory pages are released (i.e., they can be used by some other transaction).
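The deferred-write scheme just described can be modeled in a few lines; all the names below (disk, buffers, update, complete) are mine, not part of the assignment:

```python
# Toy model of the deferred-write scheme; these names are mine.
disk = {"b1": {"r1": 0, "r2": 0}}   # block -> {record: value}, permanent storage
buffers = {}                         # block -> main-memory page (a copy)
L = {}                               # Ti -> list of (page, block, record, value)

def update(ti, block, record, value):
    if block not in buffers:                 # Input(b -> p), unless already buffered
        buffers[block] = dict(disk[block])
    buffers[block][record] = value           # modify record r in main memory only
    L.setdefault(ti, []).append((buffers[block], block, record, value))

def complete(ti):
    for page, block, record, value in L.pop(ti, []):
        disk[block] = dict(page)             # Output(p -> b)
    # ...at which point the pages could be released to other transactions

update("T1", "b1", "r1", 5)
assert disk["b1"]["r1"] == 0                 # nothing reaches disk before completion
complete("T1")
assert disk["b1"]["r1"] == 5                 # written out when T1 completes
```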
Now let's study the procedure for writing out the pages when Ti completes. The following procedure does *not* work:
Procedure Simple-Write ( Ti, L( Ti ) );
Begin
    For all (p, b, r, v) in L( Ti ) do
        Output( p -> b );
End;

Why isn't this procedure good? Consider what happens if the system crashes in the middle of this procedure. Some pages of Ti will have been written out, while others may not have been. In other words, the database is left in an inconsistent state, and there is no way to recover.
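A toy simulation of that failure mode (the names are mine): the transaction intends to write two blocks atomically, but a crash after the first Output leaves the database inconsistent.

```python
disk = {"b1": 0, "b2": 0}   # two blocks T wants to set to 1 atomically

def simple_write(writes, crash_after=None):
    for i, (block, value) in enumerate(writes):
        if crash_after is not None and i == crash_after:
            raise RuntimeError("crash")   # power loss mid-procedure
        disk[block] = value               # Output(p -> b)

try:
    simple_write([("b1", 1), ("b2", 1)], crash_after=1)
except RuntimeError:
    pass

print(disk)  # {'b1': 1, 'b2': 0} -- half of T's writes made it, and no log to recover
```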
The solution, of course, is to use a log. This system uses a peculiar type of log that has a fixed size. The log (in this system) consists of permanent storage blocks a(1), a(2), ..., a(n).
A bit array in main memory, MAP( j ), indicates which of these blocks are in use (i.e., MAP( j ) = 1 if a(j) is in use). Before writing out the pages of transaction Ti, the system writes a recovery block into the log. This block records all the changes to be done by Ti. Here is the procedure for writing out the pages of transaction Ti. (Notation: P( j ) is the j-th record in page P; P( j ) <--- [a, b, c, d] means that the four values a, b, c, and d are stored in the j-th record of P.)
Procedure Recoverable-Write ( Ti, L( Ti ) );
Begin
    << Construct recovery page in main memory >>
    px <- a free page in main memory;
    px( 1 ) <- Ti;
    k <- 0;
    For all (p, b, r, v) in L( Ti ) do Begin
        px( k + 3 ) <- [p, b, r, v];
        k <- k + 1
    End;
    px( 2 ) <- k;
    << Now write recovery page to log. >>
    k <- position of first 0 bit in MAP;
    << Note: Assume that MAP always has some 0 bit. >>
    MAP( k ) <- 1;
    Output( px -> a(k) );
    << Finally, write out pages of Ti >>
    Do Simple-Write ( Ti, L( Ti ) );
    << Log record is not needed any longer >>
    px <- all zeros;
    Output( px -> a(k) );
    MAP( k ) <- 0;
    Release page px;
End;

Procedure Recoverable-Write assumes that a single block is large enough to contain all of the recovery information for one transaction; let us say that this is an acceptable assumption for this problem. Also, it only protects against simple crashes, as discussed in class: these crashes do not affect permanent storage, but anything in main memory is lost. Finally, let us assume that a write operation to permanent storage (e.g., Output( p -> b )) always completes correctly. That is, if the system crashes during a write to disk, either the write is not performed at all or it is completed successfully.
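To make the given procedure concrete, here is a runnable rendition of Recoverable-Write over toy data structures (dicts stand in for pages, blocks, and log blocks; all names are mine). The recovery procedure itself, which would scan the log on restart, is what you are asked to write below.

```python
# Toy structures; all names are mine. log[k] is None when a(k) is all zeros.
N = 4
log = {k: None for k in range(N)}    # permanent storage blocks a(1)..a(n)
MAP = [0] * N                         # in main memory -- lost on a crash
disk = {}                             # block -> {record: value}

def recoverable_write(ti, l_ti):
    # << Construct recovery page in main memory >>
    px = {"tid": ti, "count": len(l_ti),
          "entries": [(b, r, v) for (_p, b, r, v) in l_ti]}
    # << Write recovery page to the first free log block >>
    k = MAP.index(0)                  # assume MAP always has a 0 bit
    MAP[k] = 1
    log[k] = px                       # Output(px -> a(k))
    # << Finally, write out the pages of Ti (Simple-Write) >>
    for _page, block, record, value in l_ti:
        disk.setdefault(block, {})[record] = value   # Output(p -> b)
    # << Log record is not needed any longer >>
    log[k] = None                     # Output(all-zeros page -> a(k))
    MAP[k] = 0

recoverable_write("T1", [(None, "b1", "r1", 7)])
assert disk["b1"]["r1"] == 7
assert all(v is None for v in log.values()) and MAP == [0] * N
```

A crash between the two Output calls to a(k) leaves a non-zero recovery block in the log, which is exactly the information a restart procedure can use.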
After this ``brief'' introduction, we can state the problem:
Write a procedure which recovers a consistent database from a crash. That is, the DBMS has halted at some undetermined point, and the contents of main memory (including MAP) have been lost. (Permanent storage is not affected.) Your procedure is to read the log and finish the writes that were in progress at the time of the crash.
It is important that your procedure be able to recover from crashes itself. In other words, if your procedure is interrupted and the contents of main memory lost once again, it should be possible to recover a consistent database by simply restarting your procedure at the beginning.
Write your procedure in the style illustrated above. Be sure to write plenty of comments to let me know how and why it works. Clarity is as important as correctness: if I can't understand your code, I can't verify it's doing the right thing. Write down any additional assumptions that you make. (By the way, the command Input( b -> p ) reads block b into page p in main memory.) You may change procedure Recoverable-Write if you absolutely need to.