Final Exam Solutions

CS245A Solutions to Final Exam

B. The number of tracks is 100,000, so there are 10^10/10^5 = 100,000 bytes per track. Of the 5 inches of track, 4 inches is used to store data, since 20% is gap. Thus, the 800,000 bits from 100,000 bytes occupies 4 inches, or 200,000 bits per inch.
B. It takes 2000 blocks to store 20,000 distinct tuples at 10 tuples per block.
B. The worst thing that could happen is that 999 of the buckets contain exactly one tuple. These buckets would use 999 blocks. The remaining 999,001 tuples would go in the last bucket, occupying 9991 blocks, for a total of 10,990 blocks. Note that any other distribution in which all the buckets have a number of tuples that is congruent to 1 modulo 100 will give the same worst case.
C. In order to get to that leaf we must have salary >= 150, or we go the wrong way at the root. Then we have to have age < 47, or we go the wrong way at the right child of the root. At the third node on the path we need salary < 300.
C. Looking at element A tells us T2 precedes T1 because of the order of r2(A) and w1(A). B tells us T3 precedes T2, and C tells us T2 precedes T1 again.
A. Element A tells us T1 precedes T2. B yields no constraint, since it is only read. C tells us T3 precedes both T1 and T2.
E. (I) could happen, if some host other than Z sent don't-commit to the coordinator, which sent abort to Y. (II) cannot happen; the coordinator cannot send both commit and abort to different sites. (III) could happen, if the coordinator's abort decision reached Z but not Y, even though it was Y's idea to abort.
A. The size estimate for R1 = AB JOIN BC is 100*200/50 = 400. We may assume that V(C,R1) = V(C,BC) = 40. Then, the size estimate for R2 = R1 JOIN CD is 400*300/max(40,60) = 2000, and V(D,R2) = V(D,CD) = 100. Finally, the size estimate for the entire expression is 2000*400/100 = 8000.
E. If we start by joining AB with BC, then we are forced to join CD next and then DE. Since AB JOIN BC and BC JOIN AB are defined to be different expressions, there are two left-deep trees of this type. Similarly, there are two left-deep trees that begin with CD and DE, and are forced to follow with BC and then AB. Finally, if we start by joining BC and CD (in either order) then we have a choice of which of AB and DE to join next, yielding four more left-deep trees, for a total of 8.
D. Try them and see. Yes, I know this is a bit of work.
D. We need 200 blocks for the data file (1000 tuples, 5 tuples per block). A sparse index will have one pointer for each of these 200 blocks. In the leaves you have 10 pointers to data, and one pointer to the next leaf. Thus, we need 20 leaf blocks in the B-tree. At the next level, we can point from each node to 11 leaves, but we still need 2 nodes at the second level. Thus, we also need a root node pointing to the two second-level blocks, for a total of 200 + 20 + 2 + 1 = 223 blocks.
B. The 1,000,000 pointers in the bucket array (matrix) can be stored 1024/8 = 128 to a block, thus requiring 1,000,000/128 = 7813 blocks. Each of the indexes has 999 integers. An index block needs 4 bytes to hold an integer that is the lowest stripe it represents, and thus has room for at most 255 integers that represent values of the grid lines. 999 grid lines can fit in four blocks, but not three, so both indexes require 4 blocks, for a total of 7813 + 4 + 4 =7821.
B. A one-block second level index for the index in each dimension lets us look up the stripe in which any x- or y-value falls with two disk I/O's. Thus, 4 disk I/O's gives us the bucket number (a pair of numbers in the range 0-999), from which we can compute the block of the bucket array that holds a pointer to the desired bucket. The total disk I/O's is thus 5.
A. The only explanation that meets all the facts is if the first entry represents A being changed from 0 to 1 and the second represents B being changed from 0 to 1. The fact that 0 appears in the first entry means that the log must be undo, and the second entry must be <T,B,0>.
C. When we try to execute w1(B), we fail because T1 has an earlier timestamp than T2, and yet T2 already read B. Thus, T1 aborts. There is no impediment to T2 writing A, so it commits.
E. Without the ???, the only requirement is from DB element A, which says that T1 precedes T2. If we replace ??? by r2(C), then with w1(C) we get the additional constraint T2 precedes T1, which creates a cycle. None of the other substitutions force T2 to precede T1.
C. Under the wait-die strategy, T1 waits for the lock on B, which T2 holds, and T2 waits for the lock on C, which T3 holds. However, when T3 asks for the lock on A, it dies because T1, an earlier transaction, holds that lock. Thus, T3 gives up its lock on C, which goes to T2. Then T2 completes and gives up its locks on B and C. T1 gets the lock on B and completes. Then, T3 gets a chance to restart and completes.
B. Under wound-wait, when T1 requests its lock on B, it wounds T2, which gives up its lock on B and waits to restart at the end of the sequence. That is, the order in which future requests will be processed is L1(B), L3(A), L2(B), L2(C). T1 gets the lock on B, completes, and gives up its locks on A and B. When T3 asks for a lock on A, it gets in and completes, giving up its locks. Now T2 gets its locks and completes.
C. The three writes have to appear in the order shown, so let's start with the sequence w1(B), w2(B), w3(B), and see where the other three actions could fit in. First, r1(A) must appear before w1(B) because we cannot reorder actions of a transaction. Thus, there is only one choice so far: r1(A), w1(B), w2(B), w3(B). Now, r2(A) can appear in any of the points before w2(B), so there are three choices of where to put r2(A). Finally, regardless of where we put r2(A), there are five choices of where to put r3(A) --- anywhere ahead of w3(B). Thus, there are 15 possible schedules.
E. The analysis is exactly the same as for Question 19, except that for view-serializability does not constrain the order of w1(B) and w2(B), because they are useless writes. Thus, there will be twice as many possible orders, or 30.
E. If we examine the schedule, we find that T3 precedes T2 because of B, and T1 precedes T3 because of C. Thus, if all transactions are to succeed, the timestamps better imply the serial order T1, T3, T2, which only choice (E) does.
C. T1 validates, because there are no previously validated transactions to compare with. When T3 tries to validate, T1 has not yet finished, so we must compare the write-set of T1, which is {A}, against both the read- and write-sets of T3, neither of which contain A. Thus, T3 validates. When T2 tries to validate, we find that T1 is finished, but T1 finished after T2 started, so we must compare the write-set {A} of T1 with the read-set {B,C} of T2. They are disjoint, so we are OK so far. Since T3 is validated but not finished, we must compare the write-set {B} of T3 with both the read- and write-sets of T2. Since the former contains B, T2 cannot validate.
D. The parity checks of disks 9 and 10 can be represented by the matrix

1 0 1 0 1 0 1 0 1 0 0

0 1 0 1 0 1 0 1 0 1 0

Before we add a third row representing the parity check of disk 11, we can recover from two disk failures if and only if the two disks have different columns (except for disk 11, which has only 0's in its column). We'll surely add a 1 in column 11. So far, there are five columns with 1,0 and five columns with 0,1. The best we can do is break each of these groups 3-and-2, to maximize the number of pairs of disks that no longer have identical columns. The only choice that does so is (D).
A. In step (i) we break S into n_S/B sorted sublists. Step (ii) requires that we use n_T buffers to hold T. Step (iii) needs an additional one buffer for R, and n_S/B buffers for the sorted sublists of S. Thus, 1 + n_T + n_S/B <= B. If we neglect the 1 and multiply through by B, we get (A).
E. (I) is true, because evidently T2 read the value of A written by T1, although T1 had not committed, and may never commit. (II) and (III) are both true because a redo/undo logging policy never requires that anything be written onto disk, even if the transaction that wrote it has committed.
C. We undo uncommitted transactions (from the right) first, and then redo committed transactions (from the left). In this case, all that matters is that T2 is the last (and only) to be redone, leaving A with the value 2. Notice that, because of the dirty read, the value A=2 might in some sense be bogus, if T2 depended on the value of A written by T1 (e.g., if T2's job was to increment A by 1, rather than, say, to read A, print it, and set A to 2 regardless of what it read).
A. If d <= n - x, then it is possible to get x local locks at live hosts. Since s <= x, it is also possible to get s local locks at live hosts. All the other possibilities have counterexamples. For example, consider n=10, s=3, and x=8. d better not exceed 2, or we cannot get 8 local x-locks. However, (B) through (E) put bounds of 3, 5, 7, and 8, respectively, on d.
A. First, consider what happens at the leaves. Every time we have to split, we wind up with two leaves with two keys each, so the leaves look like (1,2), (3,4), (5,6),... At the next level, each node will be split in a way that leaves a node of 3 pointers that never grows (because all insertions are at the right end. Thus, we can think of the sequence of pointer occupancies at the second level as 3,3,...,3,2. In fact that is the pattern at every level. At the fourth (root) level, we'll just have a node with 2 children. These two children will have among them five children (3,2). Those five children will have among them 14 leaf children (3,3,3,3,2). To fill 14 leaves with two keys each requires 28 keys.
C. (I) is possible. Let T1 request a lock in mode Z and then T2 request a lock on the same element in mode X. (II) is also possible. Let a transaction get a lock in mode Y, and then several other transactions get locks on the same element in mode Z. However, (III) is impossible. To see why, ask which mode is granted last. It cannot be X, because a prior Y forbids X. It cannot be Y because of a prior Z, nor can it be Z because of a prior X.
B. The smallest n we can choose is T/M, or else we do not have enough main memory to store all tuples. Since (n-1)/n of the tuples need to be shipped, the needed communication is T(n-1)/n. Substituting T/M for n and simplifying gives T-M. Note that increasing n above T/M increases the communication.
E. We must start with either sl_1(A) or ul_2(A). If the latter, then T1 cannot start until T2 completes, so there is one possible order. However, if we start with sl_1(A), then we only know that xl_2(A) cannot come until T1 releases its locks, i.e., at the end. However, ul_2((A) can come at any of three places, i.e., after any of the three actions of T1. Thus, there are 4 orders.
D. If we start with ul_2(A), then we still must follow immediately with xl_2(A), because T2 cannot give up the lock on A, and T1 cannot begin without it. However, if we start with sl_1(A), then the two actions of T2 can mesh (in order) with the remaining two actions of T2 in any way, i.e., there are 4-choose-2 = 6 more orders for a total of 7.