CS245A Solutions to Midterm
  1. E. 10*100*18*512 = 9,216,000 bytes.
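
    To check the arithmetic, here is a tiny Python computation; the factors are assumed to be surfaces, tracks per surface, sectors per track, and bytes per sector, in that order.

        surfaces, tracks, sectors, bytes_per_sector = 10, 100, 18, 512
        print(surfaces * tracks * sectors * bytes_per_sector)   # 9216000 bytes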

  2. D. Since there are 18 sectors on a track, each sector plus a gap takes up 20 degrees of arc. Since 20% is gap, the gaps are 4 degrees and the sectors 16 degrees. A block is read after we scan two sectors and the intervening gap, or 36 degrees, or 1/10 of a circle. Since the disk rotates once in 10 milliseconds, it takes exactly 1 millisecond to read a block.
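
    The timing can be checked in a few lines of Python, using the figures above (18 sectors per track, 20% of each track in gaps, one rotation every 10 milliseconds, and a block spanning two consecutive sectors):

        sectors_per_track = 18
        rotation_ms = 10.0
        arc_per_sector_and_gap = 360 / sectors_per_track   # 20 degrees
        gap = 0.20 * arc_per_sector_and_gap                # 4 degrees
        sector = arc_per_sector_and_gap - gap              # 16 degrees
        block_arc = 2 * sector + gap                       # 36 degrees
        print(block_arc / 360 * rotation_ms)               # 1.0 millisecond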

  3. C. Deducting the header, there are 3996 bytes for records and offset table. Since a record takes 100 bytes and the offset 4 bytes, one record will consume 104 bytes. 3996/104 = 38 plus a remainder.
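
    In Python, with the 3996 usable bytes, 100-byte records, and 4-byte offsets used above:

        usable = 3996
        per_record = 100 + 4          # a record plus its offset-table entry
        print(usable // per_record)   # 38 records fit
        print(usable % per_record)    # with 44 bytes left over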

  4. B. Bit 3 is a copy of bit 1; i.e., it is 0. Bit 4 is the modulo-2 sum of bits 2 and 3, or 1.

  5. D. The parity-check matrix is
    1 0 1 0
    0 1 1 1
    Since columns 2 and 4 are identical, we shall not be able to figure out the content of these two disks, should they fail simultaneously.
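
    A quick Python check that columns 2 and 4 are the only identical pair, and hence the only pair of disks whose simultaneous failure cannot be handled:

        rows = [(1, 0, 1, 0),
                (0, 1, 1, 1)]
        cols = list(zip(*rows))        # columns of the parity-check matrix
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                if cols[i] == cols[j]:
                    print("disks", i + 1, "and", j + 1)   # disks 2 and 4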

  6. D. A key-pointer pair occupies 20 bytes, so we can store 50 of them in a block. A dense index requires one pair per record, or 10,000 pairs. These occupy 200 blocks.

  7. B. On the other hand, a sparse index only requires one pair per block, or 1000 pairs. These fit in 20 blocks.
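
    Both index sizes can be verified in Python, using the numbers above (20-byte key-pointer pairs, 50 pairs per block, 10,000 records, and 1000 data blocks):

        import math

        pairs_per_block = 50
        dense_pairs  = 10000      # one pair per record
        sparse_pairs = 1000       # one pair per data block
        print(math.ceil(dense_pairs / pairs_per_block))    # 200 blocks
        print(math.ceil(sparse_pairs / pairs_per_block))   # 20 blocks
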
  8. A. The first four records all go in the bucket for 0, overflowing it. When we set i=2, we find that all four records go in the bucket for 00, so we need to split again, setting i=3. Now the four records divide, with 0000 and 0001 going in the bucket for 000 and 0010 and 0011 going to the bucket for 001.

  9. B. From that point, there are only two records for each of the eight buckets, so each bucket gets one block, a total of 8 blocks.
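
    A sketch of the final state in Python, assuming the file holds the sixteen 4-bit keys 0000 through 1111 and, as the reasoning above implies, a block holds two records:

        keys = [format(k, '04b') for k in range(16)]
        buckets = {}
        for key in keys:
            buckets.setdefault(key[:3], []).append(key)       # i = 3: first 3 bits
        print(len(buckets))                                   # 8 buckets
        print(all(len(b) == 2 for b in buckets.values()))     # True: one block each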

  10. A. This question was missed far more often than any other. What people tended to forget was that it is OK if the root has only two children, no matter how big the nodes/blocks are. Thus, there could be a 2-way split at the root, and a minimum 25-way split at each of the two children of the root, giving as few as 50 leaves. Each leaf could have as few as 25 pointers, for a total of 1250 pointers to records.
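
    The minimum count, in Python (2 children of the root, at least 25 children per second-level node, and at least 25 record pointers per leaf, as argued above):

        root_children = 2
        min_fanout    = 25     # minimum children of a second-level node
        min_leaf_ptrs = 25     # minimum record pointers in a leaf
        leaves = root_children * min_fanout
        print(leaves)                  # 50 leaves
        print(leaves * min_leaf_ptrs)  # 1250 pointers to records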

  11. E. There are 1000 records in each bucket. In scheme (A), a bucket requires 100 blocks. If we search down the bucket, we shall have to visit half the blocks on the average, so the number of disk I/O's is 50. In scheme (B), buckets require only 20 blocks, so an average search looks at half of them, or 10 disk I/O's. However, in scheme (B) we must then retrieve the block where the desired record is found, for a total of 11 I/O's. The ratio is 50/11, or about 4.5.
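
    The comparison in Python, using the block counts above:

        avg_A = 100 / 2       # half of a 100-block bucket: 50 I/O's
        avg_B = 20 / 2 + 1    # half of a 20-block bucket, plus the record's block: 11 I/O's
        print(avg_A / avg_B)  # about 4.5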

  12. B. What you had to realize was that if we read halves of cylinders, the transfer time is by far the dominant cost, just as if we were reading full cylinders. Thus, doubling the memory will have only a small impact on the time. Likewise, speeding up the head motion will not help much, nor will using a scheduling algorithm, since we're already using the disk about as well as we can. Since the job is said to be I/O bound, a faster processor won't help much either. However, speeding up the disk will reduce the transfer time and thus have a big impact on the total time requirements.

  13. D. The range query covers parts of 5 stripes in each dimension, so the total number of buckets to be searched is 5*5 = 25.

  14. D. The point (115,220) is at distance a little more than 15, but less than 20 from the target point (110,205). Distance 15 is enough to get you from the target point to either of the two lower corners of the region, and beyond it to the regions to the east, west, south, southeast, and southwest (5 regions). On the other hand, even distance 20 would not be enough to get you to the upper border at y=250, so the other three neighbor regions need not be searched.
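
    The two distances that matter can be checked in Python:

        from math import dist

        target, candidate = (110, 205), (115, 220)
        print(dist(target, candidate))   # ~15.81: more than 15, less than 20
        print(250 - target[1])           # 45: the upper border at y=250 is out of reach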

  15. A. The average number of buckets is the average of 2^(10-5), 2^(10-3), and 2^(10-2), or 138.7.
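
    The arithmetic, in Python:

        counts = [2**(10-5), 2**(10-3), 2**(10-2)]   # 32, 128, and 256 buckets
        print(sum(counts) / len(counts))             # 138.67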

  16. E. First, notice that R has 200*100 = 20,000 records. Since each tuple of R joins with 5 tuples of S, the join has 100,000 records. Now, we have to see how many tuples of the join fit in one block. Recall that the theta-join produces tuples with all the attributes of both relations; i.e., the schema of the join is (a,b,c,d), even though b and c obviously have the same value. Thus, a tuple of the join has the sum of the lengths of the tuples of R and S. Since a tuple of R takes 1/100 of a block, and a tuple of S takes 1/50 of a block, together they take 3/100 of a block. We therefore require 100,000*3/100, or 3000 blocks. Well, that might not be exactly right, since we usually would not split records, and there might be a small effect due to record headers, but 3000 is by far the closest answer.
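
    The estimate, step by step in Python, using the figures above:

        tuples_R = 200 * 100              # 20,000 tuples of R
        join_tuples = tuples_R * 5        # 100,000 tuples in the join
        print(join_tuples * 3 // 100)     # each join tuple is 3/100 of a block: 3000 blocks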

  17. A. If we load R into N buffers, we can run the tuples of S through one buffer. Since all of R is in memory, it doesn't matter how many tuples of R one tuple of S joins with; we can see them all when the tuple of S visits main memory.
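
    A minimal sketch of that one-pass strategy in Python; the schemas R(a,b) and S(b,c) and the sample data are made up for illustration:

        from collections import defaultdict

        def one_pass_join(R, S):
            index = defaultdict(list)      # all of R is held in memory, indexed on b
            for a, b in R:
                index[b].append(a)
            for b, c in S:                 # S streams through a single buffer
                for a in index[b]:
                    yield (a, b, c)

        R = [(1, 'x'), (2, 'x'), (3, 'y')]
        S = [('x', 10), ('y', 20)]
        print(list(one_pass_join(R, S)))   # [(1,'x',10), (2,'x',10), (3,'y',20)]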

  18. B. This expression is the only one that might have duplicates. For example, if S has the tuple s twice, and R has the tuple r, then the tuple rs will appear twice in the result of expression B and only once in the results of the other expressions.

  19. C. Since there is grouping on attribute a, it doesn't matter whether the selection occurs before or after grouping. Thus, I and II are the same. However, eliminating duplicates before the grouping could cause the groups to have fewer tuples, and thus might reduce counts. Hence, III is different.

  20. C. The estimate of the size of the join is 50*100/10 = 500. Note that we should divide by the larger of V(R,b) and V(S,b), but these are the same (10), so it doesn't matter here. Then, we should account for the reduction due to the selection. Since V(R,a) = 5, 1/5th of the tuples survive, or 100 tuples.
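
    The estimate in Python, with T(R)=50, T(S)=100, V(R,b)=V(S,b)=10, and V(R,a)=5 as above:

        T_R, T_S = 50, 100
        V_R_b, V_S_b = 10, 10
        V_R_a = 5
        join_size = T_R * T_S // max(V_R_b, V_S_b)   # 500 tuples
        print(join_size // V_R_a)                    # 100 tuples after the selection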