CS245A Solutions to
Midterm |

- E.
10*100*18*512 = 9,216,000 bytes.
- D.
Since there are 18 sectors on a track, each track plus a gap takes up 20
degrees of arc.
Since 20% is gap, the gaps are 4 degrees and the sectors 16 degrees.
A block is read after we scan two sectors and the intervening gap, or
36 degrees, or 1/10 of a circle.
Since the disk rotates once in 10 milliseconds, it takes exactly 1
millisecond to read a block.
- C.
Deducting the header, there are 3996 bytes for records and offset table.
Since a record takes 100 bytes and the offset 4 bytes, one record will
consume 104 bytes.
3996/104 = 38 plus a remainder.
- B.
Bit 3 is a copy of bit 1; i.e., it is 0.
Bit 4 is the modulo-2 sum of bits 2 and 3, or 1.
- D.
The parity-check matrix is
Since columns 2 and 4 are identical, we shall not be able to figure out the content of these two disks, should they fail simultaneously.1 0 1 0 0 1 1 1 - D.
A key-pointer pair occupies 20 bytes, so we can store 50 of them in a
block.
A dense index requires one pair per record, or 10,000 pairs.
These occupy 200 blocks.
- B. On the other hand, a sparse index only requires one pair per block, or 1000 pairs. These fit in 20 blocks.
- A.
The first four records all go in the bucket for 0, overflowing it.
When we set i=2, we find that all four records go in the bucket for 00,
so we need to split again, setting i=3.
Now the four records divide, with 0000 and 0001 going in the bucket for
000 and 0010 and 0011 going to the bucket for 001.
- B.
From that point, there are only two records for each of the eight
buckets, so each bucket gets one block, a total of 8 blocks.
- A.
This question was missed by far more than any other question.
What people tended to forget was that it is OK if the root has only two
children, no matter how big the nodes/blocks are.
Thus, there could be a 2-way split at the root, and a minimum 25-way
split at each of the two children of the root.
There could thus be as few as 50 leaves.
Each leaf could have as few as 25 pointers, or a total of 1250 pointers
to records.
- E.
There are 1000 records in each bucket.
In scheme (A), a bucket requires 100 blocks.
If we search down the bucket we shall have to visit half the buckets
on the average, so
the number of disk I/O's is 50.
In scheme (B), buckets require only 20 blocks, so an average search
looks at half of them, or 10 disk I/O's.
However, in scheme (B) we must then retrieve the block where the
desired record is found or a total of 11 I/O's.
The ratio is 50/11 or about 4.5.
- B.
What you had to realize was that if we read halves of cylinders, the
transfer time is by far the dominant cost, just as if we were reading
full cylinders.
Thus, doubling the memory will make only a small impact on time.
Likewise, speeding up the head motion will not help much, nor will using
a scheduling algorithm, since we're already using the disk about as well
as we can.
Since the job is said to be I/O bound, a faster processor won't help
much either.
However, speeding up the disk will reduce the transfer time and thus
make a big impact in the total time requirements.
- D.
The range query takes in parts of 5 stripes in each of the dimensions,
so the total number of buckets to be searched is 5*5 = 25.
- D.
The point (115,220) is at distance a little more than 15, but less than
20 from the target point (110,205).
Distance 15 is enough to get you from the target point to either of the
two lower corners of the region, and beyond it to the regions to the
east, west, south, southeast, and southwest (5 regions).
On the other hand, even distance 20 would not be enough to get you to
the upper border at y=250, so the other three neighbor regions need not
be searched.
- A.
The average number of buckets is the average of 2^(10-5), 2^(10-3), and
2^(10-2), or 138.7.
- E.
First, notice that R has 200*100 = 20,000 records.
Since each tuple of R joins with 5 tuples of S, the join has 100,000
records.
Now, we have to see how many tuples of the join fit in one block.
Recall that the theta-join produces tuples with all the attributes of
both relations; i.e., the schema of the join is (a,b,c,d), even though b
and c obviously have the same value.
Thus, a tuple of the join has the sum of the lengths of the tuples of R
and S.
Since a tuple of R takes 1/100 of a block, and a tuple of S takes 1/50
of a block, together they take 3/100 of a block.
we therefore require 100,000*3/100 or 3000 blocks.
Well that might not be exactly right, since we usually would not split
records, and there might be a small effect due to record headers,
but 3000 is by far the closest answer.
- A.
If we load R into N buffers, we can run the tuples of S through
one buffer.
Since all of R is in memory, it doesn't matter how many tuples of R one
tuple of S joins with; we can see them all when the tuple of S visits
main memory.
- B.
This expression is the only one that might have duplicates.
For example, if S has the tuple s twice, and R has the tuple r, then the
tuple rs will appear twice in the result of expression B and only once
in the results of the other expressions.
- C.
Since there is grouping on attribute
*a*, it doesn't matter whether the selection occurs before or after grouping. Thus, I and II are the same. However, eliminating duplicates before the grouping could cause the groups to have fewer tuples, and thus might reduce counts. Hence, III is different. - C. The estimate of the size of the join is 50*100/10 = 500. Note that we should divide by the smaller of V(R,b) and V(S,b), but these are the same (10), so it doesn't matter here. Then, we should account for the reduction due to the selection. Since V(R,a) = 5, 1/5th of the tuples survive, or 100 tuples.