Database Systems: The Complete Book
Solutions for Chapter 11

Solutions for Section 11.2
Solutions for Section 11.3
Solutions for Section 11.4
Solutions for Section 11.5
Solutions for Section 11.6
Solutions for Section 11.7
Solutions for Section 11.2
Exercise 11.2.1(a)
Terabyte disks are 25 times as large as "2001" disks.
25 is a little less than 2^5,
so there will have to be about five doublings of disk
size. According to Moore's law, that will take 7.5 years, so terabyte
disks should be available in computer stores around 2009.
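The arithmetic behind this estimate can be sketched in a few lines of Python, assuming the text's Moore's-law rate of one capacity doubling every 18 months:

```python
import math

# 25x growth is a little less than 2^5, so about five doublings.
doublings = math.log2(25)          # ~4.64 doublings needed
years = 5 * 1.5                    # five doublings at 1.5 years each
print(round(doublings, 2), years)  # → 4.64 7.5
```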
Solutions for Section 11.3
Exercise 11.3.1(a)
The disk has 10 * 10,000 = 100,000 tracks.
The average track has 1000 * 512 = 512,000 bytes.
Thus, the capacity is 51.2 gigabytes.
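A quick recomputation of these numbers, using the disk parameters stated in the exercise (10 surfaces, 10,000 tracks per surface, an average of 1000 sectors per track, 512 bytes per sector):

```python
surfaces = 10
tracks_per_surface = 10_000
sectors_per_track = 1000                  # average
bytes_per_sector = 512

tracks = surfaces * tracks_per_surface                   # 100,000 tracks
bytes_per_track = sectors_per_track * bytes_per_sector   # 512,000 bytes
capacity_gb = tracks * bytes_per_track / 10**9           # decimal gigabytes
print(tracks, bytes_per_track, capacity_gb)  # → 100000 512000 51.2
```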
Exercise 11.3.1(c)
The maximum seek time occurs when the heads have to move across all the
tracks.
Thus, substitute 10,000 (really 9999) for n in the formula
1+.001n to get 11 milliseconds.
Exercise 11.3.1(d)
The maximum rotational latency is one full revolution.
Since the disk rotates at 10,000 rpm, it takes 1/10000 of a minute, or
6 milliseconds.
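Parts (c) and (d) can be checked directly from the seek-time formula 1 + .001n and the 10,000 rpm rotation rate; a minimal sketch:

```python
# Maximum seek: the heads cross all (essentially) 10,000 tracks.
max_seek_ms = 1 + 0.001 * 10_000      # 11 ms

# Maximum rotational latency: one full revolution at 10,000 rpm,
# i.e. 1/10,000 of a minute expressed in milliseconds.
max_rotation_ms = 60_000 / 10_000     # 6 ms

print(max_seek_ms, max_rotation_ms)   # → 11.0 6.0
```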
Exercise 11.3.3
First, we need to establish the probability density function for the
position of a random sector.
Since track length varies with the radius, the density of sectors at the
inner edge is 3/7 of the density at the outer edge, so the probability
density function looks like a trapezoid:

                    *
               *    |
          *         |
     *              | 7/5
     |              |
3/5  |              |
     +--------------+
Call this function p(x) = 3/5 + 4x/5,
with x varying from 0 to 1 and
representing the fraction of the way from the inner to the outer edge
that a sector is.
Our question asks for the average, over x and y each distributed with
density p, of the magnitude |x - y|, that is:

    INT_0^1 INT_0^1 p(x)p(y)|x - y| dx dy
Since the moves outward must equal the moves inward, we can eliminate
the absolute-magnitude operator by doubling the integral and restricting
to the case where x > y.
That is, we compute:

    2 INT_0^1 INT_0^x p(x)p(y)(x - y) dy dx
If we substitute for p, we get:

    2 INT_0^1 INT_0^x (3/5 + 4x/5)(3/5 + 4y/5)(x - y) dy dx
Now, we have only to expand terms, integrate first on y, then on x, to
get the value of this integral: 121/375.
That is, the average sector-to-sector move of the heads will traverse
this fraction of the tracks, which is just slightly less than 1/3 of the
tracks.
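As a sanity check, the integral can be approximated numerically; a midpoint-rule sketch in Python (the grid size of 400 is an arbitrary choice):

```python
# Approximate INT_0^1 INT_0^1 p(x)p(y)|x - y| dx dy for p(x) = 3/5 + 4x/5;
# the exact value derived above is 121/375 ~ 0.3227.
def p(x):
    return 3/5 + 4*x/5

n = 400                       # midpoint rule on an n-by-n grid
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        total += p(x) * p(y) * abs(x - y) * h * h

print(round(total, 4))        # close to 121/375 = 0.3227
```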
Solutions for Section 11.4
Exercise 11.4.2(a)
The relation occupies 100,000 blocks, and the sort takes 4 disk I/O's
per block, or 400,000 disk I/O's.
If the number of tuples were doubled, the relation would occupy twice as
many blocks, so we would expect 800,000 disk I/O's.
We should check that the two-phase sort is still possible, but since
doubling the size of the relation would require 32 sublists rather than
16, there will still be plenty of room in memory on the second pass.
Exercise 11.4.2(c)
Doubling the size of blocks reduces the number of disk I/O's needed by
half, to 200,000.
We might imagine that we could keep growing the size of blocks
indefinitely, and thus reduce the number of disk I/O's to almost zero.
In a sense we can, but there are limitations, including the fact that
transfer time will eventually grow to dominate other aspects of latency,
in which case counting disk I/O's fails to be a good measure of running
time.
Exercise 11.4.4
Binary search requires probing the block in the middle of the 100,000
blocks, then the one in the middle of the first or second half, and so
on.
Since 2^16 < 100,000 <= 2^17, it takes ceiling(log_2 100,000) = 17
probes to narrow the presence of the desired record down to one block.
Thus, we require 17 disk I/O's.
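The probe count is just the number of halvings needed to reduce 100,000 candidate blocks to one; a quick check:

```python
import math

blocks = 100_000
n, probes = blocks, 0
while n > 1:
    n = (n + 1) // 2       # worst case: the larger half remains
    probes += 1

print(probes)              # → 17
assert probes == math.ceil(math.log2(blocks))
```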
Exercise 11.4.6
Look ahead to the solution to Exercise 11.4.8.
The formula derived there tells us that:

- Doubling B halves n.
  That is, the larger the block size, the smaller the relation we can
  sort with two-phase, multiway merge sort!
  Here is one good reason to keep block sizes modest.
- If we double R, then we halve the number of tuples we can sort, but
  that is no surprise, since the number of blocks occupied by the
  relation would then remain constant.
- If we double M, we multiply by 4 the number of tuples we can sort.
Exercise 11.4.8
The number of sorted sublists s we need is s = nR/M.
On the second phase we need one block for each sublist, plus one block
for the merged output.
Thus, we require (s+1)B <= M, or approximately Bs < M.
Substituting for s, we get nRB/M < M, or n < M^2/RB.
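A small sketch of this bound; the parameter values (100,000,000 bytes of main memory, 16,384-byte blocks, 160-byte tuples) are assumptions chosen for illustration:

```python
def max_tuples(M, R, B):
    """Largest n sortable by two-phase, multiway merge sort: n < M^2/(R*B)."""
    return M * M / (R * B)

M, R, B = 100_000_000, 160, 16_384
print(int(max_tuples(M, R, B)))    # about 3.8 billion tuples

# The observations of Exercise 11.4.6 follow directly:
assert max_tuples(M, R, 2 * B) == max_tuples(M, R, B) / 2   # doubling B halves n
assert max_tuples(2 * M, R, B) == 4 * max_tuples(M, R, B)   # doubling M quadruples n
```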
Solutions for Section 11.5
Exercise 11.5.2
Recall that the Megatron 747 has a transfer time (in milliseconds)
of 0.25, and an average rotational latency of 4.17.
If the average seek moves 1/3 of the way across half of the
16,384 tracks, the average
seek time for the Megatron 747 is 1 + .001*(16384/6) = 3.73.
Thus, each disk reads one block per 0.25 + 4.17 + 3.73 = 8.15
milliseconds, or 123 blocks per second, so the answer to part (a) is
that the system can read 246 blocks per second.
For a "regular" Megatron 747, the average latency is 0.25 + 4.17 + 6.46
= 10.88 milliseconds.
However, if we have two mirrored disks, each can be handling a request
at the same time, for one read every 5.44 milliseconds on average.
That gives us 184 blocks per second as the answer to (b).
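These rates can be rechecked directly; 0.25 ms transfer and 4.17 ms average rotational latency are the Megatron 747 figures quoted above:

```python
transfer, rotation = 0.25, 4.17                  # ms
half_seek = 1 + 0.001 * (16_384 / 6)             # heads confined to half the tracks
full_seek = 1 + 0.001 * (16_384 / 3)             # "regular" average seek

per_block_a = transfer + rotation + half_seek    # ~8.15 ms
rate_a = round(1000 / per_block_a)               # ~123 blocks/sec per disk

per_read_b = (transfer + rotation + full_seek) / 2   # mirrored disks share the load
rate_b = round(1000 / per_read_b)                # ~184 blocks/sec

print(2 * rate_a, rate_b)  # → 246 184
```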
For part (c), restricting each disk to half the tracks means that,
unless there is always a queue of waiting requests (a situation that
slows the applications the disk is supporting), two consecutive requests
may fall on the same half of the disk, in which case one disk is idle
while a request waits.
Exercise 11.5.3(a)
Suppose that there are n requests waiting on each pass in the
steady state.
Then the time taken for a pass of the elevator algorithm is:
- 0.25n for the transfers.
- 4.17n for the rotational latencies.
- n + 16.38 for the seek times.
To see the last term, notice that the heads must start n times, which
accounts for the term n, and the total travel is 16,384 tracks, which
accounts for the 16.38, at the Megatron 747's rate of 1000 tracks per
millisecond.
Let the time taken for a pass be t.
Then 0.25n + 4.17n + n + 16.38 = t.
Also, during time t, another n requests must arrive, and they arrive one
every A milliseconds.
Thus, t = An.
We can solve these two equations for n in terms of A; it is
n = 16.38/(A - 5.42).
The answer we want is t, the time to perform the pass.
But t = An = 16.38A/(A - 5.42).
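One can verify the algebra numerically; a sketch for a sample interarrival time of A = 10 ms (any A greater than 5.42 works):

```python
A = 10.0                          # ms between request arrivals (sample value)
n = 16.38 / (A - 5.42)            # waiting requests served per pass
t = A * n                         # pass time: arrivals balance services

# t must also equal total transfer + rotation + seek time for the pass
assert abs((0.25 + 4.17 + 1) * n + 16.38 - t) < 1e-9
print(round(n, 3), round(t, 3))   # → 3.576 35.764
```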
Exercise 11.5.4
Think of the requests as a random sequence of the integers 1 through 4.
This sequence can be divided up into segments that do not contain two of
the same integer, in a greedy way.
For example, the sequence 123142342431 would be divided into 123, 1423,
42, and 431.
The disks are serving all the requests from one segment, and each
request is generated and starts at about the same time.
When a segment is finished, the waiting request begins, along with other
requests for other disks that are, by our assumption, generated almost
immediately and thus finish at about the same time.
The question is thus: if I choose numbers 1, 2, 3, and 4 at random, how
many choices, on the average, can I make without a repeat?
The cases are:
- 1/4 of the time we get an immediate repeat on the second choice, and
  the length is therefore 1.
- (3/4)(1/2) = 3/8 of the time we get our first repeat at the third
  number, and the length is 2.
- (3/4)(1/2)(3/4) = 9/32 of the time we get our first repeat at the
  fourth number, and the length is 3.
- The remaining fraction of the time, 3/32, we get four different
  numbers, and the length is 4.
The average length is therefore:
(1/4)*1 + (3/8)*2 + (9/32)*3 + (3/32)*4 = 71/32.
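This expectation can be recomputed exactly; a short sketch for d equally likely values (d = 4 in the exercise):

```python
from fractions import Fraction

def avg_segment_length(d=4):
    """Expected number of random choices from d values before the first repeat."""
    expect = Fraction(0)
    p_distinct = Fraction(1)            # P(first k choices all distinct)
    for k in range(1, d):
        # first repeat occurs at choice k+1, so the segment length is k
        expect += k * p_distinct * Fraction(k, d)
        p_distinct *= Fraction(d - k, d)
    expect += d * p_distinct            # all d values distinct: length d
    return expect

print(avg_segment_length())  # → 71/32
```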
Solutions for Section 11.6
Exercise 11.6.1(a)
There are five 1's, which is odd, so the parity bit is 1.
Solutions for Section 11.7
Exercise 11.7.2
To compute the mean time to failure, we can compute the probability that
the system fails in a given year.
The MTTF will be the inverse of that probability.
Note that there are 8760 hours in a year.
a) The system fails if the second disk fails while the first is being
repaired.
The probability of a failure in a year is 2F.
The probability that the second disk will fail during the H hours that
the other is being repaired is H/8760.
Thus, the probability of a failure in any year is 2FH/8760, and the
MTTF is 4380/FH.
b) The system loses data if there are three disk crashes within an
H-hour period.
The probability of any one disk failing in a year is NF.
If so, there are (N-1)(N-2)/2 pairs of other disks that could fail
within an H-hour period, and the chance that any one of them will do so
is FH/8760.
Thus, the probability of two additional disk failures is
(N-1)(N-2)F^2H^2/(2*8760^2), or approximately (NFH)^2/(1.5*10^8).
The MTTF is the inverse of this probability.
The MTTF is the inverse of this probability.
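Part (a)'s formula, as stated above, can be wrapped in a small function; F = 0.5 failures per year and H = 24 repair hours are sample values chosen for illustration:

```python
HOURS_PER_YEAR = 8760

def mttf_years(F, H):
    """MTTF of the mirrored pair per the part (a) analysis: 4380/(F*H) years."""
    p_fail_per_year = 2 * F * H / HOURS_PER_YEAR
    return 1 / p_fail_per_year

print(mttf_years(F=0.5, H=24))   # close to 365 years
```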
Exercise 11.7.4(a)
01010110.
Exercise 11.7.5(a)
01010110.
Notice that this question calls for
the same computation as Exercise 11.7.4(a).
Exercise 11.7.8(a)
In the matrix of Fig. 11.17, we see that columns 1 and 7 differ in both
the first and second rows.
In each case, column 1 has bit 1 and column 7 has bit 0.
Let's use the first row to recover disk 1.
That step requires us to take the modulo-2 sum of disks 2, 3, and 5, the
other disks where row 1 has bit 1.
Next, we use row 3, the only row where column 7 has a 1, to recover disk
7 by taking the modulo-2 sum of disks 1, 3, and 4.
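The two recovery steps can be simulated with XOR on sample blocks. The row/disk pattern below is the Fig. 11.17 arrangement as reconstructed from the description above (each row lists disks whose modulo-2 sum must be 0); the 4-bit block contents are made-up sample data:

```python
def xor_sum(values):
    out = 0
    for v in values:
        out ^= v
    return out

# sample 4-bit "blocks" for data disks 1-4 (made-up values)
disks = {1: 0b1011, 2: 0b0110, 3: 0b1100, 4: 0b0001}
disks[5] = xor_sum(disks[d] for d in (1, 2, 3))   # row 1 parity
disks[6] = xor_sum(disks[d] for d in (1, 2, 4))   # row 2 parity
disks[7] = xor_sum(disks[d] for d in (1, 3, 4))   # row 3 parity

lost1, lost7 = disks[1], disks[7]
disks[1] = disks[7] = None            # disks 1 and 7 crash

# Step 1: row 1 involves disk 1 but not disk 7, so it recovers disk 1.
disks[1] = xor_sum(disks[d] for d in (2, 3, 5))
# Step 2: row 3 is the only row covering disk 7; recover it from 1, 3, 4.
disks[7] = xor_sum(disks[d] for d in (1, 3, 4))

assert disks[1] == lost1 and disks[7] == lost7
print("disks 1 and 7 recovered")
```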
Exercise 11.7.12
The parity checks of disks 9 and 10 can be represented by the matrix

    1 1 1 1 0 0 0 0 1 0 0
    0 0 0 0 1 1 1 1 0 1 0

Before we add a third row representing the parity check of disk 11, we
can recover from two disk failures if and only if the two failed disks
have different columns (except for disk 11, which has only 0's in its
column).
We'll surely add a 1 in column 11.
So far, there are five columns with 1,0 (columns 1, 2, 3, 4, and 9)
and five columns with 0,1 (columns 5, 6, 7, 8, and 10).
The best we can do is break each of these groups 3-and-2, to maximize
the number of pairs of disks that no longer have identical columns.
Thus, we can make disk 11 be a parity check on any two or three
of the first group and any two or three of the second group.
An example would be to make disk 11 be a parity check on columns 3
through 6, but there are many other correct answers.