CS245  Winter 1999

Assignment 3
due in class on Thursday January 28

PROBLEM 1

Consider an indexed sequential file consisting of 1,000,000
contiguous blocks. Each block contains ten fixed size records.
For each key value found in the file, there are at most 4
records that share that value.

(a) How large (in blocks) would a sparse one-level, primary sequential
index be?  Assume blocks are 4 KB, block pointers are 4 bytes, a key
is 5 bytes, and that the records of the index are contiguous.
Design the index so it would use the minimum amount of space;
however, assume that keys are not spanned across blocks.

(b) Suppose you now construct a second level sequential index
on the index of part (a)? How large would it be, in blocks?
Also assume you want to minimize space (but no spanning).

(c) Next you construct a one level, dense secondary index.
Compute its minimum size (in blocks), assuming that record pointers take
5 bytes, block pointers take 4 bytes, the secondary keys are also 5 bytes
long, blocks are 4 KB, and index blocks are contiguous.
Also assume that no buckets are used; if multiple values of a secondary
key occur in the file, the keys are replicated in the index.
(We do not expect many duplicates, so it is not worth optimizing for them.)
Assume that [key, pointer] pairs are not spanned across blocks.

(d) Suppose you construct a second level sequential index
for the index of part (c). How large would it be in blocks?
Also assume you want to minimize space (but no spanning).


PROBLEM 2

Consider a B+ tree of order 4 for this problem, that is initially empty.

(a) You insert the key 1 (and a pointer to this record) into the tree.
Show what the tree looks like at this point.

(b) Next you insert keys 2, 3, 4, and 5 (in this order) into the tree.
Show what the tree looks like at this point.

(c) Next you insert keys 6, 7, 8, 9, 10, 11, 12, 13, and 14 (in this order).
Show what the tree looks like at this point.


PROBLEM 3

(a) What is the minimum number of record pointers that a B+ tree of
order n can contain, when it has two levels (root plus one more)?
(All record pointers are found in the second level.)

(b) What is the minimum number of record pointers that a B+ tree of
order n can contain, when it has three levels (root plus two more)?
(The third level contains the leaf nodes in this case.)

(c) What is the general expression for the minimum number of record
pointers an order n  B+ tree can contain, when it has j levels?

(d) If we are told that an order n B+ tree points to r records,
then what is the maximum number of levels, j, it may have?
That is, derive an expression of the form
           j  <=  f(n, r).
Note that this is a bound on the number of B+ tree nodes we need to
examine for looking up a record (given its key value),
when the file we are indexing contains r records.

PROBLEM 4

Consider an index organized as a B+ tree.  The leaf nodes contain
pointers to a total of N records, and each block that makes up the
index has m pointers.  We wish to choose the value of m that will
minimize search times on a particular disk device.

We are given the following information:

(a) For the disk that will hold the index, the time to read a given
block into memory can be approximated by (70 + .05*m) milliseconds.
The 70 milliseconds represent the seek and latency components of the
read, the .05*m milliseconds is the transfer time.  (That is, as m
becomes larger, the larger the block will be and the more time it will
take to read it into memory.)

(b) Once the block is in memory a binary search is used to find the
right pointer.  So the time to process a block in main memory is
(a + b log_2 m) milliseconds, for some constants a , b.
("log_2 m" means log base 2 of m.)

(c) The main memory time constant "a" is much smaller than the disk
seek and latency time of 70 milliseconds.

(d) The index is full, so that the number of blocks that must be
examined per search is log_m N.

Questions:

(i) What value of m minimizes the time to search for a given record?
An approximate answer is OK.  (HINT: If you come up with an equation
which is hard to solve algebraically, try plugging in values to locate
the root of the equation.)  The value you obtain should be independent
of "b"!

(ii) What happens as the seek and latency constant (70ms) decreases?
For instance, if this constant is cut in half, how does the optimum m
value change?


END OF HOMEWORK 3