CS245  Winter 2000

Assignment 3
due in class on Thursday January 27

PROBLEM 1

Consider an indexed sequential file consisting of 1,000,000
blocks. Each block contains ten fixed size records.
For each key value found in the file, there are at most 3
records that share that value.

For this problem, assume that:
   Pointers to blocks are 4 bytes long
   Pointers to records are 5 bytes long (block pointer plus 1 byte offset)
   Index blocks are 4KB
   There is no spanning of records in the file.
   There is no spanning of "records" in the index, e.g., if a
     [key, pointer] does not fit in an index block, 
     it must move to another block.
   Search keys for file records are 8 bytes long.

(a) Assume that the file and index blocks are stored contiguously on
disk. How large (in blocks) would a sparse one-level, primary
sequential index be?  Design the index so it would use the minimum
amount of space.

(b) Assume that the file and index blocks are *not* contigous on
disk. How large (in blocks) would a sparse one-level, primary
sequential index be?  Design the index so it would use the minimum
amount of space.

(c) Suppose you now construct a second level sequential index
on the index of part (b)? How large would it be, in blocks?
Also assume you want to minimize space (but no spanning).

(d) Next you construct a one level, dense secondary index.
Compute its minimum size (in blocks). The secondary keys
are also 8 bytes, and index blocks are contiguous.
Also assume that no buckets are used; if multiple values of a secondary
key occur in the file, the keys are replicated in the index.
(We do not expect many duplicates, so it is not worth optimizing for them.)

(e) Suppose you construct a second level sequential index
for the index of part (d). How large would it be in blocks?
Also assume you want to minimize space (but no spanning).


PROBLEM 2

Consider a B+ tree of order 4 for this problem, that is initially empty.

(a) You insert the key 1 (and a pointer to this record) into the tree.
Show what the tree looks like at this point.

(b) Next you insert keys 2, 3, 4, and 5 (in this order) into the tree.
Show what the tree looks like at this point.

(c) Next you insert keys 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, and 17
(in this order).  Show what the tree looks like at this point.


PROBLEM 3

(a) What is the minimum number of record pointers that a B+ tree of
order n can contain, when it has two levels (root plus one more)?
(All record pointers are found in the second level.)

(b) What is the minimum number of record pointers that a B+ tree of
order n can contain, when it has three levels (root plus two more)?
(The third level contains the leaf nodes in this case.)

(c) What is the general expression for the minimum number of record
pointers an order n  B+ tree can contain, when it has j levels?

(d) If we are told that an order n B+ tree points to r records,
then what is the maximum number of levels, j, it may have?
That is, derive an expression of the form
           j  <=  f(n, r).
Note that this is a bound on the number of B+ tree nodes we need to
examine for looking up a record (given its key value),
when the file we are indexing contains r records.

PROBLEM 4

Consider an index organized as a B+ tree.  The leaf nodes contain
pointers to a total of N records, and each block that makes up the
index has m pointers.  We wish to choose the value of m that will
minimize search times on a particular disk device.

We are given the following information:

(a) For the disk that will hold the index, the time to read a given
block into memory can be approximated by (30 + .01*m) milliseconds.
The 30 milliseconds represent the seek and latency components of the
read, the .01*m milliseconds is the transfer time.  (That is, as m
becomes larger, the larger the block will be and the more time it will
take to read it into memory.)

(b) Once the block is in memory a binary search is used to find the
right pointer.  So the time to process a block in main memory is
(a + b log_2 m) milliseconds, for some constants a , b.
("log_2 m" means log base 2 of m.)

(c) The main memory time constant "a" is much smaller than the disk
seek and latency time of 30 milliseconds.

(d) The index is full, so that the number of blocks that must be
examined per search is log_m N.

Questions:

(i) What value of m minimizes the time to search for a given record?
An approximate answer is OK.  (HINT: If you come up with an equation
which is hard to solve algebraically, try plugging in values to locate
the root of the equation.)  The value you obtain should be independent
of "b"!

(ii) What happens as the seek and latency constant (30ms) decreases?
For instance, if this constant is cut in half, how does the optimum m
value change?


END OF HOMEWORK 3