CS245A LAB #2
Due by 5pm, Tuesday, February 17, 1998.
100pts
Your second programming lab is to improve significantly the functionality of the database engine that you built in the first lab. In particular, you need to add indices, nested-loop join, and a variant of nested-loop join that uses indices.

--------------------

Header Files

The database engine interface should be specified in the header file dbe2.h, which you will later ``#include'' in the main program. The dbe2.h should export the following functions:

// these functions should remain the same as in your lab1

int CreateDB( char* dbName, char* datafile );

int PrintDB( char* dbName );

int RecordScan( char* dbName, int attrNumber, char* value );

int KeyScan( char* dbName, char* value );

// you need to add any new error codes you defined

void error( int rc );

// New DBE functions: these are the functions you need to implement

int CreateIndex( char* dbName, char* indexName, int attrNum );

int IndexScan( char* dbName, char* indexName, char* value );

int NestedLoopJoin( char* dbName1, char* dbName2, int attrNum1, int attrNum2 );

int NLIndexJoin( char* dbName1, char* dbName2, char* index, int attrNum );

As in the first lab, all functions that return an int, return a ``0'' if the operation was successful or error codes (defined by you) in any other case.

--------------------

New DBE Functions

int CreateIndex( char* dbName, char* indexName, int attrNum );

This function creates an index named indexName for the attribute in position attrNum of the database dbName. The index will be an ndbm database, which consist of key/value pairs, where the keys will be all the possible values of the indexed attribute and the values will be the keys of the records that have those values. For example, for the database at the left, the call CreateIndex( ``mydb'', ``myindex'', 2 ), will create the index at the right:

mydb

myindex

Key

Data

champs

< champs, Denver Broncos, AFC >

loser

< loser, Green Bay Packers, NFC >

best

< best, San Francisco 49ers, NFC >

worst

< worst, Dallas Cowboys, NFC >

Key

Data

NFC

< best, loser, worst >

AFC

< champs >

As in the first lab, you need to place several attributes in a single field (try to reuse your code from the first assignment). Also, as in programming lab 1, you need to store some metadata in the index (note that the IndexScan function doesn't take the number of the attribute being scanned). It is an error to try to create an index for the first attribute, as you already have a primary index for the primary key.

Note: You can assume that all primary keys of records with the same indexed-attribute value fit in a single ndbm record.

int IndexScan( char* dbName, char* indexName, char* value );

This function should print all keys and records of the database dbName for which the attribute indexed by indexName has the same value as the null-terminated string pointed by value. If indexName is NULL, then the scan is on the first attribute and you need to use your primary index.

int NestedLoopJoin( char* dbName1, char* dbName2, int attrNum1, int attrNum2 );

This functions prints the result of the equi-join of the databases dbName1 and dbName2 on the attributes that are in position attrNum1 and attrNum2. The join is computed using the nested-loop join algorithm as explained in class.

int NLIndexJoin( char* dbName1, char* dbName2, char* index, int attrNum );

This functions prints the result of the equi-join of the databases dbName1 and dbName2 on the attribute of dbName1 indexed by index and the attribute of dbName2 in position attrNum. The join is computed using a variant of the nested-loop join algorithm where the outer loop of NLJ is over the nonindexed database and an index scan is used instead of an inner loop. If the value of index is NULL, this means, that you are doing a join on the first attribute of database dbName1 and you must use its primary index.

--------------------

Sample Application

Following is an application that uses your databases engine to create a database from the ``football_teams'' data file, a database from the ``plays_for'' data file and then joins both databases.

#include "dbe2.h"

main() {

int rc;

if ((rc=CreateDB( "football_teams", "football_teams.txt" ))) { error(rc); }

if ((rc=CreateDB( "plays_for", "plays_for.txt" ))) { error(rc); }

if ((rc=CreateIndex( "plays_for", "aidx", 1 ))) { error(rc); }

if ((rc=NLIndexJoin( "plays_for", "football_teams", "aidx", 0 ))) { error(rc); }

if ((rc=NLIndexJoin( "football_teams", "plays_for", NULL, 1 ))) { error(rc); }

}

If the data files are:

data files

football_teams.txt

plays_for.txt

6&20&3

5&6&15&10&2

champs&Denver Broncos&AFC

gregc&best&Greg Clark&Stanford&TE

best&San Francisco 49ers&NFC

johne&champs&John Elway&Stanford&QB

worst&Dallas Cowboys&NFC

edmc&champs&Ed McCaffrey&Stanford&WR

loser&Green Bay Packers&NFC

darrg&champs&Darrien Gordon&Stanford&CB

The output of executing this program should contain:

johne champs John Elway      Stanford  QB champs  Denver Broncos       AFC

edmc champs Ed McCaffrey Stanford WR champs Denver Broncos AFC

darrg champs Darrien Gordon Stanford CB champs Denver Broncos AFC

gregc best Greg Clark Stanford TE best San Francisco 49ers NFC

and also:

champs  Denver Broncos       AFC johne champs John Elway      Stanford  QB

champs Denver Broncos AFC edmc champs Ed McCaffrey Stanford WR

champs Denver Broncos AFC darrg champs Darrien Gordon Stanford CB

best San Francisco 49ers NFC gregc best Greg Clark Stanford TE

--------------------

Documentation, Grading and Submission

Each student is expected to submit a 1 page description of your design and algorithms in a plain text file called dbe2.doc, and to include comments in the code. The grade of your program will be based on three aspects:

Submission of the assignment will be done using the same procedure as in the programming lab 1. Detailed instructions for the submission process will be provided in the class web page.