CS145 Assignment #4

Due Wednesday, Oct. 29, 1997

Step 4 of Your PDA

This week we shall write and run some SQL queries. It is suggested that you begin to follow the routine of loading your database from your load file, running queries or other commands from a file that contains those SQL statements, and then deleting your data so it doesn't clutter up the database all week and (worse) you don't forget and load the same tuples several times into your relations. Remember that SQL thinks of relations as bags, and so will happily let you insert the same tuple as many times as you ask it to. To clean out a relation R without deleting the schema itself, use command DELETE FROM R;.

(10 pts.) Reconsider your relation schema in the light of the theory of normalization and BCNF. Remind us of your chosen database schema. For each of your current relations, tell whether its relation schema is in BCNF. If not, then either redesign your schema so the relation is in BCNF, or give a rationale for leaving in in non-BCNF form (e.g., the amount of redundancy introduced is minimal, and splitting the relation would cause some reasonable queries to become multirelational). Indicate your final choice of design, whether or not you choose to decompose one or more relations.
(25 pts.) Write five queries on your PDA database, using the select-from-where construct of SQL. To receive full credit, all but perhaps one of your queries should exhibit some interesting feature of SQL: queries over more than one relation, or subqueries, for example. We suggest that you experiment with your SQL commands on a small database (e.g., your hand-created database), before running them on the large database that you loaded in PDA part 3. Initial debugging is much easier when you're operating on small amounts of data. Once you're confident that your queries are working, run them on your complete database. If you discover that most or all of your ``interesting'' queries return an empty answer on your large database, check whether you followed the instructions in Assignment #3 for generating data values that join properly. You will need to modify your data generator accordingly. Turn in a copy of all of your SQL queries, along with a script illustrating their execution. Your script should be sufficient to convince us that your commands run successfully. Please do not, however, turn in query results that are thousands (or hundreds of thousands) of lines long!

(25 pts.) In part (2) you probably discovered that some queries run very slowly over your large database. An important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to quickly find all tuples in R with a given value for attribute A. This index is useful if a value of A is specified by your query (in the where-clause). It may also be useful if A is involved in a join that equates it to some other attribute. For example, in the query
```
     SELECT Bars.address
     FROM Drinkers, Bars
     WHERE   Drinkers.name = 'joe'
         AND Drinkers.frequents = Bars.name;
```
we might use an index on Drinkers.name to help us find the tuple for drinker Joe quickly. We might also like an index on Bars.name, so we can take all the bars Joe frequents and quickly find the tuples for those bars to read their addresses.
In Oracle, you can get an index by the command:
```
     CREATE INDEX <IndexName> ON <RelName>(<Attribute List>)
         TABLESPACE csindx;
```
Note:
- The text on the second line (``tablespace csindx'') is unique to the Stanford Oracle installation and is designed to get our indexes onto a second disk used only for that purpose, so one disk can be retrieving data while the other is using an index.
- Oracle creates indexes automatically when you declare an attribute(s) PRIMARY KEY or Unique. Thus, some indexes may exist before you create them and may also be making certain queries run faster than than they would with no indexes at all. Be sure to take this factor into account when trying to explain differences in running times.
If the attribute list contains more than one attribute, then the index requires values for all the listed attributes to find a tuple. That situation might be helpful if the attributes together form a key, for example. An illustration of the CREATE INDEX command is
```
     CREATE INDEX DrinkerInd ON Drinkers(name)
         TABLESPACE csindx;
     CREATE INDEX BarInd ON Bars(name)
         TABLESPACE csindx;
```
which creates the two indexes mentioned above. To get rid of an index, you can say DROP INDEX followed by the name of the index. Notice that each index must have a name, even though we only refer to the name if we want to drop the index.
Create at least two useful indexes for your PDA. Run your queries from part (1) on your large database with the indexes and without the indexes. To time your commands, you may issue the following commands to sqlplus:
1. TIMING START <TimerName>; starts your timer. Give it whatever name you wish.
2. TIMING SHOW; prints the current wall-clock time of your current timer.
3. TIMING STOP; prints the current time of your timer and stops it.
Naturally these times may be affected by external factors such as system load, etc. Still, you should see a dramatic difference between the execution times with indexes and the times without. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes.

Problem Set

We will be working with the familiar Books database schema in the following six questions.

Author (firstName, lastName, address)
Book (title, copyright, firstName, lastName)
Edition (isbn, pubDate, listPrice, publisher, title, copyright)
Bookstore (name, address)
Stock (storeName, storePrice, isbn)

The Author relation give the author's first and last names and address. Each first name - last name pair is unique. The Book relation gives the title, copyright date, and the first and last names of the author. A book can have more than one author. The Edition relation give the ISBN number, publicatin date, list price recommended by the publisher, publisher name, and the title and copyright of the book the edition is of. Each edition has a unique ISBN number. The Bookstore relation gives the name and address of a book. The Stock relation give the bookstore name, the price, and the ISBN number of the edition. listPrice and storePrice are real numbers. copyright and pubDate are integers (representing the year). All other attributes are strings.

(5 pts.) Write an expression of relational algebra to answer the following query: Find all authors who have written books that have been published after 1990.
(5 pts.) Write an expression of relational algebra to answer the following query: Find all authors who have written more than one book.
(10 pts.) Write an expression of relational algebra to answer the following query: Find pairs of editions (represented by the ISBN numbers) that are of the same book. A pair should be listed only once; e.g., list (i,j) but not (j,i).
(10 pts.) Write an expression of relational algebra to answer the following query: Find the bookstore that sells the cheapest copy of "Hamlet".
(5 pts.) Write the following query in SQL: Find the publishers who have published books by Grace Paley.
(5 pts.) Write the following query in SQL: Find the most recent edition of "The Hobbit".