CS145 Assignment #4

Due Wednesday, Oct. 27, 1999

Step 4 of Your PDA

This week we shall write and run some SQL queries. It is suggested that you begin to follow the routine of loading your database from your load file, running queries or other commands from a file that contains those SQL statements, and then deleting your data so it doesn't clutter up the database all week and (worse) you don't forget and load the same tuples several times into your relations. Remember that SQL thinks of relations as bags, and so will happily let you insert the same tuple as many times as you ask it to. To clean out a relation R without deleting its schema, use command DELETE FROM R;.

(15 pts.) Write five queries on your PDA database, using the select-from-where construct of SQL. To receive full credit, all but perhaps one of your queries should exhibit some interesting feature of SQL: queries over more than one relation, or subqueries, for example. We suggest that you experiment with your SQL commands on a small database (e.g., your hand-created database), before running them on the large database that you loaded in PDA part 3. Initial debugging is much easier when you're operating on small amounts of data. Once you're confident that your queries are working, run them on your complete database. If you discover that most or all of your ``interesting'' queries return an empty answer on your large database, check whether you followed the instructions in Assignment #3 for generating data values that join properly. You will need to modify your data generator accordingly. Turn in a copy of all of your SQL queries, along with a script illustrating their execution. Your script should be sufficient to convince us that your commands run successfully. Please do not, however, turn in query results that are thousands (or hundreds of thousands) of lines long!

(15 pts.) Write five data modification commands on your PDA database. Most of these commands should be ``interesting,'' in the sense that they involve some complex feature, such as inserting the result of a query, updating several tuples at once, or deleting a set of tuples that is more than one but less than all the tuples in a relation. As for the queries in (1), you might want to try out your commands on small data before trying it on your full database. Hand in a script that shows your modification commands running in a convincing fashion.

(10 pts.) Create two views on top of your database schema. Show your CREATE VIEW statements and the response of the system. Also, show a query involving each view and the system response (but truncate the response if there are more than a few tuples produced). Finally, show a script of what happens when you try to update your view, say by inserting a new tuple into it. Are either of your views updatable? Why or why not? (Updatable views are discussed in Section 5.8.4 of the text. Essentially, a view is updatable if it is a selection on one base table.)

(20 pts.) In part (1) you probably discovered that some queries run very slowly over your large database. An important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to find quickly all tuples in R with a given value for attribute A. This index is useful if a value of A is specified by your query (in the where-clause). It may also be useful if A is involved in a join that equates it to some other attribute. For example, in the query
```
     SELECT Bars.address
     FROM Drinkers, Bars
     WHERE   Drinkers.name = 'joe'
         AND Drinkers.frequents = Bars.name;
```
we might use an index on Drinkers.name to help us find the tuple for drinker Joe quickly. We might also like an index on Bars.name, so we can take all the bars Joe frequents and quickly find the tuples for those bars to read their addresses.
In Oracle, you can get an index by the command:
```
     CREATE INDEX <IndexName> ON <RelName>(<Attribute List>)
         TABLESPACE indexes;
```
Note:
- The text on the second line (``tablespace indexes'') is unique to the Stanford Oracle installation and is designed to get our indexes onto a second disk used only for that purpose, so one disk can be retrieving data while the other is using an index.
- Oracle creates indexes automatically when you declare an attribute(s) PRIMARY KEY or UNIQUE. Thus, some indexes may exist before you create them and may also be making certain queries run faster than than they would with no indexes at all. Be sure to take this factor into account when trying to explain differences in running times. If you are unable to discover queries whose running-time is affected by the indexes that you declare, you could try temporarily removing your key declarations, so Oracle will not create indexes ``behind the scenes.''
If the attribute list contains more than one attribute, then the index requires values for all the listed attributes to find a tuple. That situation might be helpful if the attributes together form a key, for example. An illustration of the CREATE INDEX command is
```
     CREATE INDEX DrinkerInd ON Drinkers(name)
         TABLESPACE indexes;
     CREATE INDEX BarInd ON Bars(name)
         TABLESPACE indexes;
```
which creates the two indexes mentioned above. To get rid of an index, you can say DROP INDEX followed by the name of the index. Notice that each index must have a name, even though we only refer to the name if we want to drop the index.
Create at least two useful indexes for your PDA. Run your queries from part (1) on your large database with the indexes and without the indexes. To time your commands, you may issue the following commands to sqlplus:
1. TIMING START <TimerName>; starts your timer. Give it whatever name you wish.
2. TIMING SHOW; prints the current wall-clock time of your current timer. (There is a way to switch among timers, which is why they are named, but we shall not use this feature.)
3. TIMING STOP; prints the current time of your timer and stops it.
Naturally these times may be affected by external factors such as system load, etc. Still, you should see a dramatic difference between the execution times with indexes and the times without. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes.

Problem Set

A number of the queries below are quite complex. There are several things you can do to break your thought processes into a sequence of steps:

Create new relations to use as temporaries, and insert into them.
Define some views and use them in later views or in the query itself.
Use subqueries.

Sometimes, (1) or (2) are easier at first, but after figuring it out, you can write a single query with subqueries. Any of these three approaches are OK for the problem set.

All queries should be written in standard SQL2.

(8 pts.) A fair, sealed bid auction is conducted by awarding the item to the person with the highest bid, who pays the amount of the second-highest bid. (Do you see why that is fair? If the bidding were open, the highest bidder would bid epsilon more than the second-highest bidder, and win the auction.) Suppose we have a relation Bids(bidder,price). Write a query to determine the sales price, i.e., the bid of the second-highest bidder. You may assume that no two bidders bid the same price.
(16 pts.) Some auctions offer more than one identical item (e.g., 10 ``Wally the Whale'' Beanie Babies), and bidders can bid for any number of them, up to the total number being auctioned. The highest bidders get the items, and the price they pay is the highest price such that the total number requested at that price or higher exceeds the number available (note how this rule generalizes the rule from Question 1).
Suppose we have a relation Bids(bidder,price,quantity), representing bids for 10 Beanie Babies. In this problem, you should assume that there may be several different bidders bidding the same price, perhaps for different numbers of items.

(a)

Write a query to find the selling price, i.e., the highest price at which the total bid is for 11 or more items.

(b)

Write a query to produce a table of all winning bidders, and the number that they acquire. Note that all winning bidders get the quantity that they bid for, except the winning bidder(s) with the lowest price; they divide whatever is left from among the 10, in proportion to the number of items they bid for. As a result, some bidders may get a fractional quantity, which has to be resolved in some way that you need not worry about.

Hint: Start by constructing a table that gives for each price, the total number bid at that price. You have to join this table with itself to get another table that gives, for each price, the total number bid at that price or higher.
(16 pts.) The following questions are also based on an ``auction'' database, with the following relations:
```
     Bids(auctionID, bidder, price, quantity)
     Auctions(auctionID, seller, item, quantity, expires)
     Ratings(seller, stars)
```
The relations and attributes should be self explanatory, except perhaps for the last relation, which is intended to be a table of ratings that sellers have received from buyers. Each seller mentioned in Auctions has received 0 or more ratings, and each rating is in the form of a number of stars, from 1 to 5.

(a)

Define a view AuctionInfo(auctionID,item,avgStars) that gives the average number of stars of the seller of the auction with ID auctionID. However, if the seller has never been rated (i.e., doesn't appear in Ratings), then avgStars should be NULL.

(b)
Arrange that each seller bids 0 for the total number of items in the auction (i.e., perform an insertion into Bids based on information in Auctions).

(c)
Bidder ``Sally'' wishes to raise all her bids so they are $1 more than the highest bid so far in each auction in which she is participating. Show how to modify Bids to perform these changes.

(d)
Compute the average ratings of all the sellers in the Ratings table. Note that different sellers may be rated different numbers of times, so you cannot simply average the numbers of stars in each of the tuples of the relation.