CS145 Assignment #4
Due Wednesday, Nov. 1, 2000
Step 4 of Your PDA
This week we shall write and run some SQL queries.
It is suggested that you begin to follow the routine of loading your
database from your load file, running queries or other commands from
a file that contains those SQL statements, and then deleting your data
so it doesn't clutter up the database all week and (worse) you don't
forget and load the same tuples several times into your relations.
Remember that SQL thinks of relations as bags, and so will happily let
you insert the same tuple as many times as you ask it to.
To clean out a relation R without deleting its schema, use
DELETE FROM R;.
Write five queries on your PDA database,
using the select-from-where construct of SQL.
To receive full credit, all but perhaps one of your queries should exhibit
some interesting feature of SQL: queries over more than one relation,
or subqueries, for example.
We suggest that you experiment with your SQL
commands on a small database (e.g., your hand-created database),
before running them on the large database that you loaded in PDA part
Initial debugging is much easier when you're operating on small
amounts of data. Once you're confident that your queries are
working, run them on your complete database.
If you discover that most or all of your ``interesting'' queries
return an empty answer on your large database, check whether you
followed the instructions in Assignment #3 for generating data
values that join properly. You may need to modify your data
Turn in a copy of all of your SQL queries, along with a script
illustrating their execution. Your script should be sufficient to
convince us that your commands run successfully. Please do not,
however, turn in query results that are thousands (or hundreds of
thousands) of lines long!
- (15 pts.) Write five data modification commands on your PDA
Most of these commands
should be ``interesting,'' in the sense that they involve some complex
feature, such as inserting the result of a query, updating several tuples
at once, or deleting a set of tuples that is more than one but less than
all the tuples in a relation. As for the queries in (1), you might
want to try out your commands on small data before trying it on your full
database. Hand in a script that shows your modification commands running
in a convincing fashion.
- (10 pts.) Create two views on top of your database schema. Show
your CREATE VIEW statements and the response of the system. Also,
show a query involving each view and the system response (but truncate
the response if there are more than a few tuples produced). Finally, show
a script of what happens when you try to update your view, say by inserting
a new tuple into it. Are either of your views updatable? Tell why or why not?
(Updatable views are discussed in Section 5.8.4 of the text.
Essentially, a view is updatable if it is a selection on one base
In part (1) you probably discovered that some queries run very
slowly over your large database. An important
technique for improving the performance of queries is to create
indexes. An index on an attribute A of relation R allows the
database to find quickly all tuples in R with a given value for
This index is useful if a value of A is specified by your query
(in the where-clause).
It may also be useful if A is involved in a join that equates it
to some other attribute.
For example, in the query
FROM Drinkers, Bars
WHERE Drinkers.name = 'joe'
AND Drinkers.frequents = Bars.name;
we might use an index on Drinkers.name to help us find the
tuple for drinker Joe quickly.
We might also like an index on Bars.name, so we can take all
the bars Joe frequents and quickly find the tuples for those bars to
read their addresses.
In Oracle, you can get an index by the command:
CREATE INDEX <IndexName> ON <RelName>(<Attribute List>)
The text on the second line (``tablespace indx'') is
unique to the Stanford Oracle installation and is designed to get our
indexes onto a second disk used only for that purpose, so one disk can
be retrieving data while the other is using an index.
Oracle creates indexes automatically when you declare an attribute(s)
PRIMARY KEY or UNIQUE.
Thus, some indexes may exist before you create them and may also be
making certain queries run faster than than they would with no indexes
Be sure to take this factor into account when trying to explain
differences in running times.
If you are unable to discover queries whose running-time is affected
by the indexes that you declare, you could try temporarily removing your
key declarations, so Oracle will not create indexes ``behind the
If the attribute list contains more than one attribute, then the index
requires values for all the listed attributes to find a tuple.
That situation might be helpful if the attributes together form a key,
An illustration of the CREATE INDEX command is
CREATE INDEX DrinkerInd ON Drinkers(name)
CREATE INDEX BarInd ON Bars(name)
which creates the two indexes mentioned above.
To get rid of an index, you can say DROP INDEX followed by the
name of the index.
Notice that each index must have a name, even though we only refer to
the name if we want to drop the index.
Create at least two useful indexes for your PDA. Run your queries
from part (1) on your large database with the indexes and without the
To time your commands, you may issue the following commands to
- TIMING START <TimerName>; starts your timer.
Give it whatever name you wish.
- TIMING SHOW; prints the current wall-clock time of your
(There is a way to switch among timers, which is why they are named,
but we shall not use this feature.)
- TIMING STOP; prints the current time of your timer and stops
Naturally these times may be affected by external factors
such as system load, etc. Still, you should see a dramatic difference
between the execution times with indexes and the times without. Turn
in a script showing your commands to create indexes, and showing the
relative times of query execution with and without indexes.
In this assignment, we will continue to work with the book database
from the previous assignment. However, we now make the book title and
the year jointly be the key for books; this change is reflected in all
Here are the relations again:
- BookAuthor(book, year, author, earnings)
- BookReference(book, year, referencedBook,
- BookReview(book, year, reviewer, score)
- BookPublish(book, year, publisher, price, num)
In this database, each book may have one or more authors and each author
may make a different amount of money from that book. One book may make
reference to other books. One book may be reviewed by different reviewers
and get different scores. An author could also be a reviewer and a publisher.
- For each of the following, your answer must be in SQL2:
- (a)(4 pts.)
Give suitable schema definitions for each relation.
- (b)(1 pt.)
Alter the BookReference table to set the default value of
the attribute times to be 1.
- (c)(1 pt.)
Define, using a CREATE DOMAIN statement,
an appropriate domain for the year attributes.
- (d)(1 pt.)
Drop the earnings attribute in BookAuthor.
- (e)(3 pts.)
The command you gave in (d) above fails on the Oracle system, as Oracle
does not support dropping of attributes from schemas. How would you
achieve the same effect on Oracle?
- (a)(6 pts.)
Consider the following two SQL commands:
- DROP TABLE foo;
- DELETE FROM foo;
Do they have the same effect on foo? Suggest one effect at least
one of these statements could have on views
defined on foo?
Give an example of an effect they could have on indexes defined for
Give one way they could affect other relations in the database?
- (b)(4 pts.)
People often underestimate the utility of having nulls in their relations,
and the operator semantics defined on nulls. The first instinct is to get
rid of nulls in a relation by defining default values for the attribute.
However, this choice might end up ``corrupting'' the results of some
Consider the price attribute in BookPublish. Give one
example of a query where it makes more sense for
price to be null for unknown values, rather than a default value such as
Explain your reasoning.
- (10 pts.)
Erdos Numbers have become ``folklore of mathematicians'' throughout the
world. These are named after the Hungarian mathematician Paul Erdos. Erdos
was one of the most prolific publishers of papers.
The numbers are defined for persons who have authored papers.
Each person who has
authored a paper has an Erdos number, which is defined as the least
number of ``hops'' needed to connect the author with Erdos.
himself has a number 0. Coauthors of Erdos have Erdos number 1. Einstein
has Erdos number 2, since he wrote a paper with Ernst Straus, and Straus
wrote many papers with Erdos (giving Straus an Erdos number of 1).
Notwithstanding that nobody knows why people care about computing
Erdos Numbers, our database is expected to get a lot of queries on it.
So, define views to find authors who have:
- Erdos number 1.
- Erdos number exactly 2.
- Erdos number 3 or less.
Note: You must consider books to be papers in the Erdos Number
- (10 pts.)The SQL2 standard allows a foreign-key dependency to
refer to the same relation, as in the following example:
CREATE TABLE Managers (
employee-name CHAR(20) NOT NULL,
manager-name CHAR(20) NOT NULL,
PRIMARY KEY (employee-name),
FOREIGN KEY (manager-name) REFERENCES Managers(employee-name)
ON DELETE CASCADE;
Here, employee-name is the key for the table Managers, meaning
that each employee has at most one manager. The foreign-key clause
requires that every manager also be an employee. How would you start
inserting tuples into this table? What happens when a tuple in the
relation Managers is deleted? Would a ``set-null'' policy work here?
Explain your reasoning briefly.