CS145 Assignment #3

Due Wednesday, October 25, 2000

Step 3 of Your PDA (Personal Database Application)

Note1: see Recording Your Session in the on-line Getting Started With Oracle document for a guide to preparing output to hand in with your assignment. It will be useful for this and subsequent PDA parts.

Note2: Oracle is not being backed up. You need to save anything you need long-term in the leland file system.

(a)
(25 pts.) Write an SQL database schema for your PDA, using the CREATE TABLE commands described in the handout Getting Started With Oracle . Pick suitable datatypes for each attribute. Page 286 of the text gives you the principal options regarding types. Hand in a printout of the commands you use to create your database schema (it is a good idea to keep this file for the balance of the course). Show the response of sqlplus to a request to describe each of your relation schemas. For example, to see the schema for relation Foo type
     DESCRIBE Foo;
(b)
(10 pts.) Execute five INSERT commands to insert tuples into one of your relations. Show the response of sqlplus and the relation that results when you issue a SELECT * command. Again, the information on how to do this step is in Getting Started With Oracle.

(c)
(25 pts.) Develop a substantial amount of data for your database and load it into your relations using the SQL load command. See The Oracle Bulk Loader for information on how to bulk-load data. To create the data, write a program in any programming language you like that creates large files of records in a format acceptable to the Oracle bulk loader, then load the data into your PDA relations. If you are using real data for your PDA, your program will need to transform the data into files of records conforming to your PDA schema. The rest of you will write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values--strings especially--to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small ``toy'' databases often used in classes. The data you generate and load should be on the order of:

If the semantics of your application includes relations that are expected to be relatively small (e.g., schools within a university), it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

(1)
Although you have not (yet) declared keys in your relations, in many cases you probably know that an attribute or set of attributes in a relation will serve as a key. If so, be sure not to generate duplicate values for these attributes.

(2)
Your PDA almost certainly includes relations that are expected to join with each other. For example, you may have a Student relation with attribute courseNo that's expected to join with attribute number in relation Course. In generating data, be sure to generate values that actually do join--otherwise all of your interesting queries will have empty results! One way to guarantee joinability is to generate the values in one relation, then use the generated values in one relation to select joining values for the other relation. For example, you could generate course numbers first (either sequentially or randomly), then use these numbers to fill in the courseNo values in the student relation.

Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script showing the loading of your data into Oracle.

Problem Set

  1. (30 pts.) This problem is based on a book database with four relations:
         BookAuthor(book, author, earnings)
         BookReference(book, referencedBook, times)
         BookReview(book, reviewer, score)
         BookPublish(book, year, publisher, price, num)
    

    In this database, each book may have one or more authors and each author may make a different amount of money from that book. One book may make reference to other books. One book may be reviewed by different reviewers and get different scores. An author could also be a reviewer and a publisher.

    Write the following quries in relational algebra or SQL, whichever is specified in the query. For relational algebra, you may write a sequence of steps with named temporary relations if you like. For SQL, make sure that your outputs do not have duplicate tuples.

    (a)
    Find all the books published in 1999 and reviewed by both Paul Gray and Daphne Merkin. Write this query in SQL.

    (b)
    Find all the reviewers who never reviewed their own books. Write in relational algebra.

    (c)
    Find all the authors who reviewed more than two books written by Charles Dickens. Write in:

    1. SQL with aggregates

    2. SQL without aggregates but with subqueries in the WHERE clause

    3. SQL without aggregates and without subqueries in the WHERE clause

    4. Relational algebra

    (d)
    Find all authors who have written exactly one book and reviewed more than one book. Write in SQL.

    (e)
    Find all reviewers who have reviewed every book by Stephen King. Write in

    1. Relational algebra

    2. SQL with EXCEPT

    3. SQL without EXCEPT

    (f)
    Find the book(s) published in 1999 with the highest review score. Write in relational algebra.

    (g)
    List each author's total earnings in 1995-2000 in descending order. Write in SQL.

  2. (10 pts.) Consider the same book database described in Problem #1, with one exception --- the key for BookPublish has to include the year:
         BookAuthor(book, author, earnings)
         BookReference(book, referredBook, times)
         BookReview(book, reviewer, score)
         BookPublish(book, year, publisher, price, num)
    

    Write the following data modifications in SQL.

    (a)
    If a book earns more than $100000 in total for it's authors, add a review with score 10, and make the reviewer be the author of this book who earned the most from the book (assume no ties).

    (b)
    For each book, remove its review with the highest score (assume no ties). If this book has only one review, do not remove it.

    (c)
    If a book's average review score is greater than 8 and was published in 1999, then add this book with the same publisher to BookPublish for the year 2000, but with double the price and double num (number of copies printed).