Assignment #3
Due Wednesday October 23

Consider the following relational database schema:
```
   lives (name, street, city)
   works (name, company, salary)
   located (company, city)
   manages (name, manager-name)
```
You may make the following assumptions for this problem:

(A1)
The first attribute is a key for each relation (i.e., lives.name, works.name, located.company, and manages.name are keys).

(A2)
Every person in the works relation also appears in the lives relation, but not necessarily vice-versa (i.e., no one works without living, but people may live without working).

Specify a relational algebra expression for each of the following queries:

(a)
Find the names of all people who work for IBM.

(b)
Find the names and cities of all people who work for IBM.

(c)
Find the names, streets, and cities of all people who work for IBM and earn more than $30,000.

(d)
Find the names of all people who live in the same city as the company they work for.

(e)
Find the names of all people who live in the same city and on the same street as their manager.

(f)
Find the names of all people who are their own second-level manager--that is, find all people P such that P's manager's manager is P.

(g)
Find the names of all people who do not work for IBM.

(h)
Find the names of all people who earn a salary that is higher than the salary earned by any person who works for IBM.
Consider Problem #1 again, but now assume that people can work for many companies and managers, i.e., change assumption (A1) so that <name,company> is a key for works and <name, manager-name> is a key for manages. In the English description of part (c) replace ``earn more'' with ``earn more at any company''. In the English description of part (d) replace ``the company'' with ``some company''. In the English description of part (e) replace ``their manager'' with ``one of their managers''; similarly for part (f). Do any of your relational algebra queries from Problem #1 need to be modified under these new assumptions? If so, show the modified queries.
Consider Problem #1 again, including assumptions (A1) and (A2) as stated. Write SQL queries for parts (a)-(f).
Consider Problem #2 again, including the changed assumptions and English descriptions. Show any modifications to your SQL queries from Problem #3 that are needed under the new assumptions. Here there's an additional twist: Make sure that none of your SQL queries return duplicate values. Do so by adding the keyword distinct to eliminate duplicates, but be sure to add distinct only when necessary.
Personal Database Application (PDA)
This week you will load data into the relations you created in Sybase for your PDA. You will load two sets of data: one small set created ``by hand'' and one large set generated by a program.
(a)
For each relation in your PDA, create a file containing a few (approx. 3-10) records in the format prescribed by Sybase, then use the Sybase bulk-loading facilities to insert those records as tuples into your relations. Turn in a listing showing the files you created and the successful loading of the data into Sybase. (Note that Handout #6, Populating Your Database in Sybase, also tells you how to delete all tuples from a relation once the tuples have been loaded, a feature you will find handy as you work on this assignment.)

(b)
Write a program in any programming language you like (C or C++ is a natural choice, but other languages are fine as well) that creates large files of records in the format prescribed by Sybase, then load the data into your PDA relations. If you are using real data for your PDA, your program will need to transform the data into files of records conforming to your PDA schema. The rest of you will write a program to fabricate data: your program will generate either random or non-random (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values--strings especially--to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small ``toy'' databases often used in classes. The data you generate and load should be on the order of:
- At least two relations with thousands of tuples
- At least one relation with hundreds of tuples
If the semantics of your application includes relations that are expected to be relatively small (e.g., schools within a university), it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

(1)
Although you have not (yet) declared keys in your Sybase relations, in many cases you probably know that an attribute or set of attributes in a relation will serve as a key. If so, be sure not to generate duplicate values for these attributes.

(2)
Your PDA almost certainly includes relations that are expected to join with each other. For example, you may have a student relation with attribute course-no that's expected to join with attribute number in relation course. In generating data, be sure to generate values that actually do join--otherwise all of your interesting queries will have empty results! One way to achieve this is to generate the values in one relation , then use the generated values in to select joining values for . For example, you could generate course numbers first (either sequentially or randomly), then use these numbers to fill in the course-no values in the student relation.

Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script showing the loading of your data into Sybase.

The TAs of CS145, cs145ta@cs.stanford.edu, last modified: 10/15/96

Assignment #3 Due Wednesday October 23

Assignment #3
Due Wednesday October 23