### Assignment #4 Due Wednesday May 9

#### LOGISTICS

• Please read the document Oracle 8i SQL (make sure to read the most recent version) for all kinds of useful information related to the PDA part of this assignment.

• Turning in written exercises: The procedure for turning in the written exercises on this assignment is the same as for the previous assignments, with the same deadline and late policy.

• Turning in PDA: Your PDA work will again be submitted electronically. The procedure is the same as for the previous assignment (see http://www.stanford.edu/class/cs145/submit-README.txt), and the same deadline and late policy applies.

#### Problem 1

Review Section 5.9 in the textbook. In this problem you are to use SQL to implement two of the natural outerjoin operators, without using SQL's built-in outerjoin. Specifically, consider two relations R(A,B) and S(B,C). Write:

(a) A SQL query that computes the natural full outerjoin of R and S, without using outerjoin

(b) A SQL query that computes the natural left outerjoin of R and S, without using outerjoin

In your queries, you may select the constant value NULL if you find it useful. For example, the following query returns all tuples in T(A,B,C) with all B values replaced by NULL:

```   select A,NULL,C from T
```

#### Problem 2

Consider a relational schema Employee(ID,salary), where ID is a key. Do you think the following two queries are always equivalent, i.e., do they yield the same result on all possible databases?
```  SELECT AVG(salary) FROM Employee

SELECT SUM(salary)/COUNT(*) FROM Employee
```
If you think they are always equivalent, justify your answer. If you think they are not always equivalent, give the simplest example data for Employee you can think of for which you expect the two queries to yield different results.

#### Problem 3

Recall from Problem 3 in Assignment #3 the relation Flight(city1,city2,cost,airline), containing nonstop flights from one city to another. We were interested in part (c) of that problem in writing a single SQL query that would return the cheapest cost of flying between each pair of cities, regardless of the number of stops. While you may have had difficulty writing such a query in SQL2, it is possible to compute the answer using SQL in conjunction with a programming language, and that's your task in this problem. You may use SQL embedded within a C-like language as presented in Section 7.1 of the textbook, you may use a PL/SQL-like syntax, or you may use CLI/JDBC-style dynamic SQL. Regardless, we are not particularly concerned with correct syntax, and you can certainly "wing it" on syntax for things like checking for end conditions. We're interested primarily in your overall algorithm, and the interaction between programming constructs and SQL.

#### Problem 4 (Personal Database Application, Part 4)

In this part of the project, you will issue SQL queries and updates against your PDA database, and you will experiment with the use of indexes. Since you will be modifying your data as part of this assignment, we strongly suggest that you adopt the routine for getting repeated "fresh" starts with Oracle described in the Maintaining your databases section of Assignment #3, if you haven't already.

(a) SQL

Develop and test:

• At least six SQL data retrieval (select) commands.
• At least two each of the four types of SQL data modification commands: insert a single tuple, insert a subquery, delete, update.
• For this assignment you will be invoking your SQL commands interactively through sqlplus, as described in the document Getting Started With Oracle. Of course you should certainly build a script file, rather than typing in the queries each time you run them.

• Please write "interesting" queries. You should try to use most or all of the SQL constructs discussed in class and in the textbook (subqueries, aggregates, set operators, etc.). You will not receive full credit if your queries and modifications are all extremely simple.

• We suggest that you experiment with your SQL commands on your small hand-created database before running them on the large database for which you generated data. Initial debugging is much easier when you're operating on small amounts of data. Once you're confident that your commands are working, try them on your large database. We do expect that the commands you turn in work properly on both your small and large databases.

• If you discover that most or all of your "interesting" queries return an empty answer on your large database, check whether you followed the instructions in PDA Part 3 for generating data values that join properly. You may need to modify your data generator.

• Turn in a copy of all of your SQL commands, along with a script illustrating their execution on either your small or large database. Your script should be sufficient to convince us that your commands run successfully, but you can and should truncate query results after a few lines. Please do not turn in query results that are hundreds of lines long. (Your submission may be rejected anyway if you try, as explained in the electronic submission instructions.)

(b) Indexes

In part (a) you may have discovered that some queries run very slowly over your large database. As discussed in class, an important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to quickly find all tuples in R with a given value for attribute A (which is useful when evaluating selection or join conditions involving attribute A). An index can be created on any attribute of any relation, or on several attributes combined. The syntax for creating indexes in Oracle is given in the document Oracle 8i SQL. Please pay careful attention to the information about Stanford/Oracle-specific syntax.

Create at least three useful indexes for your PDA. Run your queries from part (a) on your large database with the indexes and without the indexes. Turn in a script showing your commands to create indexes, and showing the relative times of query execution with and without indexes. Here too, please truncate any large query results.

• As mentioned in the document Oracle 8i SQL, Oracle automatically creates indexes for attributes declared as keys. Thus, when you created a relation in PDA Part 3 and declared a PRIMARY KEY, an index was created, and you need not attempt to create another index on your key attributes.

• As described in the document Oracle 8i SQL, in order to set the system to show query execution times you must issue the command "set timing on;" at the sqlplus prompt.

• Your timings will be affected by external factors such as system load. However, for some of your queries, with appropriate indexes you should see a consistent dramatic difference between the execution times with indexes and the times without. If others of your queries do not show performance improvement even when relevant indexes are created, please include a short note suggesting why this might be the case. (If you are completely unable to find queries whose running times are improved by creating indexes, you could try temporarily removing your PRIMARY KEY declarations, so Oracle will not create any indexes automatically.)