CS145 Written Assignment 4

CS145 - Introduction to Databases
Spring 2000, Prof. Widom

Written Assignment #4
Due Wednesday May 10

Problem 1

Recall from Problem 9 in Written Assignment #3 the relation Flight(from,to,cost,airline), containing nonstop flights from one city to another. We were interested in part (c) of that problem in writing a single SQL query that would return the cheapest cost of flying between each pair of cities, regardless of the number of stops. While it is not possible to write such a query in SQL2, it is possible to compute the answer using SQL in conjunction with a programming language, and that's your task in this problem. You may use SQL embedded within a C-like language as presented in Section 7.1 of the textbook, you may use a PL/SQL-like syntax, or you may use ODBC/JDBC-style dynamic SQL. Regardless, we are not particularly concerned with correct syntax, and you can certainly "wing it" on syntax for things like checking for end conditions. We're interested primarily in your overall algorithm, and the interaction between programming constructs and SQL.

Problem 2

This problem is a restatement of Exercise 7.2.3 on page 389 of the textbook. Consider the four operations in parts (a), (b), (c), and (d) of Exercise 7.2.1 on page 388 of the textbook. For this problem you only need to consider the English descriptions of the operations - you need not rewrite them in SQL unless it helps you.

(a) Suppose a transaction T executes the operation in part (a) of Exercise 7.2.1, and any number of other transactions may at the same time be executing any number of all four operations. Briefly describe in English a behavior of T's operation that can occur if all transactions are running with isolation level READ UNCOMMITTED that cannot occur if all transactions are running with isolation level SERIALIZABLE.

(b) Same as part (a) except suppose transaction T executes the operation in part (b) of Exercise 7.2.1.

(c) Same as part (a) except suppose transaction T executes the operation in part (c) of Exercise 7.2.1.

(d) Same as part (a) except suppose transaction T executes the operation in part (d) of Exercise 7.2.1.

Problem 3

In this problem you will be graded on simplicity of your examples as well as correctness.

Consider one of our favorite relational schemas - Product, PC, Laptop, and Printer - e.g., appearing in Exercise 6.6.2 of the textbook on the top of page 358.

(a) Give an example of two transactions T1 and T2 operating on this database. Assume that transaction T1 has isolation level SERIALIZABLE. (But note that true serializability is guaranteed only when all transactions have this isolation level.) For transaction T2, consider isolation levels READ UNCOMMITTED and READ COMMITTED. Design your transactions so that there is some result allowed when T2 uses READ UNCOMMITTED that is not allowed when T2 uses READ COMMITTED. Assume that transactions T1 and T2 both commit. More specifically, give:

The two transactions T1 and T2, each specified as a sequence of one or more SQL statements followed by commit. Your transactions should be as simple as possible; e.g., they certainly do not need to involve all of the relations in the schema.
A simple initial set of data for the relations used by your transactions (the initial state).

The state of the data after T1 and T2 have both executed such that this final state could result when T2 uses READ UNCOMMITTED, but could not result when T2 uses READ COMMITTED. Also specify the interleaving (ordering) of the statements between the two transactions that produced this final state.

(b) Do part (a) again, except this time for transaction T2 consider isolation levels READ COMMITTED and REPEATABLE READ, where the final state could result when T2 uses READ COMMITTED, but could not result when T2 uses REPEATABLE READ.

(c) Do part (a) again, except this time for transaction T2 consider isolation levels REPEATABLE READ and SERIALIZABLE, where the final state could result when T2 uses REPEATABLE READ, but could not result when T2 uses SERIALIZABLE.

Problem 4

In this problem you will be graded on simplicity of your example as well as correctness.

Continue with the schema used in Problem 3. Suppose multiple clients are operating on the database, and all of their transactions use isolation level SERIALIZABLE. Give a realistic example where the final state of the database depends on the order in which concurrent transactions are serialized. Please specify:

An initial state for one or more relations.
Transactions issued by two clients concurrently (one transaction per client, one or more SQL statements per transaction).
Two different valid final states of the relations, depending on which transaction executed first.

Problem 5

Consider a set of users U, V, W, X, and Y. Suppose that user U creates a relation R(A) and is thus the owner of relation R. Now suppose the following set of statements is executed in order:

stmt user operation
1 U grant select on R to V,W with grant option
2 V grant select on R to W
3 W grant select on R to X,Y
4 U grant select on R to Y
5 U revoke select on R from V restrict
6 U revoke select on R from W cascade

stmt	user	operation
1	`U`	`grant select on R to V,W with grant option`
2	`V`	`grant select on R to W`
3	`W`	`grant select on R to X,Y`
4	`U`	`grant select on R to Y`
5	`U`	`revoke select on R from V restrict`
6	`U`	`revoke select on R from W cascade`

Show the grant diagram after steps 4, 5, and 6. Since all of the nodes in the diagrams will be for privilege "select on R", you may omit writing it in each node.

Problem 6

Warning: this is your open-ended question for this assignment, although we expect that most students can come up with some solution, and some of you will come up with the best solution.

There's a type of database security that was not covered in class or in the textbook called "statistical authorization". With statistical authorization, some users may be permitted to retrieve only aggregate information from the database, e.g., average salaries but not individual salaries. Furthermore, to prevent users from applying aggregates to very small numbers of tuples (such as the average of one salary!), the system requires that a certain minimum number of tuples contribute to each aggregate result. Finally, to prevent the user from using intersection properties to deduce a single value (e.g., the user could ask for X=sum(A1,A2,A3,A4,A5), then ask for Y=sum(A2,A3,A4,A5), then compute X-Y to deduce the value of A1), the system may require that the tuples participating in multiple queries issued by the same user have a small intersection. In this problem you will explore how, even with such security measures, specific information can still be deduced from the database.

Here's the problem. Consider the simple relation student(ID,GPA). Suppose that, for security reasons, the following restrictions are made on user U's set of queries against this relation:

The result of every query must be a single aggregate value - a SQL aggregate function applied to one of the attributes of the relation.
At least 4 different tuples must be used in the aggregate to produce each query's result.
For any two queries issued by user U, the sets of tuples used to produce the two query results must have an intersection no larger than 2.

Assume that student IDs are in the range 1 to 50, and that attribute GPA is of type float. Give a set of queries that satisfies the above restrictions, and from whose answers you can determine the GPA of the student with ID=1. Write the queries in SQL, then show the computation that produces the GPA for the student with ID=1 from the query results. Use as few queries as you can.

CS145 - Introduction to DatabasesSpring 2000, Prof. Widom

Written Assignment #4 Due Wednesday May 10