## CS145 Final Sample Solution

Questions 25-26 are graded by Erik, Questions 27-29 by Mark, Question 30-31 by Mike, and Question 32 by Jun. If you want to discuss the grading of these questions, you should give your exam to the relevant person, together with a note explaining why you think the grading was questionable.

#### Question 1: (B)

Through the Directs relationship, a museum is associated with at most one artist. Through the Displays relationship, a museum is associated with any number of paintings.

#### Question 2: (D)

The type of a.paints is Set<Painting> according the ODL declaration of the Artist class. Each output tuple has type Struct {string name, Set<Painting> paints}. The result  should be a Bag of these structures.

#### Question 3: (A)

I is correct. II has a type mismatch problem: it compares p (of type Painting) to a.paints and m.displays (both of type Set<Painting>).

#### Question 4: (C)

I doesn't need a DISTINCT because it's simply a selection over Artists, which doesn't contain any duplicates. II needs a DISTINCT: different paintings could be painted by the same artist, so p.isPaintedBy could be the same for different p's. III needs a DISTINCT because of the multiplicative effect of the FROM clause.

#### Question 5: (C)

It's easy to see that both A->B and B->C hold in T(A, B, C). Obviously, A->C follows from A->B and B->C by the transitivity rule for FD's. B->C implies B->>C because FD is MVD; B->>C implies B->>A by the complementation rule for MVD's.

#### Question 6: (C)

(C) is lossy. Consider the following counterexample:
R
 A B C D 1 1 1 1 2 2 1 1

R1
 A D 1 1 2 1

R2
 B D 1 1 2 1

R3
 C D 1 1

R1 JOIN R2 JOIN R3
 A B C D 1 1 1 1 2 1 1 1 1 2 1 1 2 2 1 1

#### Question 7: (B)

 A B C D t1 1 1 1 1 t2 1 2 2 2 t3 1 2 1 1 t4 1 1 2 2
Given t1 and t2, we can obtain t3 and t4 by swapping their CD values (because A->>CD). No other tuples can be inferred from these.

#### Question 8: (C)

What could go in the spade? First, it must contain some attribute other than A (if not, the first FD would be trivial and the basis wouldn't be minimal). Second, it cannot contain B (if it does, A->>CD would be redundant because A->B implies A->>B, which implies A->>CD by the complementation rule). Third, it cannot contain both C and D (if it does, A->>CD would again be redundant because A->CD implies A->>CD).

So, the spade is either C or D. (It could include an extra A but that wouldn't change the meaning of the FD.) Without loss of generality, let's suppose the spade is C (C and D are symmetric). The keys of R are {A, B, D} and {B, C, D}. Therefore, R is not in BCNF (both A->C and CD->A are BCNF violations). On the other hand, R is in 3NF (A->C doesn't violate 3NF because C is in the key {B, C, D}, and CD->A doesn't violate 3NF because A is in the key {A, B, D}).

#### Question 9: (D)

The keys of R are {A, D, E} and {B, D, E}. Therefore, all three FD's are BCNF and 4NF violations. A->B is not a 3NF violation because B is in the key {B, D, E}. BD->A is not a 3NF violation because A is in the key {A, D, E}. A->C is a 3NF violation because C is not in any key.

#### Question 10: (C)

Starting with A->B, we decompose R into AB and ACDE. ACDE contains a BCNF violation A->C, so we decompose it further into AC and ADE. A->B holds in AB, A->C holds in AC, and ADE doesn't support any nontrivial FD's. Obviously, BD->A is not preserved.

#### Question 11: (B)

When all tuples are deleted from T, all S tuples with S.CIS NOT NULL are deleted too, following the CASCADE policy. Then, all R.B values are set to NULL because of the SET NULL policy.

#### Question 12: (A)

Because of the referential integrity constraints, each R tuple joins with at most one S tuple, and this S tuple in turn joins with at most one T tuple. Therefore, the number of tuples produced by the join is no more than the number of tuples in R.

#### Question 13: (D)

An important observation is that UNIQUE (A) is stronger than PRIMARY KEY (A,B) (assuming that everything is NOT NULL). Therefore, when inserting R tuples into S, we don't need to check whether two R tuples together could violate PRIMARY KEY (A,B); we simply need to make sure that a R tuple and S tuple together won't violate PRIMARY KEY (A,B).

Also, II won't violate any constraint; it's even more conservative than I.

#### Question 14: (B)

Here, the important observation is that PRIMARY KEY (A,B) is weaker than UNIQUE (A) and UNIQUE (B). Therefore, when inserting S tuples into R, we need to check whether two S tuples together could violate UNIQUE(A) or UNIQUE (B).

I avoids this problem by processing one S tuple at a time; but II cannot avoid this problem.

#### Question 15: (D)

It's impossible for an update of D or a deletion from R to violate A->BC.

#### Question 16: (C)

Consider the following instance of R:
 A B C D 1 1 1 1 1 2 2 2 1 1 1 2 1 2 2 1
Currently, R satisfies A->>BC. If we update column D of the last row to 0, or if we delete the last row, A->>BC will be violated because R no longer contains the tuple (1, 2, 2, 1).

#### Question 17: (D)

2000 is the result of the execution sequence T1.step1, T1.step2, T2.

2200 is the result of the execution sequence T1.step1, T2, T1.step2.

2300 is the result of the execution sequence T2, T1.step1, T1.step2.

#### Question 18: (B)

2000 and 2300 are still possible, but 2200 is not, because T2 cannot get in between T1.step1 and T1.step2 to read the uncommitted value written by T1.step1.

#### Question 19: (C)

(A) is sufficient but not necessary. (B) and (D) are not sufficient because of phantom tuples.

#### Question 20: (D)

(D) is sufficient and necessary. The only case where QMAX could possibly return something less than QMIN is when student 123 had no previous enrollment history, and T2 read some dirty data inserted by another transaction that was aborted later. Here is an example:
 ```-- T3: INSERT INTO Enroll VALUES(123, 'CS145',        'Spring 1999', 4.0);``` ```-- T2: SELECT MIN(grade) FROM Enroll WHERE SID = 123; -- QMIN -- returns 4.0``` ```ROLLBACK; -- T4: INSERT INTO Enroll VALUES(123, 'CS145',        'Spring 1999', 3.0); COMMIT;``` ```SELECT MAX(grade) FROM Enroll WHERE SID = 123; -- QMAX -- returns 3.0```
On the other hand, a committed insertion to Enroll can only make the lowest grade lower and the highest grade higher. So, under the assumption that Enroll is INSERT-only, if we make T2 run at READ COMMITTED, then the result of QMAX will always be no less than the result of QMIN.

#### Question 21: (C)

For each course, V contains the Enroll tuples with the highest grade ever assigned to that course. Thus, the query over V simply returns those courses with highest grade equal to 3.0. Both I and II return the exact same set of courses.

#### Question 22: (D)

Since we are only given the values of CID and term, neither (A) nor (B) will help. (D) allows us to quickly locate the set of tuples with CID='CS145' AND term='Spring 1999'. On the other hand, (C) lets us quickly locate the set of tuples with CID='CS145', and the set of tuples with term='Spring 1999'. To find tuples satisfying both conditions, we still need to intersect these two sets. Therefore, (C) is not as good as (D).

#### Question 23: (D)

I is obviously monotone since it's a simple selection over Enroll.

Although II uses the difference operator, it is still monotone because II is actually equivalent to I!

III is not monotone. For example, suppose that Enroll contains a student 007 who has not taken CS145. The current result of III should contain 007. Now, suppose we insert into Enroll the fact that 007 has just taken CS145. Then, 007 would disappear from the result.

#### Question 24: (B)

Here is the dependency graph for the query:
```                 SuperStudent ---+
|   A        |
-       V   |        |
SuperStat -----> SuperCourse     |
|                 |          |
|      -          V          |
+-------------> Enroll <-----+```
SuperStat is not monotone w.r.t. SuperCourse because adding a super course may cause the COUNT(*) field of certain SuperStat tuples to be incremented (updated). SuperStat is not monotone w.r.t. Enroll because adding an enrollment record may also cause the COUNT(*) field of a certain SuperStat tuple to be updated.

SuperStudent is monotone w.r.t. SuperCourse because adding a super course can only make more students become super; students who are already super will remain super. For the same reason, SuperStudent is monotone w.r.t. Enroll.

Similarly, SuperCourse is monotone w.r.t. SuperStudent and Enroll.

#### Question 25:

`PROJ_{SID} (SELECT_{grade>4.0} (Enroll))`
For each student whose max grade is above 4.0, we could also say that they have at least one grade above 4.0. This query just selects all the tuples with grades above 4.0, and then projects out the SID's associated with those tuples.

• An alternate solution that received full credit (although it's not as elegant) was a brute force method that found all maximum scores and then selected the ones above 4.0:
• ```PROJ_{SID} (SELECT_{grade>4.0} (
RENAME_{e2}(Enroll))
))```
• -1 if the solution was way more complicated than need be --- for example, lots of extra stuff to get rid of "duplicates" (relational algebra uses set semantics, so we don't need to worry about duplicates).
• -2 if the query wouldn't work, and more it had bigger problems.
• 0 point was generally reserved for solutions that were missing substantial functionality (for example, if no attempt was made to represent significant parts of the query).
• #### Question 26:

```PROJ_{SID} (Enroll) -
• Again, full credit was given to the less elegant, brute force submission, which was almost identical to the alternate solution for Question 25.
• -1 if the query contained lots of extra stuff that didn't do anything.
• -2 or more if the query was non-functional. In general, grading was harsher if the less elegant solution was attempted. For example, solutions that tried to incorrectly subtract the result of a natural join of two Enroll tables from one Enroll table received 2 points.
• #### Question 27:

No. The constraint could be violated by a deletion from Rating (namely, by removing the only Stanford program with a higher score than Berkeley). However, deletions are not monitored by a tuple-based CHECK.

• 2 points were given for the answer of "No" and basically for any reason that mentioned deletions.
• no points were given for the answer of "Yes".
• 1 point was given the answer of "No" but for a wrong reason; e.g.:
• a reason like only one tuple could be seen (but we could use subquery)
• some rational about why subqueries could cause problems  (they don't, because the only table it mentioned is its own table)

#### Question 28:

```CREATE ASSERTION BeatCal CHECK
(EXISTS (SELECT *
FROM   Rating r1, Rating r2
WHERE  r1.univ = 'Stanford'
AND    r2.univ = 'Berkeley'
AND    r1.prog = r2.prog
AND    r1.score > r2.score));```
• 5 points for a perfect solution.
• Around 4 points for forgetting Berkeley, forgetting to join on program, or having a minor error syntactic or semantic error, but showed understanding of how it worked.
• Somewhere in the vicinity of 1.5 or 2 points for a solution that did something to the effect that all Stanford programs are better than corresponding Berkeley ones, or something like there isn't a Berkeley program with a higher rating than all of Stanford's programs.
• #### Question 29:

```SELECT COUNT(*) + 1
FROM   Rating
WHERE  prog = 'Math'
AND    score > (SELECT score
FROM   Rating
WHERE  prog = 'Math'
AND    univ = 'Stanford');```
Or,
```SELECT COUNT(*) + 1
FROM   Rating r1, Rating r2
WHERE  r1.prog = 'Math'
AND    r2.prog = 'Math'
AND    r2.univ = 'Stanford'
AND    r1.score > r2.score;```
• 6 points for a perfect solution.
• 5.5 for forgetting either the +1, mixing up the > for >=, or having an unnecessary subquery SELECT ... FROM (SELECT * FROM ...).
• 5 for combining these errors, forgetting to check for Stanford, or forgetting to do a prog = 'Math' somewhere.
• Somewhere in the vicinity of 4 points for getting the right idea, but being confused about GROUP BY and having to get a solution that does not work.
• Around 1.5 points for trying to use the Oracle special feature of ROWNUM. Most of these solution did not always work, anyway. Recursive solutions also received nearly this amount of points (recursion is in SQL3, not SQL2).
• Around 0.5 or 1 point for something in SQL that showed you were trying to think about the problem.
• #### Question 30:

```WITH RECURSIVE Closure(attr) AS
(SELECT attr
FROM   AttrSet)
UNION
(SELECT rhs
FROM   Closure, FD
WHERE  attr = lhs)
SELECT * FROM Closure;```
• -1 no "query" part in the WITH statement.
• -1.5 adding a "NOT IN Closure" clause condition (not monotonic).
• -.5 large amounts of query redundancy.
• -.5 some larger syntax errors.
• -1.5 computing the FD closure and incorrectly querying it.
• -1 not using AttrSet for the base case (e.g. using lhs of FD).
• -1.5 (or more) writing a non-recursive query (e.g. joining FD with AttrSet only, instead of FD and Closure).
• #### Question 31:

```(SELECT k1.person1, k2.person2
FROM   Knows k1, Knows k2
WHERE  k1.person2 = k2.person1
AND    k1.person1 <> k2.person2)
EXCEPT
(SELECT *
FROM   Knows);```

Almost all of the scores fell into these categories:

• 5 perfect / almost perfect.
• 4.5 forgetting k1.p1 <> k2.p2.
• 4 returning distance equal to 3.
• 4 unsuccessful attempt to check the short circuit case (distance equal to 1).
• 3 no check for short circuit.
• 2.5 returning distance less than or equal to 3, without checking for short circuit.
• #### Question 32:

This question is tricky. First, here is a solution which is imperfect (but close enough):
```WITH RECURSIVE GatesDistance(person, distance) AS
(SELECT person2, 1
FROM   Knows
WHERE  person1 = 'Mr. Gates')
UNION
(SELECT *
FROM   GatesDistance)
UNION
(SELECT k.person2, gd.distance + 1
FROM   GatesDistance gd, Knows k
WHERE  gd.person = k.person1
AND    k.person2 NOT IN
(SELECT person FROM GatesDistance))
SELECT 'Mr. Gates', person, distance
FROM   Gates
WHERE  person <> 'Mr. Gates';```
The solution above is guaranteed to converge to a fixed point for any finite instance of Knows. The n-th iteration find all persons within distance n from Mr. Gates; the next iteration will add persons at a distance of n+1 from Mr. Gates. The condition k.person2 NOT IN (SELECT person FROM GatesDistance) guarantees the convergence of the fixed-point iteration when Knows contains cycles. This condition also ensures that the output distances are indeed minimum.

Strictly speaking, however, this recursion is not linear. What's worse, it's not even stratified since GatesDistance is not monotone w.r.t. itself, according to our definition of monotonicity. For example, suppose that Knows currently contains ('Mr. Gates', 'X'), ('Mr. Gates', 'Y'), and ('Y', 'Z'); GatesDistance currently contains ('X', 1) and ('Y', 1); at this point, the result of the query contains ('X', 1), ('Y', 1), and ('Z', 2). Now, suppose we add ('Z', 1) to GatesDistance. Then, the result of the query would contain ('X', 1), ('Y', 1), and ('Z', 1). The old tuple ('Z', 2) has been invalidated. On the other hand, this nonmonotonicity problem will never come up in the fixed-point iteration, because new tuples are added to GatesDistance in the breadth-first order.

Actually there is a linear and stratified solution, although it is less efficient than the solution above:

```WITH RECURSIVE GatesDistance(person, distance) AS
(SELECT person2, 1
FROM   Knows
WHERE  person1 = 'Mr. Gates')
UNION
(SELECT k.person2, gd.distance + 1
FROM   GatesDistance gd, Knows k
WHERE  gd.person = k.person1
AND    gd.distance <
(SELECT COUNT(*) - 1
FROM   ((SELECT person1 FROM Knows)
UNION
(SELECT person2 FROM Knows))))
SELECT person, MIN(distance)
FROM   GatesDistance
GROUP BY person
HAVING person <> 'Mr. Gates';```
Here, GatesDistance is monotone w.r.t. both itself and Knows. The condition gd.distance < (SELECT COUNT(*) ...) guarantees that the fixed-point iteration will terminate, after it has explored all paths of length m-1 from Mr. Gates, where m is the total number of persons in the database (any path of length m or longer will definitely contain a cycle).