CS145 Midterm Sample Solution
Question 17 is graded by Charity, Question 18 by Mike, Question 19 by
Mark, and Question 20 by Jun. If you want to discuss the grading of these
questions, you should give your exam to the relevant person, together with
a note explaining why you think the grading was questionable.
Question 1: (B)
If one X is related to a set of Y's and vice versa, then the relationship
between X and Y is manymany. If one X is related to one Y's but
one Y is related a set of X's, then the relationship between X and Y and
manyone.
If one X is related to one Y and vice versa, then the relationship between
X and Y is oneone.
Question 2: (D)
In a multiway relationship set, an arrow pointing into entity set X means
that if you pick one entity from each other entity set, together they determine
one entity in X. So, for this question, the FD's in R(A, B, C, D)
are ABC>D, ABD>C, and ACD>B.
In other words, the keys are {A, B, C}, {A, B, D}, and {A,
C, D}.
Question 3: (C)
When we translate a subclass in E/R to a relation, we include all attributes
attached to the subclass and its inherited key. In the case of ConferenceRooms,
we need to include capacity and the key of Rooms. Rooms is a weak entity
set, so its key is {name, number}, where name comes from Buildings via
the In relationship set.
Question 4: (C)
In the ODLstyle translation of subclasses, each object is represented
in only one relation and all information on the object goes into that relation.
Therefore, ConferenceRoom_ODL contains the n conference
rooms, while Room_ODL contains the rooms that are not conference
rooms (mn of them).
Question 5: (A)
I is true because every ConferenceRooms entity is also a Rooms entity.
II is false because there might be Buildings entities that are not related
to any Rooms entity (i.e., there might be buildings with no rooms).
Question 6: (D)
AB>C does not hold because of the first and the
third rows.
Question 7: (C)
Since B and E do not appear on the right side of any FD,
they must be a part of every key. Let's compute the closure of {B, E}.
First, apply B>D to get D into the closure.
Then, DE>A adds A into the closure. Finally,
AB>C
adds C into the closure. The closure of {B, E} covers all
attributes of S, so it's a superkey of S. But we also know
that B and E must be a part of every key, so {B, E}
is the only key.
Question 8: (B)
I is incorrect because BE>AC also holds in S2
(which should be rather obvious after Question 7 because {B, E}
is a key). II is true because AB does not contain {B, E},
the only key of S2. III is incorrect because S4 should be
(A,
B, E) instead of (C, E).
Question 9: (A)
I is true because DE>A is also a BCNF violation
for S. II is false: if we decompose using DE>A
we will get a relation with attributes (A, D, E), which we will
not get by decomposing with B>D first.
Question 10: (A)
Since A does not appear on the right side of any FD, it must be
in every key.
Question 11: (B)
Since every key contains A, it cannot contain C. If it also
contains C, then it would not be minimal: we can safely take C
out and still have a key because C is determined by A (according
to A>BC).
Question 12: (A)
This one requires more than average thought. If some key contains E,
then the third FD must be nontrivial and its left side must contain E.
In this case, {A, E} is a key, and it easy to see that {A, D}
is a key too. Thus, I is true. II is false because the third FD could be
trivial. In this case, the only key is {A, D}.
Question 13: (C)
Yes, III is correct.
Question 14: (D)
{A} may no longer be a key. Consider the following example:
R:
A 
B 
apple 
sweet 
lemon 
sour 


T:
A 
B 
apple 
sweet 
lemon 
sour 
apple 
sour 

Question 15: (D)
I is incorrect: if somebody works for Microsoft and owns both Microsoft
and Yahoo! stocks, he will be returned by the query. II and III are correct.
Question 16: (A)
I is obviously correct. However, II is not, because the join could duplicate
salaries and result in an incorrect average. Suppose that X's salary is
$30000 and Y's salary is $60000. X owns 200 shares of Microsoft. Y owns
200 shares of Microsoft and 200 shares of Yahoo!. When FROM and
WHERE
clauses are done, we end up with one tuple about X but two tuples about
Y. Then, when we take the average we get ($30000+$60000+$60000)/3 = $50000.
But the correct answer should be ($30000+$60000)/2 = $45000.
Question 17:
PROJ_{SSN} ( SEL_{employerSymbol='IFMX'} (Person)
NATURALJOIN
SEL_{symbol='ORCL' AND numShares>50} (Holding) )
Grading by Charity:
.5 for >= instead of >
.5 for selecting SSN instead of Person.SSN
or Holding.SSN when the two relation are crossed or thetajoined
1 for writing SQL queries  if the query works
2 for neglecting a condition (for example, not selecting
for ORCL stocks)
Question 18:
SELECT SUM(numShares)
FROM Person, Holding
WHERE employerSymbol = 'IFMX'
AND Person.SSN = Holding.SSN
AND symbol = 'ORCL';
Alternate solution:
SELECT SUM(numShares)
FROM Holding
WHERE symbol = 'ORCL' AND
SSN IN (SELECT SSN
FROM Person
WHERE employerSymbol = 'IFMX');
Grading by Mike:
2: Forgetting to join on SSN (Person.SSN =
Holding.SSN)
1: Forgetting a clause in the WHERE (e.g. IFMX
or ORCL)
1: Wrong aggregate (e.g. count)
0: Small / Inconsequential Syntax Error (e.g. quotes)
.5: Syntax Error (e.g. "=" instead of "IN")
.5: Overly complicated query (e.g. "GROUP BY" or
multiple subselects)
Some of the submissions had more substantial logic or syntax
errors and the grading scheme above just didn't apply. For those
people, I assigned a grade, somewhat arbitrarily (probably between 0 and
3.5) based on how much of the "idea" of the question that they captured.
Question 19:
PROJ_{symbol} (StockPrice)

PROJ_{s1.symbol} (
RENAME_{s1}(StockPrice)
THETAJOIN_{s1.symbol=s2.symbol AND
s1.date<s2.date AND s1.price>=s2.price}
RENAME_{s2}(StockPrice)
)
Grading by Mark: Generally, a common mistake people
made was to find all stocks who price went up at some point:
/* WRONG! */
PROJ_{s1.symbol} (
RENAME_{s1}(StockPrice)
THETAJOIN_{s1.symbol=s2.symbol AND
s1.date>s2.date AND s1.price>s2.price}
RENAME_{s2}(StockPrice)
)
A correctly written query doing this received 2 points. 1.5
points was given for this without the s1.symbol=s2.symbol condition.
A 1 point answer was almost any relational algebra query. You got 2 points
if you wrote a SQL query that is right, but is not easy to translate to
relational algebra (e.g., use ANY, EXISTS, etc.). 1 point
for a SQL query making the same mistake as above.
If you got near the correct answer (did not make the mistake
above), you got 4.5 points if you reported all nonfalling stocks rather
than all rising stocks (e.g. < instead of <=).
You got 4 points if you forgot the condition needed to join on the symbols.
(Remember, this is necessary for thetajoin; it is only unnecessary if
you used a natural join and made sure the column names for symbol were
the same.) A 3 or 3.5 point answer had the right idea, but made more
major mistakes.
Question 20:
SELECT symbol, price
FROM StockPrice s
WHERE symbol IN
(SELECT symbol
FROM Holding
GROUP BY symbol
HAVING COUNT(*) >
(SELECT 0.4*COUNT(DISTINCT SSN) FROM Holding))
AND date >= ALL
(SELECT date
FROM StockPrice
WHERE symbol = s.symbol);
Grading by Jun:
+2 for making a commendable attempt at the problem and correctly
showing the overall idea and structure of the query
+2 for correctly identifying all "widelyheld" stocks
+2 for correctly identifying the latest price for each stock
Common mistakes include:
Applying an aggregate function to a subquery (e.g., COUNT(SELECT ...))
Aggregate functions in SQL should be applied to attributes!
Applying quantifiers to attributes (e.g., >= ANY date)
Quantifiers should be applied to subqueries!
HAVING clause does not have a condition (e.g., HAVING MAX(DATE))
Jun Yang 
CS145 Spring 1999 