A First Course in Database Systems

## Solutions for Chapter 4

Solutions for Section 4.1

## Solutions for Section 4.1

### Exercise 4.1.1(a)

(Revised 10/31/98)

PI_model( SIGMA_{speed >= 150} ) (PC)

### Exercise 4.1.1(f)

The trick is to theta-join PC with itself on the condition that the hard disk sizes are equal. That gives us tuples that have two PC model numbers with the same value of hd. However, these two PC's could in fact be the same, so we must also require in the theta-join that the model numbers be unequal. Finally, we want the hard disk sizes, so we project onto hd.

The expression is easiest to see if we write it using some temporary values. We start by renaming PC twice so we can talk about two occurrences of the same attributes.

R1 = RHO_{PC1} (PC)
R2 = RHO_{PC2} (PC)
R3 = R1 JOIN_{PC1.hd = PC2.hd AND PC1.model <> PC2.model} R2
R4 = PI_{PC1.hd} (R3)

### Exercise 4.1.1(h)

First, we find R1, the model-speed pairs from both PC and Laptop. Then, we find from R1 those computers that are ``fast,'' at least 133Mh. At the same time, we join R1 with Product to connect model numbers to their manufacturers and we project out the speed to get R2. Then we join R2 with itself (after renaming) to find pairs of different models by the same maker. Finally, we get our answer, R5, by projecting onto one of the maker attributes. A sequence of steps giving the desired expression is:
R1 = PI_{model,speed} (PC) UNION PI_{model,speed} (Laptop)
R2 = PI_{maker,model} (SIGMA_{speed>=133} (R1) JOIN Product)
R3 = RHO_{T(maker2, model2)} (R2)
R4 = R2 JOIN_{maker = maker2 AND model <> model2} (R3)
R5 = PI_{maker} (R4)

### Exercise 4.1.2

Here are postscript figures for the expression trees of Exercise 4.1.1 Part (a) Part (f) Part (h). Note that the third figure is not really a tree, since it uses a common subexpression. We could duplicate the nodes to make it a tree, but using common subexpressions is a valuable form of query optimization. One of the benefits one gets from constructing ``trees'' for queries is the ability to combine nodes that represent common subexpressions.

### Exercise 4.1.5

The relation that results from the natural join has only one attribute from each pair of equated attributes. The theta-join has attributes for both, and their columns are identical.

### Exercise 4.1.7(a)

If all the tuples of R and S are different, then the union has n+m tuples, and this number is the maximum possible.

The minimum number of tuples that can appear in the result occurs if every tuple of one relation also appears in the other. Surely the union has at least as many tuples as the larger of R and that is, max(n,m) tuples. However, it is possible for every tuple of the smaller to appear in the other, so it is possible that there are as few as max(n,m) tuples in the union.

## Solutions for Section 4.2

### Exercise 4.2.1(a)

Answer(model) <- PC(model,speed,_,_,_,_) AND speed >= 150

### Exercise 4.2.1(f)

Answer(hd) <- PC(m1,_,_,hd,_,_) AND PC(m2,_,_,hd,_,_) AND m1 <> m2
Notice how Datalog allows us to express equality conditions by using the same variable, e.g., hd, twice.

### Exercise 4.2.1(h)

FastComputers(M) <- PC(M,S,_,_,_,_) AND S >= 133
FastComputers(M) <- Laptop(M,S,_,_,_,_) AND S >= 133

Answer(maker) <- Product(maker,m1,_) AND Product(maker,m2,_) AND
FastComputers(m1) AND FastComputers(m2) AND m1 <> m2

## Solutions for Section 4.3

### Exercise 4.3.1(d)

The following uses X as a temporary predicate to represent R UNION S.

X(a,b,c) <- R(a,b,c)
X(a,b,c) <- S(a,b,c)

Answer(a,b,c) <- X(a,b,c) AND NOT T(a,b,c)

### Exercise 4.3.1(g)

This expression asks for those (a,b) pairs that are both the first two components of a tuple from R and the last two components of a tuple from S. Thus, the following rule works.

A more mechanical solution would be something like:

X(a,b) <- R(a,b,_)
Y(a,b) <- S(_,a,b)

### Exercise 4.3.2(b)

Answer(x,y,z) <- R(x,y,z) AND x < y AND y < z

### Exercise 4.3.2(e)

We need to work on this using DeMorgan's laws to put the expression into conjunctive normal form. First, we push the NOT through the AND, which ``flips'' the AND to an OR, giving us:

(NOT(xy)) OR (NOT(y

The second NOT flips < to >=.
The first NOT is pushed through the first OR, flipping it to AND:

((NOT(xy))) OR y>=z
Finally, push the remaining NOT's into the literals:

(x>=y AND y>=x) OR y>=z
In this case, we can observ esomething special about the left side of the OR. The only way for x>=y and y>=x both to hold is if x=y. Thus, our final form is:

x=y OR y>=z
This condition can be expressed by two rules:

Answer(x,y,z) <- R(x,y,z) AND y >= z
Notice that the first rule is a more succinct way of writing

Answer(x,y,z) <- R(x,y,z) AND x = y

### Exercise 4.3.4(b)

Answer(rx,ry,rz,sx,sy,sz) <- R(rx,ry,rz) AND S(sx,sy,sz) AND
rx < sy AND ry < sz

### Exercise 4.3.4(e)

Taking advantage of the expression simplification we did for Exercise 4.3.2(e):

Product(rx,ry,rz,sx,sy,sz) <- R(rx,ry,rz) AND S(sx,sy,sz)

Answer(rx,ry,rz,sx,sy,sz) <- Product(rx,ry,rz,sx,sy,sz) AND ry >= sz
Notice that rx appears four times in the first rule above, thus enforcing rx = sy.

### Exercise 4.3.5(a)

That looks to us like the natural join of Q and R, followed by projecting out the middle (z) component.

PI_{x,y} (Q JOIN R)

## Solutions for Section 4.4

### Exercise 4.4.1(a)

The following pairs are added to Reaches: (CHI,SF), (CHI,DEN), (CHI,DAL), (CHI,CHI), (DEN,DEN), (DEN,SF), (DAL,DEN), (DAL,SF), (DAL,DAL), and (SF,SF).

The following tuples are added to Connects: (DAL,SF,1530,2100), (DEN,SF,1500,2100), (SF,SF,900,2100), and (SF,SF,930,2100).

To AAreaches we add: (CHI,SF), (CHI,DAL), (CHI,CHI), (DAL,SF), (DAL,DAL), and (SF,SF).

### Exercise 4.4.2(a)

We could define FollowOn as in Example 4.36 and then simply say:

P(x,y) <- FollowOn(x,y) AND NOT SequelOf(x,y)
Another approach is to define P(x,y) recursively, starting with two levels of sequel as a basis. This recursion is:

P(x,y) <- SequelOf(x,z) AND SequelOf(z,y)
P(x,y) <- SequelOf(x,z) AND P(z,y)

### Exercise 4.4.3(b)

1) S(class,eclass) <- Rel(class,eclass,"single")
2) M(class,eclass) <- Rel(class,eclass,"multi")

3) S(x,y) <- S(x,z) AND S(z,y)
4) M(x,y) <- M(x,z) AND M(z,y)
5) M(x,y) <- S(x,z) AND M(z,y)
6) M(x,y) <- M(x,z) AND S(z,y)
In explanation, rules (1) and (2) are basis rules, that handle ``paths'' of length 1. Rule (3) is the recursion for S: paths that are all single must be composed of two paths that are all single. Rules (4), (5), and (6) are the recursion for M: a path with a "multi" may be composed of two paths, at least one of which has a "multi."

## Solutions for Section 4.5

### Exercise 4.5.1(a)

SIGMA_{speed<150 AND price>1500} (PC) = EMPTYSET

### Exercise 4.5.1(d)

This complex expression is best seen as a sequence of steps in which we define temporary relations R1 through R4 that stand for nodes of expression trees. Here is the sequence:
R1(maker, model, speed) := PROJ_{maker,model,speed}(Product JOIN PC)
R2(maker, speed) := PROJ_{maker,speed}(Product JOIN Laptop)
R3(model) := PROJ_{model}(R1 JOIN_{R1.maker=R2.maker AND R1.speed<=R2.speed} R2)
R4(model) := PROJ_{model}(PC)
The constraint is that R4 is a subset of R3.

In explanation, R1 is the set of maker-model-speed triples for PC's, and R2 is the set of maker-speed pairs for laptops. Note JOIN is the natural join. R3 is the set of models (of PC's) for which there is a laptop by the same manufacturer at least as fast. R4 is all the PC models. Note that R3 is surely a subset of R4, so the constraint that R4 be a subset of R3 is really asking that these sets be equal; i.e., every PC has a laptop by the same manufacturer that is at least as fast.

### Exercise 4.5.3(a)

Producers(c) <- Movie(_,_,_,_,_,c)
Execs(c) <- MovieExec(_,_,c,_)
Problem(c) <- Producers(c) AND NOT Execs(c)
Here, Producers(c) is true if c is the certificate of a producer, and Execs(c) is true if c is the certificate of a movie executive. We have a problem if there is a value of c in the former set but not in the latter. Thus, we constrain Problem to be an empty relation.

## Solutions for Section 4.6

### Exercise 4.6.1

As a bag, the value is {133, 120, 166, 166, 166, 200, 200, 180, 200, 160}. Order is unimportant, of course. The average is 169.1.

As a set, the value is {133, 120, 166, 200, 180, 160}. The average is 159.8.

### Exercise 4.6.4(a)

As sets, an element x is in the left-side expression

(R UNION S) UNION T

if and only if it is in at least one of R, S, and T. Likewise, it is in the right-side expression

R UNION (S UNION T)

under exactly the same conditions. Thus, the two expressions have exactly the same members, and the sets are equal.

As bags, an element x is in the left-side expression as many times as the sum of the number of times it is in R, S, and T. The same holds for the right side. Thus, as bags the expressions also have the same value.

### Exercise 4.6.4(h)

As sets, element x is in the left side

R UNION (S INTERSECT T)

if and only if x is either in R or in both S and T. Element x is in the right side

(R UNION S) INTERSECT (R UNION T)

if and only if it is in both R UNION S and R UNION T. If x is in R, then it is in both unions. If x is in both S and T, then it is in both union. However, if x is neither in R nor in both of S and T, then it cannot be in both unions. For example, suppose x is not in R and not in S. Then x is not in R UNION S. Thus, the statement of when x is in the right side is exactly the same as when it is in the left side: x is either in R or in both of S and T.

Now, consider the expression for bags. Element x is in the left side the sum of the number of times it is in R plus the smaller of the number of times x is in S and the number of times x is in T. Likewise, the number of times x is in the right side is the smaller of

1. The sum of the number of times x is in R and in S.
2. The sum of the number of times x is in R and in T.
A moment's reflection tells us that this minimum is the sum of the number of times x is in R plus the smaller of the number of times x is in S and in T, exactly as for the left side.

### Exercise 4.6.5(a)

For sets, we observe that element x is in the left side

(R INTERSECT S) - T

if and only if it is in both R and S and not in T. The same holds for the right side

R INTERSECT (S-T)

so as sets, the equivalence holds.

Now suppose we have bags. Let R = {x}, S = {x,x}, and T = {x}. Then R INTERSECT S = {x}, and the left side is {x}-{x}, or EMPTYSET.

However, on the right, S-T = {x}, so the right side is {x} INTERSECT {x}, which is {x}.