### Assignment #2 Due Wednesday April 25

#### LOGISTICS

• Warning: several of the written exercises in this assignment do take some thought. We do not suggest leaving this assignment for the last minute.

• The procedure for turning in this assignment is the same as for the first assignment. Please remember the strictly enforced late policy.

• Please save a copy of your final relational schema for your PDA (Problem 7 part (g)), as it will be needed for subsequent project parts.

• As with the first assignment, please turn in your PDA (Problem 7) separately as it will be graded by a different TA from the written exercises.

#### Problem 1

Consider the following rule for functional dependencies:
```   IF A -> BB, CC -> D, and CC is a subset of BB
THEN A -> D
```
where A and D are single attributes and BB and CC are sets of attributes. For simplicity you may assume that A and D are not in BB or CC. Prove the correctness of this rule. Your proof should not be based on closure or on other rules for functional dependencies, but rather it should be based on the formal definition of functional dependencies:
```   AA -> BB =
for all tuples t and u, if t[AA]=u[AA] then t[BB]=u[BB]
```
where AA and BB are sets of attributes.

Your proof might have roughly the following form: Suppose A -> BB holds, CC -> D holds, and CC is a subset of BB. Then for all tuples t and u, ... [fill in] ... To prove that A -> D holds, we need to prove that for all tuples t and u, ... [fill in]. Therefore A -> D holds.

#### Problem 2

Consider the following rule for multivalued dependencies:
```  IF AA ->> BB and AA ->> CC
THEN AA ->> (BB intersect CC)
```
where AA, BB, and CC are sets of attributes, and intersect performs set intersection. For simplicity you may assume that AA does not intersect BB or CC, and that there are no attributes in the relation besides those in AA, BB, and CC. As in Problem 1, your proof should be based on the formal definition of (in this case) multivalued dependencies, and not on other rules for multivalued dependencies.

Your proof might have roughly the following form: Suppose AA ->> BB and AA ->> CC hold. Then for all tuples t and u there exists a tuple v such that ... [fill in] ... Let DD = BB intersect CC. To prove that AA ->> DD holds, we need to prove that for all tuples t and u there exists a tuple v such that ... [fill in]. Therefore AA ->> DD holds.

#### Problem 3

(a) The following "rule" does not hold for functional dependencies:
```  IF  A,B -> C  THEN  A -> C
```
where A, B, and C are single attributes. Show that this "rule" does not hold by providing the simplest counterexample you can find. To provide a counterexample, you need to specify a relation schema and a relation instance (set of tuples) for which A,B -> C holds but A -> C does not hold.

(b) The same "rule" does not hold for multivalued dependencies either:

```  IF  A,B ->> C  THEN  A ->> C
```
where A, B, and C are single attributes. Show that this "rule" does not hold by providing the simplest counterexample you can find. To provide a counterexample, you need to specify a relation schema and a relation instance (set of tuples) for which A,B ->> C holds but A ->> C does not hold.

#### Problem 4

Consider a relation R(A,B) where attributes A and B contain values that are bit strings. The instance of R with four tuples shown below satisfies the functional dependency A -> B and all functional dependencies that follow from A -> B, but it does not satisfy any other functional dependencies (in particular, it does not satisfy B -> A):

AB
000
010
101
111

This example serves as a hint for parts (a) and (b) that follow.

(a) Consider a relation R(A,B,C) where all attributes contain values that are bit strings. Find an instance of R that satisfies the functional dependencies A -> B, B -> C, and all functional dependencies that follow from these two, but it does not satisfy any other functional dependencies (such as C -> A, BC -> A, etc.).

(b) Consider a relation R(A,B,C,D) where all attributes contain values that are bit strings. Find an instance of R that satisfies the functional dependencies A -> B, B -> C, C -> D, and all functional dependencies that follow from these three, but it does not satisfy any other functional dependencies (such as C -> A, D -> B, CD -> A, etc.).

#### Problem 5

Consider the following relational schema:
```  Sale(clerk, store, city, date, item#, size, color)
// records that a clerk sold an item on a particular day

Item(item#, size, color, price)
// records prices and available sizes and colors for items
```
Make the following assumptions, and only these assumptions, about the real world being modeled:
• Each clerk works in one store.
• Each store is in one city.
• A given item# always has the same price, regardless of size or color.
• Each item is available in one or more sizes and one or more colors, and each item is available in all combinations of sizes and colors for that item.
Here are the problems:

(a) Based on the assumptions above (and no others), specify an appropriate set of completely nontrivial functional dependencies for relations Sale and Item.

(b) Based on the assumptions above (and no others), specify all keys for relations Sale and Item.

(c) Are relations Sale and Item in Boyce-Codd Normal Form (BCNF)? If not, decompose the relations so they are in BCNF.

(d) Now consider your decomposed relations from part (c), or the original relations if you did not need to decompose them for part (c). Based on the assumptions above (and no others), specify all nontrivial multivalued dependencies for the relations. Do not include multivalued dependencies that also are functional dependencies.

(e) Are the relations you used in part (d) in Fourth Normal Form (4NF)? If not, decompose the relations so they are in 4NF.

#### Problem 6

Consider the Boyce-Codd Normal Form (BCNF) decomposition algorithm specified in class and discussed in the textbook. We start with a relation R and a set F of functional dependencies (FDs) for R. We decompose R into R1 and R2 based on a dependency in F that violates BCNF, and then we compute the sets of FDs F1 for R1 and F2 for R2 so we can continue with the algorithm. Suppose that when we compute the FDs for R1 and R2, instead of finding all FDs that follow from our original set F and that apply to R1 and R2 (i.e., that include only attributes from R1 or R2), we simply use the original set F of FDs specified for R and take those that apply to R1 and R2. Can we get a decomposition that is not in BCNF? If not, argue why the algorithms still works. If so, illustrate the problem: give a relation schema R and a set of FDs for R, and trace how the BCNF decomposition algorithm produces a final set of relations that is not in BCNF.

#### Problem 7 (Personal Database Application, Part 2)

In this second part of the project, you will produce a relational schema from the entity-relationship diagram you came up with in PDA Part 1. In next week's project part, you will create an actual database from this schema using the Oracle DBMS.

(a) Please attach a copy of your E/R diagram from PDA Part 1. If you would like to make changes to your original E/R diagram at this point (due to staff feedback or any other reason), you may do so. The new E/R design will not be graded but will be used as a basis for grading part (b).

(b) Using the method for translating an E/R diagram to relations, produce a set of relations for your database design. As usual, please be sure to underline key attributes in your relations.

(c) For each relation in your schema, specify a set of completely nontrivial functional dependencies for the relation. Any functional dependencies that actually hold in the real-world scenario that you're modeling should be specified, or should follow from the specified dependencies. Don't worry if you find that some of your relations have no nontrivial functional dependencies.

(d) Is each relation in your schema in Boyce-Codd Normal Form (BCNF) with respect to the functional dependencies you specified? If not, decompose the relation into smaller relations so that each relation is in BCNF. Be sure to underline key attributes in your new relations. Don't worry if you don't have any BCNF violations - many PDAs will not have any.

(e) Are there any nontrivial multivalued dependencies that hold on any of the relations in your schema? (You needn't consider MVD's that are also functional dependencies.) If so, specify the multivalued dependencies, then decompose the relations into smaller ones so that each one is in Fourth Normal Form (4NF). Be sure to underline key attributes in your new relations. Don't worry if you don't have any 4NF violations - most PDAs will not have any.

(f) Now that you've decomposed your relations as far as possible, are there any relations that could be combined without introducing redundancy (i.e., without creating BCNF or 4NF violations)? If so, combine them.

(g) Is there anything you still don't like about the schema (e.g., attribute names, relation structure, etc.)? If so, modify the relational schema to something you prefer. You will be working with this schema quite a bit, so it's worth spending some time now to make sure you're happy with it. And once again:

• Please save a copy of your final relational schema (part (g)), as it will be needed for subsequent project parts.
• Please turn in your PDA separately as it will be graded by a different TA from the written exercises.