Database Systems: The Complete Book Solutions for Chapter 19

## Solutions for Section 19.1

### Exercise 19.1.1

The difference between strict and nonstrict locking for this example is only in the matter of whether the lock on A is released prior to the write of B. That is, the lock on A must be taken before r1(A). The lock on B can occur in any of the three positions prior to r1(B) (3 choices).

If locking is strict, then the two unlocks must occur, in either order, after the second write action. Thus, there are 6 orders for strict, 2-phase locking.

If locking is not strict, the unlock of B must occur after w1(B), but then the unlock of A can occur in any of the three positions after w1(A). Thus, there are 9 orders if strictness is not required, and 3 of these must be nonstrict.

### Exercise 19.1.2(a)

T1 wrote only B, but that value was later read by T3. Thus, T3 must be rolled back. T3 wrote D, but no transaction has read D, so no further rollbacks are needed.

### Exercise 19.1.4(a)

Revised 5/3/04.

Each transaction reads only one element, so the question is, does it read from the other? If so, then in order for the schedule to be recoverable, it must commit after the other as well. That is, the two constraints on the schedule can be expressed:

1. If r1(C) follows w2(C) then c1 must follow c2.
2. If r2(B) follows w1(B) then c2 must follow c1.

The partial orders that violate (1) look like:

```o->o->o->o
|      ^
v      |
o->o->o->o
```

With a little effort, we can find that there are 12 orders for the eight nodes involved.

The orders that violate (2) are similar; they have the constraints:

```o->o->o->o
|   ^
v   |
o->o->o->o
```

There are 10 of these. Since there are {8 choose 4} = 70 orders in all, 48 of them are recoverable.

## Solutions for Section 19.2

### Exercise 19.2.1(a)

```                _____________________
|                   |
A    |                   v    B
T_0 ----> T1       T2 ----> T3 ----> Tf
| |          A       ^         ^
| |__________________|         |
|              A               |
|______________________________|
```

Above is the polygraph. We have labeled with A the arcs due to the reading of A, and likewise for B. The two unlabeled arcs are second-phase requirements that the other writers of B --- T1 and T2 --- not interfere with the reading of B. by the final ``transaction'' Tf.

Clearly, T3 must come last in an view-serializable schedule, but there is no required ordering of T1 and T2. Notice that conflict-serializability would require T1 to precede T2, because their writes of B cannot swap places.

### Exercise 19.2.2(a)

For conflict-equivalent schedules, the three writes must remain in order, but the three reads may be moved anywhere, as long as they precede the write by the same transaction.

Start with the three writes. We can only place r1(A) prior to w1(B). Next, we can place r2(A) in any of the three positions prior to w2(B). At the last step, there are now five places where we can put r3(A) prior to w3(B). Thus, there are 15 conflict-equivalent schedules.

For view-serializable orders, there are the above 15, and another 15 in which w1(B) is placed after w2(B). Intuitively, we can do so because neither T1 nor T2 have their value of B read, so the order in which they write doesn't matter.

## Solutions for Section 19.3

### Exercise 19.3.1(a)

All locks are granted until w3(B). Now, T3 has to wait for T2, which has a shared lock on B, and the waits-for graph is
```     T3 ----> T2
```

At the next step, T2 must wait for T1, which has a lock on C. The waits-for graph is:

```     T3 ----> T2 ----> T1
```

Next, w4(A) causes T4 to wait for T1. There is still no cycle, and the waits-for graph is:

```     T3 ----> T2 ----> T1 <---- T4
```

Finally, T1 tries to write D. It cannot get the exclusive lock on D that it needs, because T3 holds a shared lock on D. If we allowed T1 to wait, the graph would be:

```     T3 ----> T2 ----> T1 <---- T4
^                   |
|___________________|
```

Thus, T1 must abort and relinquish its locks. At that time, T2 and T4 can each get the locks they need. When T2 finishes, T3 can proceed.

### Exercise 19.3.7

Suppose T1 is a transaction that tries to lock elements A and B in that order. There are also an indefinitely long sequence of transactions T2, T3,... that lock B and then A.

Initially, T1 locks A and T2 locks B and then requests a lock on A. T2 therefore waits for T1. Meanwhile, T3 starts and requests a lock on B. Thus, T3 waits for T2, and the waits-for graph is:

```     T3 ----> T2 ----> T1
```

When T1 requests a lock on A, it would cause a cycle, so T1 is rolled back. Now, T2 completes and releases its locks. The lock on A is given to T1 and the lock on B is given to T3. Next, T3 requests a lock on A, but has to wait for T1. Also, T4 starts up and requests a lock on B, thus being forced to wait for T3. The waits-for graph is now:

```     T4 ----> T3 ----> T1
```

The above graph is just like the first, except T4 and T3 have replaced T3 and T2, respectively. As long as there is a supply of transactions like T2, T3, and T4, they can prevent T1 from finishing ever.

Notice that the transactions in the example above request locks on A and B in different orders. Perhaps forcing transactions to request locks in a fixed order will solve the problem. Indeed, if we follow the policy that when a lock becomes available, it is given to a waiting transaction on a first-come-first-served basis, then any transaction that is waiting for a lock will eventually become the transaction that has been waiting for the lock the longest. At that point, the next time the lock becomes available, it will be granted to that transaction, which thus makes some progress. The above observation is the germ of an inductive proof that requesting locks in order plus first-come-first-served will allow every transaction eventually to complete.

However, if the scheduler is able to give locks to any transaction it wishes, then it is easy to make a transaction starve. Suppose T1 wants a lock on A when some other transaction has that lock. As long as there is always another transaction besides T1 waiting for a lock on A, and the scheduler always chooses to give the lock to some transaction other than T1 when the lock becomes available, T1 never makes any progress.

Last, let us consider deadlock prevention by timeout. If the duration before a timeout can vary even slightly, then the example given initially for deadlock prevention by avoiding cycles will work. That is, even though T2 has been waiting slightly longer for a lock than T1, if we allow that T1 might timeout first, then the sequence of events described above could also occur with timeout-based deadlock prevention.

If timeouts occur for any transaction at exactly t seconds from when it first started to wait, then we need a simple modification of the above example. Assume that T1 is faster at requesting its second lock than any of the other transactions. Then, after T1 gets a lock on A and T2 gets a lock on B, T1 next requests its lock on B and starts to wait. Slightly later, T2 requests a lock on A. T3 requests its lock on B just before T1 times out, so when that timeout occurs, T2 completes, gives its A-lock to T1 and its B-lock to T3. When T2 next requests a lock on B and starts to wait, this cycle can repeat with T3 and a new T4 in place of T2 and T3.

## Solutions for Section 19.4

### Exercise 19.4.1

Define U to be the sum over all sites i of 10dui. That is, U is the cost of sending one second's worth of updates to one site other than the site at which the update originated.

Now, consider the value of creating a copy of R at site i. On the positive side, the queries generated at site i that used to have to be answered remotely, at a cost of 10cqi, can now be answered for cqi, a positive benefit of 9cqi.

However, the additional copy of R has to be updated. The cost is dui for the updates generated at site i and U - 10dui for the updates generated at all other sites. Recall that U represents the cost of a remote update for all generated updates, so if we subtract 10dui we get just the cost of the updates that do not originate at i.

In order for the benefit of creating a copy at i to outweigh the additional update cost, we must have

9cqi + 9dui >= U

Thus, the optimal selection of sites at which to maintain copies of R is all sites for which the above inequality holds. Roughly, we favor those sites at which there is a lot of activity, but the larger U is (i.e., the greater the update rate), the less likely we are to want to create many copies.

There is one special case: what if the inequality fails for all i; that is, it is not cost effective to create any copies. As long as all the qi's are zero, that's OK. The proverbial ``write-only database'' needs no copies! However, in realistic cases with a heavy update rate, we shall be forced to create one copy. The correct choice is whichever i has the largest value of cqi + dui.

## Solutions for Section 19.5

### Exercise 19.5.1(a)

There is a component, call it A at the home computer, and components at each of the banks --- components B and C. Component B receives the directive from A. B checks that there is adequate money in the account, and aborts if not. Otherwise, B subtracts \$10,000 from the account at B and signals C to deposit \$10,000 into the designated account at C. Component C checks that the account exists, and aborts if not. Otherwise, C adds \$10,000 to the account and informs A of a successful conclusion.

### Exercise 19.5.2(a)

(0,1,P), (0,2,P), (1,0,R), (2,0,D), (0,1,A), (0,2,A)

### Exercise 19.5.2(b)

In the first phase, the coordinator exchanges the messages (0,1,P), (1,0,R) with site 1 and the messages (0,2,P), (2,0,R) with site 2. These messages may occur in any of (4 choose 2) = 6 possible interleavings. That is, there are four positions in which these messages occur, and we can pick any two of them to devote to the messages involving site 1.

In the second phase, the coordinator can send commit messages to the other two sites in either order. There are thus 6 * 2 = 12 possible message sequences.

## Solutions for Section 19.6

### Exercise 19.6.1(a)

Suppose that s, x, and i are the numbers of local shared, exclusive, and intcrement locks that a transaction needs to have a global lock of that type. Since no two transactions can hold an exclusive lock on the same element, we need:

2x > n

just as for the shared + increment system discussed in the section. We also cannot allow a shared and exclusive lock at the same time, which tells us:

x + s > n

Likewise, we cannot have an exclusive and increment lock at the same time, so:

x + i > n

Finally, we cannot have a shared and increment lock at the same time, so:

s + i > n

For example, we could require that any type of lock requires a majority. We cannot allow both s and i to be n/2 or less, because of the last inequality. But we could let one of s and i be 1, while the other, and x, are both n.

### Exercise 19.6.2(a)

The 90% of the accesses that are read-only require no messages, since there is a lock table and a copy at each site. The remaining 10% of the accesses require exclusive locks. Thus, each requires three messages between the originating site and each of the four other sites, a total of 12 messages. The average number of messages is 1.2.

## Solutions for Section 19.7

### Exercise 19.7.1

The task of the compensating transaction is first to determine whether the file f is still the present one with that name, or whether it has been replaced with a later version.
1. If f is still in place, then the compensating transaction must
1. Restore f' if the latter existed, or
2. Delete f if not.

2. If f has been overwritten, then there is no need to change the file with that name.

To see that this compensation works, we can consider all the possible cases above, and check that the file system, after compensation, is the same as would be the case had f never been written. If f was overwritten before compensation, then we surely leave the file system in the same state as if f were never written. If f is not overwritten, then we have correctly left the file with that name being either f', if it existed, or left the file system with no file of that name.

There are two interesting issues. First, in order for there to be a compensating transaction, it seems we must both remember the time at which the installation occurred (so we can tell whether the file f has been overwritten when the uninstallation occurs), and we must also remember the file f', at least until such time as the file is overwritten, and the new value is itself guaranteed never to be uninstalled (i.e., the saga that wrote it has finished). These features are not commonly available.

Second, note that file systems do not support serializability. Thus, it is entirely possible that the uninstalled file f has been read, and its contents have influenced other files or the behavior of the system.