Problems created by concurrency => need concurrency control
Client 1 - UPDATE Student SET address = CONCAT(address, :zip) WHERE ID = 123
Client 2 - UPDATE Student SET address = CONCAT(address, :phone) WHERE ID = 123
For each client, the DBMS reads the address value, updates it, and writes it back. Possible outcomes without concurrency control: only one client's change survives (the other is overwritten), or both take effect.
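The lost-update race above can be simulated in plain Python without a DBMS. This is a toy sketch, not real DBMS behavior: the address value, the zip/phone suffixes, and the `run` helper are all made up for illustration. The key point is that when both clients read before either writes, one update disappears.

```python
# Toy simulation of the read-modify-write race from Example 1.
# Each "client" reads the stored address, computes its update locally,
# then writes the result back.

def run(schedule):
    """Apply a schedule of (client_id, action) steps; return the final address."""
    db = {"address": "123 Main St"}          # the stored Student.address value
    suffix = {1: " 94301", 2: " 555-0100"}   # client 1 appends a zip, client 2 a phone
    local = {}                               # each client's in-memory copy
    for client, action in schedule:
        if action == "read":
            local[client] = db["address"]
        else:                                # "write": write back local copy + suffix
            db["address"] = local[client] + suffix[client]
    return db["address"]

# Serial schedule: both changes survive.
serial = run([(1, "read"), (1, "write"), (2, "read"), (2, "write")])
# Interleaved schedule: both clients read first, so client 1's change is lost.
interleaved = run([(1, "read"), (2, "read"), (1, "write"), (2, "write")])
```

Running both schedules shows the two outcomes the notes mention: `serial` contains both suffixes, while `interleaved` contains only client 2's.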
Client 1 - UPDATE Apply SET decision = 'Y' WHERE location = 'SB' AND ID IN (SELECT ID FROM Student WHERE GPA > 3.2)
Client 2 - UPDATE Student SET GPA = 1.2 * GPA WHERE HSname = 'Paly'
Possible outcomes without concurrency control: some Paly students get into SB based on their scaled GPA, others don't.
Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
           DELETE FROM Apply WHERE decision = 'N'
Client 2 - SELECT COUNT(*) FROM Apply
           SELECT COUNT(*) FROM Archive
Possible outcomes without concurrency control: if Client 2's queries run between Client 1's two statements, the moved rows are counted twice.
INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
DELETE FROM Apply WHERE decision = 'N'
The system crashes in the middle - what now?
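The archive-then-delete pair above can be made atomic by wrapping it in a transaction, as sketched here with Python's sqlite3 module. The table contents and the simulated "crash" (a raised exception) are made up for illustration; the point is that rollback restores the consistent state, so rows are never duplicated or lost.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Apply(ID INTEGER, decision TEXT)")
con.execute("CREATE TABLE Archive(ID INTEGER, decision TEXT)")
con.executemany("INSERT INTO Apply VALUES (?, ?)",
                [(1, "N"), (2, "Y"), (3, "N")])
con.commit()

def archive_denied(con, crash_midway=False):
    """Move denied applications into Archive as one transaction."""
    try:
        con.execute("INSERT INTO Archive SELECT * FROM Apply WHERE decision = 'N'")
        if crash_midway:
            raise RuntimeError("simulated crash between the two statements")
        con.execute("DELETE FROM Apply WHERE decision = 'N'")
        con.commit()
    except RuntimeError:
        con.rollback()          # recovery: undo the partial transaction

archive_denied(con, crash_midway=True)    # partial work is rolled back
counts_after_crash = (
    con.execute("SELECT COUNT(*) FROM Apply").fetchone()[0],
    con.execute("SELECT COUNT(*) FROM Archive").fetchone()[0],
)   # still (3, 0): nothing moved, nothing duplicated

archive_denied(con)                       # runs to completion this time
counts_after_commit = (
    con.execute("SELECT COUNT(*) FROM Apply").fetchone()[0],
    con.execute("SELECT COUNT(*) FROM Archive").fetchone()[0],
)   # (1, 2): both statements took effect together
```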
A transaction's changes become permanent when the "COMMIT" command is issued or the session ends. "AUTOCOMMIT" mode turns each statement into its own transaction.
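The COMMIT vs. autocommit distinction can be seen directly in Python's sqlite3 module (the table `T` and its contents are made up for illustration). Passing `isolation_level=None` puts the connection in autocommit mode, so each statement commits immediately; in the default mode, DML statements open a transaction that lasts until COMMIT or ROLLBACK.

```python
import sqlite3

# Autocommit: with isolation_level=None, each statement is its own transaction.
auto = sqlite3.connect(":memory:", isolation_level=None)
auto.execute("CREATE TABLE T(x INTEGER)")
auto.execute("INSERT INTO T VALUES (1)")
auto.rollback()                   # nothing pending: the INSERT already committed
rows_auto = auto.execute("SELECT COUNT(*) FROM T").fetchone()[0]

# Default mode: a transaction opens implicitly at the INSERT and stays open.
manual = sqlite3.connect(":memory:")
manual.execute("CREATE TABLE T(x INTEGER)")
manual.execute("INSERT INTO T VALUES (1)")
manual.rollback()                 # undoes the uncommitted INSERT
rows_manual = manual.execute("SELECT COUNT(*) FROM T").fetchone()[0]
```

After this runs, `rows_auto` is 1 (the rollback had nothing to undo) while `rows_manual` is 0 (the INSERT was still uncommitted and got rolled back).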
Isolation obtained through serializability: operations within transactions may be interleaved but execution must be equivalent to some sequential (serial) order.
Question: How is this guarantee achieved?
Solves Examples 1, 2, and 3 above
Client 1 - INSERT INTO Archive (SELECT * FROM Apply WHERE decision = 'N')
           DELETE FROM Apply WHERE decision = 'N'
Client 2 - UPDATE Apply SET decision = 'U' WHERE Campus = 'Irvine'
The serialization order can make a big difference. Choosing it is the application's problem to solve, not the DBMS's.
Question: Seems obvious, but all DBMSs manipulate data in memory, so how is this guarantee achieved?
"Transaction rollback" = "transaction abort"
Client-initiated rollback:
BEGIN TRANSACTION;
<get input from user>
<SQL commands based on input>
<confirm results with user>
IF input = confirm-OK THEN COMMIT; ELSE ROLLBACK;
Note: Rollback only undoes database changes, not other changes (e.g., program variables) or side effects (e.g., printing to the screen, delivering cash).
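The limit of rollback can be demonstrated with a small sqlite3 sketch (the `Account` table, the balance, and the `cash_dispensed` variable are invented for illustration). The database update is undone, but the program variable that changed mid-transaction is not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account(balance INTEGER)")
con.execute("INSERT INTO Account VALUES (100)")
con.commit()

cash_dispensed = 0                      # program state: NOT under DBMS control
con.execute("UPDATE Account SET balance = balance - 40")
cash_dispensed += 40                    # side effect happens mid-transaction
con.rollback()                          # user declined to confirm

balance = con.execute("SELECT balance FROM Account").fetchone()[0]
# The database change is undone (balance is 100 again), but the side
# effect is not: cash_dispensed is still 40.
```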
Question: No self-respecting database programmer would write the above transaction. Why?
Idea: Assume all constraints hold at the start of every transaction. Clients guarantee, given this assumption plus isolation, that all constraints still hold at the end of every transaction. (Similar to program invariants.)
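The invariant idea can be sketched with a bank-style transfer in sqlite3 (the `Account` table, the balances, and the invariant "total balance is constant" are all invented for illustration). The invariant may be false between the two UPDATEs; it only has to hold again by COMMIT, assuming it held at the start.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account(id INTEGER PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100), (2, 50)])
con.commit()

def transfer(con, src, dst, amount):
    """Preserves the invariant 'SUM(balance) is constant', assuming it
    held when the transaction started."""
    con.execute("UPDATE Account SET balance = balance - ? WHERE id = ?",
                (amount, src))
    # Mid-transaction the invariant is temporarily false; that is fine,
    # it only has to hold again by COMMIT.
    con.execute("UPDATE Account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
    con.commit()

transfer(con, 1, 2, 30)
total = con.execute("SELECT SUM(balance) FROM Account").fetchone()[0]  # still 150
```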
"SET TRANSACTION READ ONLY"Many, many transactions and applications fall into this category.
Question: If there are five read-only transactions and no other transactions, what does the system need to do to guarantee serializability?
Three weaker isolation levels:
Client 1 - BEGIN TRANSACTION;
           ... UPDATE Student SET GPA = .99 * GPA ...
           COMMIT;
Client 2 - BEGIN TRANSACTION;
           ... SELECT AVG(GPA) FROM Student ...
           COMMIT;
Client 2 may only care about an approximate average - dirty reads are okay. Use:
"SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED"
Note: The isolation level of Client 1's transaction is irrelevant unless another transaction updates GPA.
"SET TRANSACTION ISOLATION LEVEL READ COMMITTED"
SERIALIZABLE
.
Client 1 - BEGIN TRANSACTION;  // serializable
           ... UPDATE Student SET GPA = .99 * GPA ...
           COMMIT;
Client 2 - SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
           BEGIN TRANSACTION;
           ... SELECT AVG(GPA) FROM Student ...  // executes before Client 1
           ... SELECT MAX(GPA) FROM Student ...  // executes after Client 1
           COMMIT;
Client 2's two queries see different committed states - no dirty reads, but not serializable either.
"SET TRANSACTION ISOLATION LEVEL REPEATABLE READ"Modified Example 5: get same average both times
=> But still doesn't guarantee global serializability!
Client 1 - BEGIN TRANSACTION;  // serializable
           ... UPDATE Student SET GPA = .99 * GPA
           UPDATE Student SET SAT = 1.01 * SAT ...
           COMMIT;
Client 2 - SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
           BEGIN TRANSACTION;
           ... SELECT AVG(GPA) FROM Student ...  // executes before Client 1
           ... SELECT AVG(SAT) FROM Student ...  // executes after Client 1
           COMMIT;
The following example is more realistic for repeatable read but still not globally serializable; it relies on the fact that repeatable read does not apply to inserted tuples.
Client 1 - BEGIN TRANSACTION;  // serializable
           ... INSERT INTO Student <100 new students> ...
           COMMIT;
Client 2 - SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
           BEGIN TRANSACTION;
           ... SELECT AVG(GPA) FROM Student ...  // executes before Client 1
           ... SELECT MAX(GPA) FROM Student ...  // executes after Client 1
           COMMIT;
Newly inserted tuples that show up in a later read are called phantoms.
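The two repeatable-read properties - already-read tuples stay stable, but newly inserted tuples (phantoms) still appear - can be sketched with a toy model in Python. `RepeatableReadTxn`, the table contents, and the pin-on-first-read scheme are invented for illustration; real DBMSs implement this quite differently (e.g., with locks or snapshots).

```python
# Toy model of repeatable read: the first read of a tuple pins its value
# for the rest of the transaction, but later scans still pick up tuples
# inserted afterwards (phantoms).

class RepeatableReadTxn:
    def __init__(self, table):
        self.table = table            # live table: list of (id, gpa) rows
        self.snapshot = {}            # id -> gpa, pinned at first read

    def read_all(self):
        rows = []
        for rid, gpa in self.table:
            if rid not in self.snapshot:
                self.snapshot[rid] = gpa   # pin newly seen rows (incl. phantoms)
            rows.append(self.snapshot[rid])
        return rows

table = [(1, 3.9), (2, 3.1)]
txn = RepeatableReadTxn(table)
first = txn.read_all()            # reads both existing tuples

table[0] = (1, 2.0)               # concurrent update to an already-read tuple
table.append((3, 2.0))            # concurrent insert: a phantom
second = txn.read_all()           # re-read is stable, but the phantom appears
```

The second read returns three rows: the pinned values 3.9 and 3.1 (the concurrent update is invisible, so reads are repeatable) plus the phantom 2.0.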
Weaker isolation levels increase performance by eliminating
overhead and increasing concurrency. From weakest to strongest and
the read behaviors they permit:
| isolation level  | dirty reads | nonrepeatable reads | phantoms |
|------------------|-------------|---------------------|----------|
| read uncommitted | Y           | Y                   | Y        |
| read committed   | N           | Y                   | Y        |
| repeatable read  | N           | N                   | Y        |
| serializable     | N           | N                   | N        |
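The table can be encoded directly as a lookup, which makes the "weakest to strongest" ordering checkable: each level's set of permitted anomalies is a subset of the previous one's. The dictionary and helper below are illustrative names, not a standard API.

```python
# The anomaly table above as data: for each isolation level, the set of
# read anomalies a transaction running at that level may observe.
ANOMALIES = {
    "read uncommitted": {"dirty reads", "nonrepeatable reads", "phantoms"},
    "read committed":   {"nonrepeatable reads", "phantoms"},
    "repeatable read":  {"phantoms"},
    "serializable":     set(),
}

def may_observe(level, anomaly):
    """True if a transaction at this level may observe the given anomaly."""
    return anomaly in ANOMALIES[level]

# Weakest to strongest: each level permits a subset of the previous one's anomalies.
levels = ["read uncommitted", "read committed", "repeatable read", "serializable"]
ordered = all(ANOMALIES[b] <= ANOMALIES[a] for a, b in zip(levels, levels[1:]))
```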
Remember that the isolation level is in the eye of the beholding transaction: for true global serializability, every transaction must have isolation level SERIALIZABLE.
=> Much, much more on transactions in CS245 and CS347