CS145 Lecture Notes (8) -- Constraints and Triggers

We will cover constraints and triggers for relational databases only.
There is no accepted theory, standard, or implementations of constraints or triggers for XML data.
This lecture uses the SQL standard. It is not about Oracle. (See end of notes.)

Integrity Constraints

Integrity constraints impose restrictions on the allowable data in the database, in addition to the simple structure and type restrictions imposed by the basic schema definition.

Question: We've already seen a few kinds of integrity constraints. What are they?

Why use integrity constraints?

To catch data-entry errors
As correctness criteria when writing database updates
To enforce consistency across data in the database
To tell the system about the data - it may choose to store the data or process queries accordingly

-> Many useful constraints cannot be expressed with the kinds of constraints we've discussed so far.

Example schema:

   Student(ID, name, address, GPA, sizeHS)
   Campus(location, enrollment, rank)
   Apply(ID, location, date, major, decision)

Example constraints:

A student with GPA < 3.0 can only apply to campuses with rank > 4
All applications with date < 1/1/05 have non-NULL decision
Apply.ID and Apply.location appear in Student.ID and Campus.location, respectively
Campus.rank <= 10

Types of Integrity Constraints for Relational Databases

Non-null
Key
Referential integrity
Attribute-based
Tuple-based
General assertions

Declaring and Enforcing Constraints

Two times at which constraints may be declared:

Declared with original schema. Constraints must hold after bulk loading.
Declared later. Constraints must hold on current database.

After declaration, if a SQL statement causes a constraint to become violated then (in most cases) any effects of the statement are undone and a run-time error is generated.

Non-Null Constraints

Restricts attributes to not allow NULL values
Example:

  CREATE TABLE Student (ID integer NOT NULL,
                        name char(30) NOT NULL,
                        address char(100),
                        GPA float NOT NULL,
                        sizeHS integer)

Key Constraints

Example:
ID is key for Student =>
No two tuples in Student can have the same value for their ID attribute

There are two kinds of keys in SQL:

PRIMARY KEY: at most one per table, automatically non-null, automatically indexed (in Oracle)
UNIQUE: any number per table, automatically indexed (in Oracle)

There are two ways to define keys in SQL:

With key attribute
Separate within table definition

Example:

   CREATE TABLE Student (ID integer PRIMARY KEY,
                         name char(30),
                         address char(100),
                         GPA float,
                         sizeHS integer,
                         UNIQUE (name,address))

Referential Integrity

Very important and common kind of constraint
Examples:

If an ID appears in Apply then it also appears in Student.
If a location appears in Apply then it also appears in Campus.

Question: Does the reverse direction in these examples make sense?

Terminology:

Apply.ID "references" Student.ID
"Referential integrity" means referenced value always exists
No "dangling tuples" in the referencing relation when joined with the referenced one

Referential Integrity in SQL

Referenced attribute must be PRIMARY KEY or UNIQUE (e.g., Student.ID, Student.(name,address), Campus.location)
Referencing attribute called FOREIGN KEY (e.g., Apply.ID, Apply.location)

Two ways to define referential integrity in SQL:

With referencing attribute
Separate within referencing table

Example:

  CREATE TABLE Apply(ID integer REFERENCES Student(ID),
                     location char(25),
                     date char(10),
                     major char(10),
                     decision char,
                     FOREIGN KEY (location) REFERENCES Campus(location))

Can omit referenced attribute name if it's the same as referencing attribute:

  ID integer REFERENCES Student, ...

Can have multi-attribute referential integrity.

Can have referential integrity within a single table.
Example:

   Dorm(first-name,
        last-name,
        room-number,
        phone-number,
        roommate-first-name,
        roommate-last-name,
        PRIMARY KEY (first-name,last-name),
        FOREIGN KEY (roommate-first-name,roommate-last-name)
          REFERENCES Dorm(first-name,last-name))

Referential Integrity Enforcement

Consider R(P, ...) and S(..., F references R(P), ...)

Question: What modification operations can violate constraint?

Inserts/updates to S that violate constraint are disallowed.

Must insert referenced value first, then referencing value
Except for "deferred" constraint checking (won't cover)

Deletes from R with referencing values:

(figure)

Four options:

Disallow (default)
SET NULL - set all referencing values to NULL
SET DEFAULT - set all referencing values to default for that column (error if default value not in R.P)
CASCADE - delete all tuples with referencing values

Updates to R.P: same four options (CASCADE mirrors update)

See textbook for syntax.

Question: Why is referenced attribute required to be PRIMARY KEY or UNIQUE?

Attribute-Based Constraints

Constraints on values for Relation.Attribute

Type specifications are a kind of attribute-based constraint.
Non-null constraints are a kind of attribute-based constraint.

SQL Syntax: Add "CHECK(<condition>)" to attribute.
<condition> is like a SQL Where clause, can refer to attribute itself.
Examples:

   CREATE TABLE Student (...
                         GPA float CHECK(GPA <= 4.0 AND GPA > 0),
                         ...)

   CREATE TABLE Apply (...
                       decision char CHECK(decision in ('Y','N','U'))
                       ...)

Conditions can be more complex, e.g., with subqueries.
Example:

   Apply (...
          major char(10) CHECK(major NOT IN
                                  (SELECT dept-name FROM Dept
                                   WHERE status = 'full'))
          ...)

Question: Can we encode referential integrity constraints this way?

-> Attribute-based constraints are only checked on inserts and updates to that relation, NOT on modifications to relations referenced in subqueries -- significant "hole" in constraint-checking.

Tuple-Based Constraints

Constraints on values for Relation.Tuple

Similar to attribute-based constraints except CHECK applies to entire tuples.
=> Specified separately in table definition
Example:

  CREATE TABLE Campus(location char(25),
                      enrollment integer,
                      rank integer,
                      CHECK(enrollment >= 10,000 OR rank > 5)

(Example encodes enrollment < 10,000 => rank > 5)

As with attribute-based constraints, can have subqueries but constraint is only checked when the relation itself is modified

General Assertions

Constraints on entire relation or entire database

In SQL, stand-alone statement:

  CREATE ASSERTION <name> CHECK(<condition>)

Example: Average GPA is > 3.0 and average sizeHS is < 1000

  CREATE ASSERTION Avgs CHECK(
    3.0 < (SELECT avg(GPA) FROM Student) AND
    1000 > (SELECT avg(sizeHS) FROM Student))

Example: A student with GPA < 3.0 can only apply to campuses with rank > 4.

  CREATE ASSERTION RestrictApps CHECK(
     NOT EXISTS (SELECT * FROM Student, Apply, Campus
                 WHERE Student.ID = Apply.ID
                 AND Apply.location = Campus.location
                 AND Student.GPA < 3.0 AND Campus.rank <= 4))

Assertions checked for each modification that could potentially violate them.

Question: What operations could violate assertion Avgs?

Question: What operations could violate assertion RestrictApps?

Q: Can we encode referential integrity constraints this way?

Triggers

More general than constraints
In SQL-99 but not in SQL2, in most products in some form

General form:

  CREATE TRIGGER <name>
  BEFORE | AFTER | INSTEAD OF <events>
  <referencing clause>                // optional
  FOR EACH ROW                        // optional 
  WHEN (<condition>)                  // optional
  <action>

<events> can be:

  INSERT ON R
  DELETE ON R
  UPDATE [OF A1, A2, ..., An] ON R

AFTER <events> are the most common and useful. Don't worry too much about BEFORE and INSTEAD OF, they can be messy.

<condition>: like general assertion
<action>: sequence of SQL statements
FOR EACH ROW
- If present, execute trigger once for each tuple changed.
- If absent, execute trigger once for each relevant statement.
- Terminology: "row-level" versus "statement-level"
- In either case, trigger executes after statement completes.
<referencing clause>:
REFERENCING <thing1> AS <var1> <thing2> AS <var2>, etc.
<thing> can be:
- OLD TABLE - previous values of deleted or updated tuples, row-level or statement-level, DELETE or UPDATE
- NEW TABLE - current values of inserted or updated tuples, row-level or statement-level, INSERT or UPDATE
- OLD ROW - previous value of deleted or updated tuple, row-level only, DELETE or UPDATE
- NEW ROW - current value of inserted or updated tuple, row-level only, INSERT or UPDATE

Example: If an application tuple is inserted for a student with GPA > 3.9 and sizeHS > 1500 to Berkeley, set decision to "Y".

  CREATE TRIGGER AutoAccept
  AFTER INSERT ON Apply
  REFERENCING NEW ROW AS NewApp
  FOR EACH ROW
  WHEN (NewApp.location = 'Berkeley' AND
        3.9 < (SELECT GPA FROM Student WHERE ID = NewApp.ID) AND
        1500 < (SELECT sizeHS FROM Student WHERE ID = NewApp.ID))
  UPDATE Apply
    SET decision = 'Y'
    WHERE ID = NewApp.ID
    AND location = NewApp.location
    AND date = NewApp.date

Question: Can we achieve the same effect using assertions?

Example: Same trigger without FOR EACH ROW

  CREATE TRIGGER AutoAccept
  AFTER INSERT ON Apply
  REFERENCING NEW TABLE AS NewApps
  UPDATE Apply
    SET decision = 'Y'
    WHERE (ID,location,date) IN (SELECT ID,location,date FROM NewApps)
    AND location = 'Berkeley'
    AND 3.9 < (SELECT GPA FROM Student WHERE ID = Apply.ID)
    AND 1500 > (SELECT sizeHS FROM Student WHERE ID = Apply.ID)

Example: If campus enrollment increases from below 7,000 to above 7,000, delete all applications to that campus dated after 2/15/05 and set all "Y" decisions for applications before 2/15/05 to "U".

  CREATE TRIGGER TooMany
  AFTER UPDATE OF enrollment ON Campus
  REFERENCING OLD ROW AS OldVal NEW ROW AS NewVal
  FOR EACH ROW
  WHEN (OldVal.enrollment <= 7,000 AND NewVal.enrollment > 7,000)
  DELETE FROM Apply
    WHERE location = NewVal.location AND date > 2/15/05;
  UPDATE Apply
    SET decision = 'U'
    WHERE location = NewVal.location
    AND decision = 'Y'

Question: Can we achieve the same effect using assertions?

Triggers and assertions:

Triggers are more flexible and expressive than assertions.
In practice the most common use of triggers is to effectively implement assertions.
Can check assertions and raise errors; can also automatically repair violations.

What Oracle Supports

Non-null constraints and key constraints as covered
Referential integrity with only default behavior plus "ON DELETE SET NULL" and "ON DELETE CASCADE"
Attribute-based and tuple-based CHECK constraints without subqueries
No general assertions
Trigger specifications:
- INSTEAD OF for triggers on views only (don't try this at home)
- No OLD TABLE or NEW TABLE in REFERENCING clause
- Referencing clause uses OLD instead of OLD ROW and NEW instead of NEW ROW
- WHEN clause allowed only with FOR EACH ROW
- No subqueries in WHEN condition
- <action> is PL/SQL procedure with some restrictions
Trigger interactions: significant and fairly complex restrictions
Question: What is being prevented by restricting triggering interactions?