CS145 Lecture Notes (3) -- SQL: Queries and Updates

SQL ("S.Q.L." or "sequel") - Structured Query Language
Query language of all commercial relational DBMS's

(Diagram of how the SQL standard evolves over time)

We'll cover primarily SQL92 (SQL2), some features of SQL:1999 (SQL3).

SQL is a big language, not just queries and updates.

Components of the language:

Schema definition
Data retrieval
Data modification
Indexes
Constraints
Views
Triggers
Transactions
Authorization
etc.

DDL = Data Definition Language
DML = Data Manipulation Language

In lectures:

We try to use "pure" SQL, conforming to the standard (more or less). Oracle SQL and MySQL don't always conform. I.e., don't try to run the lecture notes on Oracle or MySQL. See the project support materials on the classs webpage for details of how Oracle SQL differs from generic SQL.
Assume "interactive" SQL -- commands typed to a prompt and answers printed as tables. Next week, we'll talk about how to embed SQL in a computer progam.

Basic DDL commands


   CREATE TABLE name (att1 type1, att2 type2, ..., attN typeN)

   DROP TABLE name

See textbook for details of types.

DML commands

   SELECT, INSERT, DELETE, UPDATE

Schema for examples:

   Student(ID, name, address, GPA, sizeHS)     // ID is key
   Campus(location, enrollment, rank)          // location is key
   Apply(ID, location, date, major, decision)  // (ID,location) is key

SELECT Statement

   SELECT A1, A2, ..., An
   FROM   R1, R2, ..., Rm
   WHERE  condition

Equivalent to relational algebra query:

π_{A₁, A₂, ..., A_n} (σ_condition(R₁ × R₂ × ... × R_m)) Returns unnamed relation whose schema is (A₁, A₂, ..., A_n)

(Example: names and addresses of all students with GPA > 3.7)

(Example: names and application dates of all students applying to CS at Berkeley with sizeHS < 1000)

(Example: locations plus ranks of all campuses with applicants to "bioinformatics" major)

Duplicates

Retained unless use keyword DISTINCT

(modify last query to use DISTINCT)

** This is a significant difference between SQL and relational algebra.

ORDER BY

Sorts result

(Example: ID, name, GPA, and application date of students applying to S.C. sorted by application date then descending GPA)

LIKE

For string matching in the WHERE clause

(Example: ID's for all applicants with a major that includes "bio")

SELECT *

Returns all attributes in cross-product

(change previous two queries to use "SELECT *")

SELECT with Arithmetic

(Example: Student relation except scale GPA by sizeHS)

Also note the WHERE clause is optional.

Renaming Attributes

(Example: rename attributes in previous query)

AS keyword is optional.

Tuple Variables

(Example: student ID pairs who live at same address)

Can use tuple variables anytime for convenience and readability.
Here too the AS keyword is optional.

Union

(Example: all campus locations mentioned in the database)

Union eliminates duplicates, use UNION ALL to retain duplicates
INTERSECT [ALL] and EXCEPT [ALL] (MINUS [ALL] in Oracle) are similar

(Example: ID's of students with GPA > 3.7 that did not apply to a campus with rank <= 3)

Question: How does what we've covered so far compare in expressiveness to relational algebra?

Subqueries

SELECT statements appearing in WHERE clause

Treated as value if returns a single tuple result
Otherwise treated as relation and compared to values using IN, EXISTS, ALL, ANY

(Example: ID's and names of students applying to "CS" major)

Question: Can we write it without a subquery?

Question: Is the result the same?

Question: Why do we care so much about duplicates?

(Example: ID's and names of students who did not apply to Berkeley)

Note: Can use "A NOT IN ..." and "NOT (A IN ...)" interchangeably

Question: Can we write it without a subquery?

(Example: students such that some other student has same name and GPA, using EXISTS)

(Example: rewrite using multi-attribute IN)

(Example: student(s) from the biggest high school, using "NOT EXISTS")

(Example: rewrite using ">= ALL")

Uses of ALL:

   SELECT ... WHERE ... att <  ALL (subquery)
                            <= ALL
                            >  ALL
                            >= ALL
                            =  ALL
                            <> ALL

Can also precede expression with NOT, e.g.:

   SELECT ... WHERE ... NOT (att < ALL ...)

Same table for ANY:

   SELECT ... WHERE ... att <  ANY (subquery)
                            <= ANY
                            >  ANY
                            >= ANY
                            =  ANY
                            <> ANY

Can also precede expression with NOT, e.g.:

   SELECT ... WHERE ... NOT (att < ANY ...)

(Example: Find all applications that are not earliest, using "> ANY")

Questions: Are there any redundancies in the above tables including NOT?

Subqueries in `FROM`

(Example: get ID and scaled GPA when the scaled GPA lies outside of some range)

Aggregates

Clearly goes beyond relational algebra in expressiveness

(Example: average GPA of all Students)

(Example: average GPA of students applying to "CS")

Question: Is every student's GPA counted exactly one time?

(Example: amount by which average "CS" applicant GPA exceeds overall average)

(Example: number of applicants to Berkeley)

Question: What if students can apply to campuses multiple times?

SQL aggregate functions are: MIN, MAX, SUM, AVG, COUNT

`GROUP BY`

(Example: number of applicants to each campus)

(show picture)

=> If there's a GROUP BY clause then only grouped attributes and aggregates may appear in the SELECT clause.

(Example: maximum applicant GPA for each major at each location)

(Example: for each student -- ID, name, and number of campuses applied to)

Question: What if also want to include students who have applied to 0 campuses?

`HAVING` Clause

Conditions on aggregates

(Example: all majors at Berkeley with >500 applicants)

=> WHERE clause performs filtering before the grouping, HAVING performs filtering after the grouping.

Question: Can we write the query above with a subquery instead of HAVING?

Question: Are there HAVING queries that can't be rewritten?

SQL Summary

   SELECT    attributes, aggregates
   FROM      relations
   WHERE     condition
   GROUP BY  grouping attributes
   HAVING    conditions on aggregates
   ORDER BY  attributes, aggregates

=> All optional except SELECT and FROM

("Flowchart" of clause evaluation)

Null Values

Every type implicitly includes NULL.

Examples:

No decision information yet
Student with no GPA

(Example: all students with "sizeHS > 1000", then "NOT sizeHS > 1000")

(Example: add "AND GPA > 3.7", change to "OR GPA > 3.7")

(Example: all students whose sizeHS is NULL but GPA is not NULL)

Data Modification

INSERT Statement

   INSERT INTO R VALUES (A1, A2, ..., AN)

Example: INSERT INTO Campus VALUES ('Irvine', 30,000, 8)

Can omit attribute values, assigned to default or NULL

Second form:

   INSERT INTO R subquery

(Example: populat relation Visit with students with GPA > 3.9 and sizeHS < 1000)

DELETE Statement

   DELETE FROM R WHERE condition

Just like SELECT except over one relation

(Example: delete all students who did not apply anywhere)

UPDATE Statement

   UPDATE R
   SET    A1 = E1, A2 = E2, ..., An = En
   WHERE  condition

(Example: accept everyone applying to Berkeley with GPA > 3.9)

( ... and make them into CS majors)