(Diagram of how the SQL standard evolves over time)
We'll cover primarily SQL92 (SQL2), some features of SQL:1999 (SQL3).
SQL is a big language, not just queries and updates.
Components of the language:
In lectures:
CREATE TABLE name (att1 type1, att2 type2, ..., attN typeN) DROP TABLE nameSee textbook for details of types.
SELECT, INSERT, DELETE, UPDATESchema for examples:
Student(ID, name, address, GPA, sizeHS) // ID is key Campus(location, enrollment, rank) // location is key Apply(ID, location, date, major, decision) // (ID,location) is key
SELECT A1, A2, ..., An FROM R1, R2, ..., Rm WHERE conditionEquivalent to relational algebra query:
πA1, A2, ..., An (σcondition(R1 × R2 × ... × Rm)) Returns unnamed relation whose schema is (A1, A2, ..., An)
(Example: names and addresses of all students with GPA > 3.7)
(Example: names and application dates of all students applying to CS at Berkeley with sizeHS < 1000)
(Example: locations plus ranks of all campuses with applicants to "bioinformatics" major)
DISTINCT
(modify last query to use DISTINCT
)
** This is a significant difference between SQL and relational algebra.
(Example: ID, name, GPA, and application date of students applying to S.C. sorted by application date then descending GPA)
WHERE
clause
(Example: ID's for all applicants with a major that includes "bio")
(change previous two queries to use "SELECT *
")
(Example: Student relation except scale GPA by sizeHS)
Also note the WHERE
clause is optional.
(Example: rename attributes in previous query)
AS
keyword is optional.
(Example: student ID pairs who live at same address)
AS
keyword is optional.
(Example: all campus locations mentioned in the database)
UNION ALL
to retain duplicates
INTERSECT [ALL]
and EXCEPT [ALL]
(MINUS [ALL]
in
Oracle) are similar
(Example: ID's of students with GPA > 3.7 that did not apply to a campus with rank <= 3)
Question: How does what we've covered so far compare in expressiveness to relational algebra?
SELECT
statements appearing in WHERE
clause
IN
,
EXISTS
, ALL
, ANY
(Example: ID's and names of students applying to "CS" major)
Question: Can we write it without a subquery?
Question: Is the result the same?
Question: Why do we care so much about duplicates?
(Example: ID's and names of students who did not apply to Berkeley)
Note: Can use "
A NOT IN ...
" and "NOT (A IN ...)
" interchangeably
Question: Can we write it without a subquery?
(Example: students such that some other student has same name and GPA,
using EXISTS
)
(Example: rewrite using multi-attribute IN
)
(Example: student(s) from the biggest high school, using "NOT EXISTS
")
(Example: rewrite using ">= ALL"
)
Uses of
ALL
:
SELECT ... WHERE ... att < ALL (subquery) <= ALL > ALL >= ALL = ALL <> ALLCan also precede expression with
NOT
, e.g.:
SELECT ... WHERE ... NOT (att < ALL ...)Same table for
ANY
:
SELECT ... WHERE ... att < ANY (subquery) <= ANY > ANY >= ANY = ANY <> ANYCan also precede expression with
NOT
, e.g.:
SELECT ... WHERE ... NOT (att < ANY ...)
(Example: Find all applications that are not earliest, using "> ANY
")
Questions: Are there any redundancies in the above tables including
NOT
?
FROM
(Example: get ID and scaled GPA when the scaled GPA lies outside of some range)
(Example: average GPA of all Students)
(Example: average GPA of students applying to "CS")
Question: Is every student's GPA counted exactly one time?
(Example: amount by which average "CS" applicant GPA exceeds overall average)
(Example: number of applicants to Berkeley)
Question: What if students can apply to campuses multiple times?
SQL aggregate functions are:
MIN, MAX, SUM, AVG, COUNT
GROUP BY
(Example: number of applicants to each campus)
(show picture)
=> If there's a
GROUP BY
clause then only grouped
attributes and aggregates may appear in the SELECT
clause.
(Example: maximum applicant GPA for each major at each location)
(Example: for each student -- ID, name, and number of campuses applied to)
Question: What if also want to include students who have applied to 0 campuses?
HAVING
Clause (Example: all majors at Berkeley with >500 applicants)
=>
WHERE
clause performs filtering before the grouping, HAVING
performs
filtering after the grouping.
Question: Can we write the query above with a subquery instead of
HAVING
?
Question: Are there HAVING
queries that can't be rewritten?
SELECT attributes, aggregates FROM relations WHERE condition GROUP BY grouping attributes HAVING conditions on aggregates ORDER BY attributes, aggregates=> All optional except
SELECT
and FROM
("Flowchart" of clause evaluation)
NULL
.
Examples:
(Example: all students with "sizeHS > 1000", then "NOT sizeHS > 1000")
(Example: add "AND GPA > 3.7", change to "OR GPA > 3.7")
(Example: all students whose sizeHS is NULL
but GPA is not NULL
)
INSERT INTO R VALUES (A1, A2, ..., AN)Example:
INSERT INTO Campus VALUES ('Irvine', 30,000, 8)
Can omit attribute values, assigned to default or NULL
Second form:
INSERT INTO R subquery
(Example: populat relation Visit with students with GPA > 3.9 and sizeHS < 1000)
DELETE FROM R WHERE conditionJust like
SELECT
except over one relation
(Example: delete all students who did not apply anywhere)
UPDATE R SET A1 = E1, A2 = E2, ..., An = En WHERE condition
(Example: accept everyone applying to Berkeley with GPA > 3.9)
( ... and make them into CS majors)