CS145 Lecture Notes (2) -- Relational Algebra



Steps in creating and using a (relational) database:
  1. Design schema

  2. Create schema (using DDL)

  3. "Bulk load" initial data
    => Now have populated database

  4. Repeat: execute queries and updates on the database

Database Query Languages

Given a database, ask questions, get data as answers

Relational Query Languages


Relational Algebra

Example schema:
   Student(ID, name, address, GPA, sizeHS)     // ID is key
   Campus(location, enrollment, rank)          // location is key
   Apply(ID, location, date, major, decision)  // (ID,location) is key

Simplest query: relation name

Query "Campus" returns all tuples in Campus relation.

Relational algebra operators used to combine, filter, etc. the relations:

σ operator

(Example: Students with GPA > 3.7)



(Example: Students with GPA > 3.7 and sizeHS < 1000)



(Example: Applications to S.C. geology major)



** General case: σC(R)

where C can include attribute names, constants, comparisons (=, <, etc.), and connectives (AND, OR, NOT)

π operator

(Example: ID and date of all applications)



π eliminates columns while σ eliminates rows.

To do both, combine ("compose") operators:

(Example: name and address of Students with GPA > 3.7 and sizeHS < 1000)




σ produces a relation, π operates on that relation.

** General case: πA1, A2, ..., Am(E)

where E is any expression producing a relation

Question: Is it ever useful to compose two projection operators?

Example: πenrollmentlocation, enrollment(Campus))




Question: Is it ever useful to compose two selection operators?

Example: σsizeHS < 1000GPA > 3.7(Student))




Duplicates

Example: πdate,decision(Apply) can produce many tuples with same value

Cartesian Product (cross-product)

Campus × Apply

Schema of result is schema(Campus)schema(Apply)







  • Naming conflicts: prepend relation name
  • One tuple in result for each pair of tuples in Campus, Apply

    Formally: R1 × R2 = {t | t = <t1,t2> and t1 ∈ R1 and t2 ∈ R2}

    Question: Looks odd to glue unrelated tuples together. Why use it?

    
    
    
    
    
    
    (Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
    
    
    
    
    
    
    Question: Can we write it in a different way?
    
    
    
    
    
    

    Natural Join

  • Enforces equality on all attributes with same name
  • Eliminates one copy of duplicate attributes

    (Show schema of Campus JOIN Apply)

    
    
    
    
    (Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
    
    
    
    
    
    
    (Example: Names and addresses of all students with GPA > 3.7 who applied to CS major at campus with enrollment < 15,000 and were rejected)
    
    
    
    
    
    
    ** General case:
    E1 JOIN E2 = πschema(E1) ∪ schema(E2)E1.A = E2.A ∧ E1.B = E2.B ∧ ...(E1 × E2))
    
    
    
    
    
    Need to be careful -- suppose we have:
       Student(ID, name, address, GPA, sizeHS)
       Campus(name, enrollment, rank)
    
    "Student JOIN Campus" doesn't make sense.

    Theta Join

    E1 JOINC E2 = σC(E1 X E2)
    
    
    
    
    
  • DBMS often implements theta-join as basic operation.
  • Use of term "join" in implementation circles usually refers to theta-join or sometimes to cross-product.

    Union ∪

    Example schema:
       Student(ID, name, address, GPA, sizeHS)
       OutofStateStudent(ID, name, address, GPA, sizeHS, state)
       ForeignStudent(ID, name, address, GPA, country)
    

    (Example: List name and address of all students)

    Question: Can we do it with cross-products or joins?

    
    
    
    
    
    For union operator: (Example: List name and address of all students)
    
    
    
    
    
    

    Difference -

    Back to original schema:
       Student(ID, name, address, GPA, sizeHS)     // ID is key
       Campus(location, enrollment, rank)          // location is key
       Apply(ID, location, date, major, decision)  // (ID,location) is key
    
    (Example: Find ID's of all students who didn't apply anywhere)
    
    
    
    
    Question: What if want name of students?
    
    
    
    
    For difference operator:

    Intersection ∩

    As expected. Note E1 ∩ E2 = E1 - (E1 - E2)

    Renaming ρ

    Suppose we have:
       Student(ID, name, address)
       Parent(child-ID, parent-name)
    
    (Example: Print list of all student and parent names)
    
    
    
    
    (Example: Find names of all pairs of students who live at same address)
    (Example: no dups)
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    ** General case: ρR(A1, A2, ..., Am)(E) or ρR(E)

    Assigning to Temporaries

    (Example: previous query using sequence of assignments)
    
    
    
    
    
    
    
    
    
    
    
    
    
    


    Summary of Relational Algebra

    Basic:
       E ::= R
          |  σC(E)
          |  &piA1, A2, ..., An(E)
          |  E1 × E2
          |  E1 ∪ E2
          |  E1 - E2
          |  ρR(A1, A2, ..., Am)(E)
    
    
    Abbreviations:
          |  E1 JOIN E2
          |  E1 JOINC E2
          |  E1 ∩ E2
          |  TempName(A1, A2, ..., Ak) := E