## CS145 Lecture Notes (2) -- Relational Algebra

Steps in creating and using a (relational) database:
1. Design schema

2. Create schema (using DDL)

=> Now have populated database

4. Repeat: execute queries and updates on the database

### Database Query Languages

• Get all students with GPA > 3.7 who applied to Berkeley and San Diego and nowhere else.
• Get all humanities departments at campuses in southern California with < 500 applicants.
• Get the campus with highest average accept rate over the last five years.

• Some questions are easy to pose, some are not.
• Some questions are easy for DBMS to answer, some are not.
• "Query language" also used to update the database.

### Relational Query Languages

• Formal: relational algebra, relational calculus, Datalog

• Actual: SQL, Quel

• In all languages, a query is executed over a set of relations, get a relation as the result.

### Relational Algebra

Example schema:
```   Student(ID, name, address, GPA, sizeHS)     // ID is key
Campus(location, enrollment, rank)          // location is key
Apply(ID, location, date, major, decision)  // (ID,location) is key

```

#### Simplest query: relation name

Query "`Campus`" returns all tuples in Campus relation.

Relational algebra operators used to combine, filter, etc. the relations:

### σ operator

(Example: Students with GPA > 3.7)
```

```
(Example: Students with GPA > 3.7 and sizeHS < 1000)
```

```
(Example: Applications to S.C. geology major)
```

```
** General case: σC(R)

where `C` can include attribute names, constants, comparisons (`=`, `<`, etc.), and connectives (`AND, OR, NOT`)

### π operator

(Example: ID and date of all applications)
```

```
π eliminates columns while σ eliminates rows.

To do both, combine ("compose") operators:

(Example: name and address of Students with GPA > 3.7 and sizeHS < 1000)

```

```
σ produces a relation, π operates on that relation.

** General case: πA1, A2, ..., Am(E)

where E is any expression producing a relation

Question: Is it ever useful to compose two projection operators?

Example: πenrollmentlocation, enrollment(Campus))

```

```

Question: Is it ever useful to compose two selection operators?

Example: σsizeHS < 1000GPA > 3.7(Student))

```

```

#### Duplicates

Example: πdate,decision(Apply) can produce many tuples with same value
• Relational algebra semantics says remove duplicates.
• SQL does not - one difference between formal and actual query languages

#### Cartesian Product (cross-product)

Campus × Apply

Schema of result is `schema(Campus)``schema(Apply)`

```

```
• Naming conflicts: prepend relation name
• One tuple in result for each pair of tuples in Campus, Apply

Formally: R1 × R2 = {t | t = <t1,t2> and t1 ∈ R1 and t2 ∈ R2}

Question: Looks odd to glue unrelated tuples together. Why use it?

```

```
(Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
```

```
Question: Can we write it in a different way?
```

```

#### Natural Join

• Enforces equality on all attributes with same name
• Eliminates one copy of duplicate attributes

(Show schema of Campus JOIN Apply)

```

```
(Example: Names and addresses of all students with GPA > 3.7 who applied to CS major and were rejected)
```

```
(Example: Names and addresses of all students with GPA > 3.7 who applied to CS major at campus with enrollment < 15,000 and were rejected)
```

```
** General case:
E1 JOIN E2 = πschema(E1) ∪ schema(E2)E1.A = E2.A ∧ E1.B = E2.B ∧ ...(E1 × E2))
```

```
Need to be careful -- suppose we have:
```   Student(ID, name, address, GPA, sizeHS)
Campus(name, enrollment, rank)
```
"`Student JOIN Campus`" doesn't make sense.

#### Theta Join

E1 JOINC E2 = σC(E1 X E2)
```

```
• DBMS often implements theta-join as basic operation.
• Use of term "join" in implementation circles usually refers to theta-join or sometimes to cross-product.

#### Union ∪

Example schema:
```   Student(ID, name, address, GPA, sizeHS)
OutofStateStudent(ID, name, address, GPA, sizeHS, state)
```

(Example: List name and address of all students)

Question: Can we do it with cross-products or joins?

```

```
For union operator:
• Schemas must match exactly.
• Duplicates eliminated (as usual).
(Example: List name and address of all students)
```

```

#### Difference -

Back to original schema:
```   Student(ID, name, address, GPA, sizeHS)     // ID is key
Campus(location, enrollment, rank)          // location is key
Apply(ID, location, date, major, decision)  // (ID,location) is key
```
(Example: Find ID's of all students who didn't apply anywhere)
```

```
Question: What if want name of students?
```

```
For difference operator:
• Schemas must match exactly, duplicates eliminated

#### Intersection ∩

As expected. Note E1 ∩ E2 = E1 - (E1 - E2)

#### Renaming ρ

Suppose we have:
```   Student(ID, name, address)
Parent(child-ID, parent-name)
```
(Example: Print list of all student and parent names)
```

```
(Example: Find names of all pairs of students who live at same address)
(Example: no dups)
```

```
** General case: ρR(A1, A2, ..., Am)(E) or ρR(E)

#### Assigning to Temporaries

• We've been writing relational algebra queries as expressions.
• Can equivalently use relational algebra expression trees (see textbook)
• Can equivalently use sequences of algebraic assignments (called linear notation in textbook)
(Example: previous query using sequence of assignments)
```

```

### Summary of Relational Algebra

Basic:
```   E ::= R
|  σC(E)
|  &piA1, A2, ..., An(E)
|  E1 × E2
|  E1 ∪ E2
|  E1 - E2
|  ρR(A1, A2, ..., Am)(E)

```
Abbreviations:
```      |  E1 JOIN E2
|  E1 JOINC E2
|  E1 ∩ E2
|  TempName(A1, A2, ..., Ak) := E
```