CS145 Lecture Notes (7) -- Higher-Level Design: UML

Data modeling: Figuring out how to represent, in a database, the data for a given application
So far we have studied two general data models:
- Relational
- XML
For the relational model we studied "design principles" for coming up with a good schema. Well-accepted design principles for XML don't exist yet.
Another approach to coming up with a good schema is to use a database design model -- a high-level model that is convenient and intuitive for schema design, but not necessarily implemented by the target DBMS. The design is then translated (automatically or semi-automatically) into the model of the DBMS.
The most common historical database design model is the entity-relationship model.
Currently the most common database design model is a subset of UML (the Unified Modeling Language). UML is a large, complex, standardized specification language designed primarily for modeling software systems. It has a data modeling subset similar to the original entity-relationship design model.

Main concepts for UML data modeling:

Classes
Associations
Association Classes
Subclasses
Aggregation and Composition

We will cover each of these, then discuss translating UML designs to relational schemas.

Classes

Class name, list of attributes, list of methods
For data modeling, we drop the methods, add keys, optionally add types for attributes

(Example: Student and Campus classes)

Associations

Relationships between objects of two classes
Denoted by line, labeled with term (and optional arrow) describing relationship

(Example: Applied association between Student and Campus)

Multiplicity of Associations

Notation "m..n" on class C1 end of association between classes C1 and C2 says:

Each object of class C2 is related to at least m and at most n objects of class C1.

(Example: 10,000..20,000 on Student side of Applied association)

Special end cases:

"m..*" means no upper limit
"0..n" means no lower limit (may be "dangling" C2 objects that don't participate in association)
"0..*" means no restrictions at all

(Example: Every student must apply somewhere, every student can apply to up to 5 campuses, each campus takes at most 20,000 applications)

Shorthands and defaults:

"*" is shorthand for "0..*" (no restrictions)
"1" is shorthand for "1..1" (exactly one)
Default is "1..1"
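
A minimal sketch of how the "m..n" notation could be checked against data. The function names parse_multiplicity and check_end and the sample links are hypothetical, introduced only for illustration; they are not part of UML.

```python
import math
from collections import Counter

def parse_multiplicity(text):
    """Turn a multiplicity such as "1..5", "3..*", "*", or "1" into (low, high)."""
    text = text.strip()
    if text == "*":                       # shorthand for "0..*"
        return (0, math.inf)
    if ".." not in text:                  # "1" is shorthand for "1..1"
        n = int(text)
        return (n, n)
    low, high = text.split("..")
    return (int(low), math.inf if high == "*" else int(high))

def check_end(pairs, c2_objects, multiplicity):
    """Check the multiplicity written on the C1 end of an association.

    pairs holds (c1, c2) links. Every object in c2_objects must be linked to
    at least `low` and at most `high` objects of class C1.
    """
    low, high = parse_multiplicity(multiplicity)
    counts = Counter(c2 for _, c2 in set(pairs))   # missing objects count as 0
    return all(low <= counts[c2] <= high for c2 in c2_objects)

# Hypothetical Applied links: (student, campus)
applied = [("s1", "Berkeley"), ("s2", "Berkeley"), ("s1", "Stanford")]
campuses = ["Berkeley", "Stanford", "MIT"]
```

With this data, a "0..2" label on the Student end holds, while "1..*" fails because MIT is a "dangling" campus with no applicants. To check the Campus end, swap the two positions in each pair and pass the list of students.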

Familiar types of relationships and their representation as UML association multiplicities:

One-to-one relationships: Each object in class C1 is related to at most one object in class C2 and vice-versa
Many-to-one relationships: Each object in class C1 is related to at most one object in C2, but each object in C2 may be related to many objects in C1
Many-to-many relationships: Each object in class C1 may be related to many objects in C2 and vice-versa
Complete relationships: All objects participate in at least one relationship ("referential integrity" in relational terms = no "dangling" tuples)
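
The four relationship types above can be illustrated with a small sketch over a set of (C1, C2) links. Multiplicities constrain every allowed instance, so these hypothetical helpers (max_fanout, classify, is_complete) only report what one particular set of links is consistent with.

```python
from collections import Counter

def max_fanout(pairs, index):
    """Largest number of links that any one object at position `index` takes part in."""
    counts = Counter(pair[index] for pair in set(pairs))
    return max(counts.values(), default=0)

def classify(pairs):
    """Name the relationship type that a set of (c1, c2) links is consistent with."""
    c1_fan = max_fanout(pairs, 0)   # most C2 objects related to a single C1 object
    c2_fan = max_fanout(pairs, 1)   # most C1 objects related to a single C2 object
    if c1_fan <= 1 and c2_fan <= 1:
        return "one-to-one"
    if c1_fan <= 1:
        return "many-to-one"        # many C1 objects share one C2 object
    if c2_fan <= 1:
        return "one-to-many"
    return "many-to-many"

def is_complete(pairs, c1_objects, c2_objects):
    """True if every object on both sides appears in at least one link."""
    linked_c1 = {c1 for c1, _ in pairs}
    linked_c2 = {c2 for _, c2 in pairs}
    return set(c1_objects) <= linked_c1 and set(c2_objects) <= linked_c2
```

In multiplicity terms, "at most one" corresponds to an upper bound of 1 on the opposite end, and "complete" corresponds to a lower bound of 1 rather than 0.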

Association Classes

When there are attributes on an association

(Example: class AppInfo on Applied association)

Question: Under what circumstances can we combine an association class with one of the two classes participating in the association?

Self-Associations

Associations between a class and itself

(Example: Sibling association on Student)

If objects take on different roles, label with two terms

(Example: Mentor association on Student)

Subclasses

(Big example)

The subclasses ("specializations") of a superclass ("generalization") are:

incomplete (partial) or complete. If complete, every object in the superclass belongs to at least one subclass.
disjoint (exclusive) or overlapping. If disjoint, no object belongs to more than one subclass.

Label arrow with "{incomplete/complete, disjoint/overlapping}".

Examples:

Just ForeignStud and DomStud are complete and disjoint
Just APStud is incomplete
All three are complete and overlapping
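
A minimal sketch of the two properties, treating each class as a set of objects. The function name describe_specialization and the three sample students are assumptions made for illustration; only the subclass names come from the example.

```python
def describe_specialization(superclass, subclasses):
    """Label a specialization given the superclass set and a dict of subclass sets."""
    members = list(subclasses.values())
    covered = set().union(*members)            # objects in at least one subclass
    complete = superclass <= covered           # every superclass object is covered
    disjoint = sum(len(m) for m in members) == len(covered)   # nothing counted twice
    return ("complete" if complete else "incomplete",
            "disjoint" if disjoint else "overlapping")

# Hypothetical population of three students
students = {"s1", "s2", "s3"}
foreign = {"s1"}
domestic = {"s2", "s3"}
ap = {"s2"}
```

On this data the function reproduces the three cases listed above: ForeignStud with DomStud is complete and disjoint, APStud alone is incomplete, and all three together are complete and overlapping.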

Aggregation

Objects in one class are (partially) composed of a set of objects in another class

(Example: Dorm contains set of Students)

Any multiplicity on non-diamond side
Implicit "0..1" multiplicity on diamond side

Composition

Same as aggregation, except that the filled-in diamond indicates an implicit "1..1" multiplicity on the diamond side

Translating a UML Design to a Relational Schema

=> If every "regular" class has a key then the translation can be fully automated. (Association classes and subclasses are excluded from requiring keys.)

Classes

Every class becomes a relation -- translation is direct, including keys.
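
A sketch of the direct translation, using Python's built-in sqlite3 module. The attribute names (sID, sName, GPA, cName, state, enrollment) are assumed for illustration, since the notes do not list the attributes of Student and Campus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (
        sID   INTEGER PRIMARY KEY,    -- key of the Student class (assumed)
        sName TEXT,
        GPA   REAL
    );
    CREATE TABLE Campus (
        cName      TEXT PRIMARY KEY,  -- key of the Campus class (assumed)
        state      TEXT,
        enrollment INTEGER
    );
""")

# List the relations that now exist, one per class.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Each class yields exactly one relation, and the class key becomes the primary key.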

Associations

(solicit from class)

(Example: Applied association)

Question: What is the key for the association relation?

Question: Do we always need a separate relation for an association?
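
One possible translation of the Applied association, sketched with sqlite3 under the assumption that the association is many-to-many and that Student and Campus have keys sID and cName (both names are assumed). The association relation holds the key of each side, and here the key of the relation is the pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite does not enforce references by default
conn.executescript("""
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT);
    CREATE TABLE Campus  (cName TEXT PRIMARY KEY, state TEXT);
    -- Association relation: one column per participating class key.
    CREATE TABLE Applied (
        sID   INTEGER REFERENCES Student(sID),
        cName TEXT    REFERENCES Campus(cName),
        PRIMARY KEY (sID, cName)
    );
    INSERT INTO Student VALUES (1, 'Amy');
    INSERT INTO Campus VALUES ('Stanford', 'CA'), ('Berkeley', 'CA');
""")

def try_insert(sid, cname):
    """Return True if the link is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO Applied VALUES (?, ?)", (sid, cname))
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert(1, "Stanford")       # accepted
second = try_insert(1, "Berkeley")      # accepted: a student may apply to many campuses
duplicate = try_insert(1, "Stanford")   # rejected: the (sID, cName) pair is the key
```

If the multiplicity on the Campus end were "0..1" or "1..1", each student would be linked to at most one campus, so sID alone would be a key for Applied, and the cName column could be folded into Student instead of keeping a separate relation.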

Association Classes

(solicit from class)

(Example: AppInfo association class)
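
A sketch of one way to translate an association class: its attributes become extra columns of the association relation. The AppInfo attributes appDate and decision are assumed here, as are the key names sID and cName.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Applied (
        sID      INTEGER,
        cName    TEXT,
        appDate  TEXT,   -- attribute of the AppInfo association class (assumed)
        decision TEXT,   -- attribute of the AppInfo association class (assumed)
        PRIMARY KEY (sID, cName)
    );
    INSERT INTO Applied VALUES (1, 'Stanford', '2024-01-05', 'accept');
""")

# Column names of the combined relation, in declaration order.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Applied)")]
```

Because the key is still the pair (sID, cName), this design stores at most one AppInfo per student and campus.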

Self-Associations

(Mentor/Mentee example)
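
A sketch of the self-association translation: the association relation refers to Student twice, with the two role names distinguishing the columns. The column names mentorID and menteeID and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT);
    -- Both columns reference the same class; the role names tell them apart.
    CREATE TABLE Mentor (
        mentorID INTEGER REFERENCES Student(sID),
        menteeID INTEGER REFERENCES Student(sID),
        PRIMARY KEY (mentorID, menteeID)
    );
    INSERT INTO Student VALUES (1, 'Amy'), (2, 'Bob'), (3, 'Cal');
    INSERT INTO Mentor VALUES (1, 2), (1, 3);
""")

# Names of the students mentored by student 1.
mentees = [row[0] for row in conn.execute("""
    SELECT s.sName
    FROM Mentor m JOIN Student s ON s.sID = m.menteeID
    WHERE m.mentorID = 1
    ORDER BY s.sName
""")]
```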

Subclasses

Create relation for superclass. Three possible translations for subclasses.

1. Subclass relations contain just the specialized attributes and the key of the superclass
2. Subclass relations contain all attributes (generalized and specialized)
3. No subclass relations; the superclass relation contains all generalized and specialized attributes (lots of nulls)

Best translation may depend on properties, for example:

Heavily overlapping argues for translation 3
Complete and disjoint argues for translation 2 (and no superclass relation)

(Big example, translation 1)
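
The three subclass translations can be sketched as a function that returns the resulting relation schemas. The function name translate and the attributes sName, country, and units are hypothetical; only the class names come from the example.

```python
def translate(super_name, key, super_attrs, subclasses, option):
    """Return {relation name: [attributes]} for translation 1, 2, or 3.

    subclasses maps each subclass name to its specialized attributes.
    """
    if option == 1:
        # Subclass relations hold the superclass key plus specialized attributes.
        schema = {super_name: [key] + super_attrs}
        for name, attrs in subclasses.items():
            schema[name] = [key] + attrs
    elif option == 2:
        # Subclass relations hold every attribute, generalized and specialized.
        schema = {super_name: [key] + super_attrs}
        for name, attrs in subclasses.items():
            schema[name] = [key] + super_attrs + attrs
    elif option == 3:
        # One wide superclass relation; unused specialized attributes are null.
        wide = [key] + super_attrs
        for attrs in subclasses.values():
            wide = wide + attrs
        schema = {super_name: wide}
    else:
        raise ValueError("option must be 1, 2, or 3")
    return schema

subs = {"ForeignStud": ["country"], "APStud": ["units"]}
```

For a complete and disjoint specialization, the superclass relation produced under option 2 could be dropped, as noted above.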

Aggregation and Composition

(solicit from class, use Dorm example)
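
One possible translation, sketched with sqlite3: the contained class carries a reference to the key of the diamond-side class. The implicit "0..1" of aggregation allows that reference to be null, and the implicit "1..1" of composition forbids it. The key names and the Room class are hypothetical, with Room added only to contrast composition against the Dorm aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dorm (dName TEXT PRIMARY KEY);
    -- Aggregation: a student belongs to zero or one dorm, so dName may be NULL.
    CREATE TABLE Student (
        sID   INTEGER PRIMARY KEY,
        dName TEXT REFERENCES Dorm(dName)
    );
    -- Composition: a room belongs to exactly one dorm, so dName is NOT NULL.
    CREATE TABLE Room (
        roomNo INTEGER,
        dName  TEXT NOT NULL REFERENCES Dorm(dName),
        PRIMARY KEY (roomNo, dName)
    );
    INSERT INTO Dorm VALUES ('Dorm A');
""")

def accepts(sql, params):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

student_without_dorm = accepts("INSERT INTO Student VALUES (?, ?)", (1, None))
room_without_dorm = accepts("INSERT INTO Room VALUES (?, ?)", (101, None))
```

The student with no dorm is accepted, while the room with no dorm is rejected by the NOT NULL constraint.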