CS145 Lecture Notes (7) -- Higher-Level Design: UML
- Data modeling: Figuring out how to represent, in a
database, the data for a given application
- So far we have studied two general data models: relational and XML.
For the relational model we studied "design principles" for
coming up with a good schema. Well-accepted design principles for XML
don't exist yet.
- Another approach to coming up with a good schema is to use a
database design model -- a high-level model that is convenient
and intuitive for schema design, but not necessarily implemented by
the target DBMS. The design is then translated (automatically or
semi-automatically) into the model of the DBMS.
- The most common historical database design model is the
entity-relationship model.
- Currently the most common database design model is a subset of
UML (the Unified Modeling Language). UML is a large, complex,
standardized specification language designed primarily for modeling
software systems. It has a data modeling subset similar to the
original entity-relationship design model.
Main concepts for UML data modeling:
- Classes
- Associations
- Association Classes
- Subclasses
- Aggregation and Composition
We will cover each of these, then discuss translating UML designs to
relational schemas.
Classes
- Class name, list of attributes, list of methods
- For data modeling, we drop the methods, add keys, optionally add types for attributes
(Example: Student and Campus classes)
Associations
- Relationships between objects of two classes
- Denoted by line, labeled with term (and optional arrow) describing relationship
(Example: Applied association between Student and Campus)
Multiplicity of Associations
Notation "m..n" on class C1 end of association between classes
C1 and C2 says:
- Each object of class C2 is related to at least m and at most n
objects of class C1.
(Example: 10,000..20,000 on Student side of Applied association)
Special end cases:
- "m..*" means no upper limit
- "0..n" means no lower limit (may be "dangling" C2 objects that don't participate in association)
- "0..*" means no restrictions at all
(Example: Every student must apply somewhere, every student can
apply to up to 5 campuses, each campus takes at most 20,000 applications)
Shorthands and defaults:
- "*" is shorthand for "0..*" (no restrictions)
- "1" is shorthand for "1..1" (exactly one)
- Default is "1..1"
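The multiplicity notation and its shorthands can be made concrete with a small sketch. The function names parse_multiplicity and satisfies are hypothetical helpers introduced only for illustration; they are not part of UML or of any library.

```python
from typing import Optional, Tuple

def parse_multiplicity(spec: str = "1..1") -> Tuple[int, Optional[int]]:
    """Return (lower, upper) for a multiplicity string; upper=None means no limit.

    The default argument mirrors the UML default of "1..1".
    """
    spec = spec.strip()
    if spec == "*":                 # "*" is shorthand for "0..*"
        return (0, None)
    if ".." not in spec:            # "1" is shorthand for "1..1"
        n = int(spec)
        return (n, n)
    lo, hi = spec.split("..")
    return (int(lo), None if hi == "*" else int(hi))

def satisfies(count: int, spec: str = "1..1") -> bool:
    """Check whether a number of related objects fits the multiplicity."""
    lo, hi = parse_multiplicity(spec)
    return count >= lo and (hi is None or count <= hi)
```

Under this reading, the example above (every student applies to between one and five campuses, each campus takes at most 20,000 applications) would put "1..5" on the Campus side and "0..20000" on the Student side, so satisfies(0, "1..5") is False while satisfies(20000, "0..20000") is True.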
Familiar types of relationships and their representation as UML
association multiplicities:
- One-to-one relationships: Each object in class C1 is
related to at most one object in class C2 and vice-versa
- Many-to-one relationships: Each object in class C1 is
related to at most one object in C2, but each object in C2 may be
related to many objects in C1
- Many-to-many relationships: Each object in class C1 may be
related to many objects in C2 and vice-versa
- Complete relationships: All objects participate in at
least one relationship ("referential integrity" in relational terms =
no "dangling" tuples)
Association Classes
- When there are attributes on an association
(Example: class AppInfo on Applied association)
Question: Under what circumstances can we combine an association
class with one of the two classes participating in the association?
Self-Associations
Associations between a class and itself
(Example: Sibling association on Student)
If objects take on different roles, label with two terms
(Example: Mentor association on Student)
Subclasses
(Big example)
The subclasses ("specializations") of a superclass ("generalization") are:
- incomplete (partial) or complete. If
complete, all objects in superclass belong to one or more subclasses.
- disjoint (exclusive) or overlapping.
If disjoint, no object is in more than one subclass
Label arrow with "{incomplete/complete, disjoint/overlapping}".
Examples:
- Just ForeignStud and DomStud are complete and disjoint
- Just APStud is incomplete
- All three are complete and overlapping
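The two properties can be checked mechanically if each class is viewed as a set of object identifiers. The following is a minimal sketch; the function classify_specialization and the sample identifiers s1..s3 are made up for illustration and do not come from the lecture example.

```python
from typing import Dict, Set, Tuple

def classify_specialization(
    superclass: Set[str], subclasses: Dict[str, Set[str]]
) -> Tuple[str, str]:
    """Return (completeness, disjointness) labels for a set of subclasses."""
    # Complete: every superclass object belongs to at least one subclass.
    covered = set().union(*subclasses.values())
    completeness = "complete" if superclass <= covered else "incomplete"
    # Overlapping: some object appears in more than one subclass.
    names = list(subclasses)
    overlapping = any(
        subclasses[a] & subclasses[b]
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )
    return completeness, "overlapping" if overlapping else "disjoint"

# Hypothetical data: three students, one foreign, two domestic, one with AP credit.
students = {"s1", "s2", "s3"}
foreign, domestic, ap = {"s1"}, {"s2", "s3"}, {"s2"}

two = classify_specialization(students, {"ForeignStud": foreign, "DomStud": domestic})
one = classify_specialization(students, {"APStud": ap})
three = classify_specialization(
    students, {"ForeignStud": foreign, "DomStud": domestic, "APStud": ap}
)
```

With this sample data the three results line up with the three bullet examples: the first pair is complete and disjoint, APStud alone is incomplete, and all three together are complete and overlapping.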
Aggregation
- Objects in one class are (partially) composed of a set of objects in
another class
(Example: Dorm contains a set of Student objects)
- Any multiplicity on non-diamond side
- Implicit "0..1" multiplicity on diamond side
Composition
- Same as aggregation, except the filled-in diamond indicates an implicit "1..1"
multiplicity on the diamond side
Translating a UML Design to a Relational Schema
=> If every "regular" class has a key then the translation can be
fully automated. (Association classes and subclasses are excluded
from requiring keys.)
Classes
Every class becomes a relation -- translation is direct, including keys.
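As a rough sketch of the direct translation, the two example classes could become two tables. The notes do not list the attributes, so the column names below (sID, sName, GPA, cName, state, enrollment) are assumptions chosen for illustration, and SQLite via Python's sqlite3 module stands in for the target DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One relation per class; the class key becomes the primary key.
    -- All column names are assumed, not taken from the lecture example.
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT, GPA REAL);
    CREATE TABLE Campus  (cName TEXT PRIMARY KEY, state TEXT, enrollment INTEGER);
""")

# List the relations that were created.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```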
Associations
(solicit from class)
(Example: Applied association)
Question: What is the key for the association relation?
Question: Do we always need a separate relation for an association?
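One possible translation, sketched here for the many-to-many case only, gives the association its own relation holding the key of each participating class. The column names, the sample values, and the helper record_application are assumptions for illustration. The composite key shown is just one choice; other multiplicities may allow a smaller key or no separate relation at all, which is what the two questions above are probing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT);
    CREATE TABLE Campus  (cName TEXT PRIMARY KEY);
    -- Association relation: one column per participating class key.
    -- REFERENCES is declarative here; SQLite enforces it only when
    -- PRAGMA foreign_keys is ON.
    CREATE TABLE Applied (
        sID   INTEGER REFERENCES Student(sID),
        cName TEXT    REFERENCES Campus(cName),
        PRIMARY KEY (sID, cName)   -- assumes a many-to-many association
    );
    INSERT INTO Student VALUES (1, 'Amy');
    INSERT INTO Campus VALUES ('CampusA'), ('CampusB');
""")

def record_application(sid: int, cname: str) -> bool:
    """Insert one Applied tuple; return False if that key already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Applied VALUES (?, ?)", (sid, cname))
        return True
    except sqlite3.IntegrityError:
        return False
```

With the composite key, one student may apply to several campuses, but the same (student, campus) pair cannot be recorded twice.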
Association Classes
(solicit from class)
(Example: AppInfo association class)
Self-Associations
(Mentor/Mentee example)
Subclasses
Create relation for superclass. Three possible translations for subclasses:
- (1) Subclass relations contain just the specialized attributes and the key
for the superclass
- (2) Subclass relations contain all attributes
- (3) No subclass relations; superclass relation contains all
generalized and specialized attributes (lots of nulls)
Best translation may depend on properties, for example:
- Heavily overlapping argues for translation 3
- Complete and disjoint argues for translation 2 (and no superclass relation)
(Big example, translation 1)
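A minimal sketch of the first translation (subclass relations hold only the superclass key plus their specialized attributes) might look as follows. The specialized attributes country and state and the sample rows are assumptions; the notes name the classes but not their attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Superclass relation holds the generalized attributes.
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT);
    -- Each subclass relation: superclass key + specialized attributes only.
    CREATE TABLE ForeignStud (sID INTEGER PRIMARY KEY REFERENCES Student(sID),
                              country TEXT);
    CREATE TABLE DomStud     (sID INTEGER PRIMARY KEY REFERENCES Student(sID),
                              state TEXT);
    INSERT INTO Student VALUES (1, 'Amy'), (2, 'Bob');
    INSERT INTO ForeignStud VALUES (1, 'France');
    INSERT INTO DomStud VALUES (2, 'CA');
""")

# Reassembling a full ForeignStud object requires a join with the superclass.
rows = conn.execute("""
    SELECT s.sID, s.sName, f.country
    FROM Student s JOIN ForeignStud f ON s.sID = f.sID
""").fetchall()
```

The join is the price of this translation: nothing is stored redundantly, but reading all attributes of a subclass object touches two relations.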
Aggregation and Composition
(solicit from class, use Dorm example)
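One way the Dorm example could be translated, offered only as a sketch, is to store the Dorm key in the Student relation. The implicit "0..1" on the diamond side then corresponds to a reference that may be NULL. Column names and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dorm (dName TEXT PRIMARY KEY);
    -- Aggregation: implicit 0..1 on the Dorm (diamond) side, so a student
    -- may belong to no dorm and the reference is allowed to be NULL.
    CREATE TABLE Student (sID INTEGER PRIMARY KEY, sName TEXT,
                          dName TEXT REFERENCES Dorm(dName));
    INSERT INTO Dorm VALUES ('DormA');
    INSERT INTO Student VALUES (1, 'Amy', 'DormA'), (2, 'Bob', NULL);
""")

in_dorm = conn.execute(
    "SELECT sName FROM Student WHERE dName = 'DormA'"
).fetchall()
no_dorm = conn.execute(
    "SELECT sName FROM Student WHERE dName IS NULL"
).fetchall()
```

For composition, the implicit "1..1" on the diamond side would suggest declaring the reference NOT NULL, so that every component object belongs to exactly one containing object.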