Relations: Easy to understand, simple query language
XML: Can be harder to understand, more complex query language (if any)
Example Applications
- University records: students, courses, grades, etc.
Relations or XML? Why?
- University Web site: news, academics, admissions, events, research, etc.
Relations or XML? Why?
- Family tree.
Relations or XML? Why?
Application domain we use for example schemas and instances:
- Textbook: Movies
- Lectures: Booksellers, UC admissions
- Project: Online auction
The Relational Model
- Used by most commercial Database Managements Systems (DBMSs)
- Very simple model
- Enables simple, clean, declarative query languages
Database = set of relations (or tables), each with a distinct name
(Example: Student, Campus)
Each relation has a set of attributes (or columns), with
a distinct name within its relation.
(Example: label Student, Campus)
Each tuple (or row) in a relation has a value for each attribute.
(Example: some tuples for Student, Campus relations)
Each attribute has a type (or domain).
ID:char(9), name:char(25), GPA:float, age:integer
We'll use atomic (indivisible) types only for now.
schema = complete description of structure of relations in
database: relation names, attribute names, types, etc.
instance = actual contents (tuples) of relations
Specifying the schema (on paper):
- Without types:
Student(ID, name, GPA, age)
- With types:
Student(ID: char(9), name: char(25), GPA: float, age: integer)
Naming conflicts:
Student(ID, name, GPA, age)
Campus(name, enrollment) <- duplicate use of "name" okay
Apply(ID, Campus) <- "ID" okay, "Campus" not okay
Null values:
- Can use special null value in attribute of any type
- Can be problematic (e.g., for comparisons)
(Unknown age example)
Keys: A key for a relation is a set of attributes such that no two tuples
can have the same values for all of their key attributes.
- Key values identify specific tuples.
- System may build special indexes over key values.
- Other tuples may use key values as logical "pointers".
- Specify keys by underlining (so no good way to specify multiple keys on paper)
(Example: keys for Student, Campus)
Relations in SQL
- Declared as above with "create table", see book and Oracle
help documents for details
- SQL includes ways to add/drop attributes and specify automatic default
values for attributes.
XML
If you're interested in reading about the XML standard in detail,
visit:
http://www.w3.org/XML/
It will keep you busy for days. A very short document linked from
that site and the course schedule ("XML in 10 points") is required
reading.
XML = Extensible Markup Language
- A (relatively) new standard for data representation and exchange,
primarily intended for the internet
- A document format: a superset of HTML, a subset of SGML (roughly)
- Some say that XML is to data what Java is to programming.
- Take note:
- XML can be very clunky.
- The official specification is enormous, but the basic idea is very simple. In this class we care about the basic idea.
Like SGML and HTML, basic XML consists of three things:
- Tagged elements, which may be nested within one another
- Attributes on elements
- Text
In HTML, tags denote formatting: <Title>, <I>, <Table>, etc.
In XML, tags denote meaning of data: <Student>, <Book_Title>, etc.
One way to format XML data is to use XSL - the Extensible
Stylesheet Language - to translate XML to HTML. We'll probably
cover XSL later in the course.
Well-formed XML
A well-formed XML document is any XML document that follows the basic
rules: single root element, matched tags, unique attribute names, etc.
Example: bookstore data
<?xml version="1.0" standalone="yes"?>
<Bookstore>
<Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
<Title>A First Course in Database Systems</Title>
<Authors>
<Author>
<First_Name>Jeffrey</First_Name>
<Last_Name>Ullman</Last_Name>
</Author>
<Author>
<First_Name>Jennifer</First_Name>
<Last_Name>Widom</Last_Name>
</Author>
</Authors>
</Book>
<Book ISBN="ISBN-0-13-031995-3" Price="$75">
<Title>Database Systems: The Complete Book</Title>
<Authors>
<Author>
<First_Name>Hector</First_Name>
<Last_Name>Garcia-Molina</Last_Name>
</Author>
<Author>
<First_Name>Jeffrey</First_Name>
<Last_Name>Ullman</Last_Name>
</Author>
<Author>
<First_Name>Jennifer</First_Name>
<Last_Name>Widom</Last_Name>
</Author>
</Authors>
<Remark>
Amazon.com says: Buy this book bundled with "A First Course,"
it's a great deal!
</Remark>
</Book>
</Bookstore>
A well-formed XML document can contain regular data (as above) or very
irregular data.
Valid XML
It's possible to define a kind of schema for XML data, called a
Document Type Descriptor (DTD).
A DTD is a grammar that describes the legal attributes of tagged
elements and the legal ordering and nesting of the elements.
Example DTD
<!ELEMENT Bookstore (Book | Magazine)*>
<!ELEMENT Book (Title, Authors, Remark?)>
<!ATTLIST Book ISBN CDATA #REQUIRED
Price CDATA #REQUIRED
Edition CDATA #IMPLIED>
<!ELEMENT Magazine (Title)>
<!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Authors (Author+)>
<!ELEMENT Remark (#PCDATA)>
<!ELEMENT Author (First_Name, Last_Name)>
<!ELEMENT First_Name (#PCDATA)>
<!ELEMENT Last_Name (#PCDATA)>
The DTD is specified at the top of the document or in a separate file
referenced at the top of the document. In both cases use
standalone="no".
Question: What are the benefits of using a DTD?
Question: What are the benefits of not using a DTD?
ID and IDREF(S) Attributes
Element pointers: assign a special ID attribute to an
element, then point to that element with a special IDREF or
IDREFS attribute in another element.
Example: reorganized bookstore
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
<Bookstore>
<Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd" Authors="JU JW">
<Title>A First Course in Database Systems</Title>
</Book>
<Book ISBN="ISBN-0-13-031995-3" Price="$75" Authors="HG JU JW">
<Title>Database Systems: The Complete Book</Title>
<Remark>
Amazon.com says: Buy this book bundled with
<BookRef book="ISBN-0-13-035300-0"></BookRef>,
It's a great deal!
</Remark>
</Book>
<Author Ident="HG">
<First_Name>Hector</First_Name>
<Last_Name>Garcia-Molina</Last_Name>
</Author>
<Author Ident="JU">
<First_Name>Jeffrey</First_Name>
<Last_Name>Ullman</Last_Name>
</Author>
<Author Ident="JW">
<First_Name>Jennifer</First_Name>
<Last_Name>Widom</Last_Name>
</Author>
</Bookstore>
DTD for this data:
<!ELEMENT Bookstore (Book*, Author*)>
<!ELEMENT Book (Title, Remark?)>
<!ATTLIST Book ISBN ID #REQUIRED
Price CDATA #REQUIRED
Edition CDATA #IMPLIED
Authors IDREFS #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Remark (#PCDATA | BookRef)*>
<!ELEMENT BookRef EMPTY>
<!ATTLIST BookRef book IDREF #REQUIRED>
<!ELEMENT Author (First_Name, Last_Name)>
<!ATTLIST Author Ident ID #REQUIRED>
<!ELEMENT First_Name (#PCDATA)>
<!ELEMENT Last_Name (#PCDATA)>
Digression on (#P)CDATA
The details are messy (recall comments about clunky XML and
official specs above), but overall:
- #PCDATA is only for element content, CDATA is
only for attribute types, you cannot use them interchangeably, and
there are no keywords PCDATA or #CDATA. Both
CDATA and #PCDATA essentially specify text where the
special characters "<", ">", and "&" have to be escaped as
"<", ">" and "&" respectively.
- Recommendation: Use CDATA for string-valued
attributes, use #PCDATA for elements containing text. If you
want an element to contain a mixture of text and other elements, do so
by specifying the element types along with #PCDATA in a
0-or-more list, e.g., (#PCDATA | Author | Editor)*.
XML Coverage
- Not covering: Namespaces, XLink, XPointer
- Covering later: XPath, XQuery
- May or may not cover: XSLT, XML Schema
Relational Model versus XML
(First three are same as earlier)
- Relations: Schema must be fixed in advance
XML: Does not require predefined, fixed schema
- Relations: Rigid flat table structure
XML: Flexible hierarchical structure; graphs
- Relations: Easy to understand, simple query language
XML: Can be harder to understand, more complex query language (if any)
- Relations: Ordering of data not relevant (tuple ordering or attribute ordering)
XML: Ordering forced by document format, may or may not be relevant
- Relations: Transmission and sharing can be problematic
XML: Designed for easy representation and exchange
- Relations: "Native" data model for all current serious commercial DBMSs
XML: "Add-on," often implemented on top of relations