CS145 Lecture Notes -- Data: Relations and XML



Terminology reminder

Relational Model versus XML: Fundamental Differences

  1. Relations: Schema must be fixed in advance
    XML: Does not require predefined, fixed schema

  2. Relations: Rigid flat table structure
    XML: Flexible hierarchical structure (also arbitrary graphs)

  3. Relations: Easy to understand, simple query language
    XML: Can be harder to understand, more complex query language (if any)

    Example Applications

    1. University records: students, courses, grades, etc.
      Relations or XML? Why?
      
      
      
      
      
    2. University Web site: news, academics, admissions, events, research, etc.
      Relations or XML? Why?
      
      
      
      
      
    3. Family tree.
      Relations or XML? Why?
      
      
      
      
      
    Application domain we use for example schemas and instances:

    The Relational Model

    Database = set of relations (or tables), each with a distinct name

    (Example: Student, Campus)

    
    
    
    
    
    
    
    
    
    
    
    
    Each relation has a set of attributes (or columns), with a distinct name within its relation.

    (Example: label Student, Campus)

    
    
    
    
    
    
    
    Each tuple (or row) in a relation has a value for each attribute.

    (Example: some tuples for Student, Campus relations)

    
    
    
    
    
    
    
    
    Each attribute has a type (or domain).
      ID:char(9), name:char(25), GPA:float, age:integer
    
    We'll use atomic (indivisible) types only for now.


    schema = complete description of structure of relations in database: relation names, attribute names, types, etc.
    instance = actual contents (tuples) of relations

    Specifying the schema (on paper):

    Naming conflicts:
        Student(ID, name, GPA, age)
        Campus(name, enrollment)      <- duplicate use of "name" okay
        Apply(ID, Campus)             <- "ID" okay, "Campus" not okay
    
    Null values: (Unknown age example)
    
    
    
    
    
    
    
    
    
    
    Keys: A key for a relation is a set of attributes such that no two tuples can have the same values for all of their key attributes. (Example: keys for Student, Campus)
    
    
    
    
    
    

    Relations in SQL



    XML

    If you're interested in reading about the XML standard in detail, visit:
       http://www.w3.org/XML/
    
    It will keep you busy for days. A very short document linked from that site and the course schedule ("XML in 10 points") is required reading.


    XML = Extensible Markup Language Like SGML and HTML, basic XML consists of three things:
    1. Tagged elements, which may be nested within one another
    2. Attributes on elements
    3. Text

    In HTML, tags denote formatting: <Title>, <I>, <Table>, etc.

    In XML, tags denote meaning of data: <Student>, <Book_Title>, etc.

    One way to format XML data is to use XSL - the Extensible Stylesheet Language - to translate XML to HTML. We'll probably cover XSL later in the course.

    
    
    
    
    
    
    
    

    Well-formed XML

    A well-formed XML document is any XML document that follows the basic rules: single root element, matched tags, unique attribute names, etc.
    
    
    
    
    
    
    
    Example: bookstore data
       <?xml version="1.0" standalone="yes"?>
       <Bookstore>
          <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
             <Title>A First Course in Database Systems</Title>
             <Authors>
                <Author>
                   <First_Name>Jeffrey</First_Name>
                   <Last_Name>Ullman</Last_Name>
                </Author>
                <Author>
                   <First_Name>Jennifer</First_Name>
                   <Last_Name>Widom</Last_Name>
                </Author>
             </Authors>
          </Book>
          <Book ISBN="ISBN-0-13-031995-3" Price="$75">
             <Title>Database Systems: The Complete Book</Title>
             <Authors>
                <Author>
                   <First_Name>Hector</First_Name>
                   <Last_Name>Garcia-Molina</Last_Name>
                </Author>
                <Author>
                   <First_Name>Jeffrey</First_Name>
                   <Last_Name>Ullman</Last_Name>
                </Author>
                <Author>
                   <First_Name>Jennifer</First_Name>
                   <Last_Name>Widom</Last_Name>
                </Author>
             </Authors>
             <Remark>
             Amazon.com says: Buy this book bundled with "A First Course,"
             it's a great deal!
             </Remark>
          </Book>
       </Bookstore>
    
    A well-formed XML document can contain regular data (as above) or very irregular data.


    Valid XML

    It's possible to define a kind of schema for XML data, called a Document Type Descriptor (DTD).

    A DTD is a grammar that describes the legal attributes of tagged elements and the legal ordering and nesting of the elements.

    
    
    
    
    
    
    
    

    Example DTD

       <!ELEMENT Bookstore (Book | Magazine)*>
       <!ELEMENT Book (Title, Authors, Remark?)>
       <!ATTLIST Book ISBN CDATA #REQUIRED
                    Price CDATA #REQUIRED
                    Edition CDATA #IMPLIED>
       <!ELEMENT Magazine (Title)>
       <!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED> 
       <!ELEMENT Title (#PCDATA)>
       <!ELEMENT Authors (Author+)>
       <!ELEMENT Remark (#PCDATA)>
       <!ELEMENT Author (First_Name, Last_Name)>
       <!ELEMENT First_Name (#PCDATA)>
       <!ELEMENT Last_Name (#PCDATA)>
    
    The DTD is specified at the top of the document or in a separate file referenced at the top of the document. In both cases use standalone="no".

    Question: What are the benefits of using a DTD?

    
    
    
    
    
    
    
    
    
    
    
    Question: What are the benefits of not using a DTD?
    
    
    
    
    
    
    
    
    
    
    

    ID and IDREF(S) Attributes

    Element pointers: assign a special ID attribute to an element, then point to that element with a special IDREF or IDREFS attribute in another element.

    Example: reorganized bookstore

       <?xml version="1.0" standalone="no"?>
       <!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
       <Bookstore>
          <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd" Authors="JU JW">
             <Title>A First Course in Database Systems</Title>
          </Book>
          <Book ISBN="ISBN-0-13-031995-3" Price="$75" Authors="HG JU JW">
             <Title>Database Systems: The Complete Book</Title>
             <Remark>
                Amazon.com says: Buy this book bundled with
                <BookRef book="ISBN-0-13-035300-0"></BookRef>,
                It's a great deal!
             </Remark>
          </Book>
          <Author Ident="HG">
             <First_Name>Hector</First_Name>
             <Last_Name>Garcia-Molina</Last_Name>
          </Author>
          <Author Ident="JU">
             <First_Name>Jeffrey</First_Name>
             <Last_Name>Ullman</Last_Name>
          </Author>
          <Author Ident="JW">
             <First_Name>Jennifer</First_Name>
             <Last_Name>Widom</Last_Name>
          </Author>
       </Bookstore>
    
    DTD for this data:
       <!ELEMENT Bookstore (Book*, Author*)>
       <!ELEMENT Book (Title, Remark?)>
       <!ATTLIST Book ISBN ID #REQUIRED
                    Price CDATA #REQUIRED
                    Edition CDATA #IMPLIED
                    Authors IDREFS #REQUIRED>
       <!ELEMENT Title (#PCDATA)>
       <!ELEMENT Remark (#PCDATA | BookRef)*>
       <!ELEMENT BookRef EMPTY>
       <!ATTLIST BookRef book IDREF #REQUIRED>
       <!ELEMENT Author (First_Name, Last_Name)>
       <!ATTLIST Author Ident ID #REQUIRED>
       <!ELEMENT First_Name (#PCDATA)>
       <!ELEMENT Last_Name (#PCDATA)>
    

    Digression on (#P)CDATA

    The details are messy (recall comments about clunky XML and official specs above), but overall:

    XML Coverage


    Relational Model versus XML

    (First three are same as earlier)

    1. Relations: Schema must be fixed in advance
      XML: Does not require predefined, fixed schema

    2. Relations: Rigid flat table structure
      XML: Flexible hierarchical structure; graphs

    3. Relations: Easy to understand, simple query language
      XML: Can be harder to understand, more complex query language (if any)

    4. Relations: Ordering of data not relevant (tuple ordering or attribute ordering)
      XML: Ordering forced by document format, may or may not be relevant

    5. Relations: Transmission and sharing can be problematic
      XML: Designed for easy representation and exchange

    6. Relations: "Native" data model for all current serious commercial DBMSs
      XML: "Add-on," often implemented on top of relations