CS145 Lecture Notes (1) -- Introductory Material, Data Models


This lecture material complements the assigned readings:

Database Management System (DBMS):
Provides efficient, convenient, and safe multi-user storage of and access to massive amounts of persistent data

Most familiar use: many Web sites rely heavily on DBMS's. (And you will build one!)

Examples: (solicit from class)








Non-Web examples of DBMS's: (solicit from class)







Example: Online Bookseller

Data = information on books (including categories, bestsellers, etc.), customers, pending orders, order histories, trends and preferences, etc.


Multi-user example and discussion:

Jane and John both have ID number for bookseller gift certificate (credit) of $200 they got as a wedding gift.

Jane @ her office: orders "Database Systems: The Complete Book" ($75)

  prompt user for credit ID;
  get credit from database;
  if credit >= 75 then
      credit := credit - 75;
      issue order to mail book;
      put new credit into database;
  else print "sorry"
John @ his office: orders "A First Course in Database Systems" ($65)
  prompt user for credit ID;
  get credit from database;
  if credit >= 65 then
      credit := credit - 65;
      issue order to mail book;
      put new credit into database;
  else print "sorry"
Initial credit = $200
Ending credit = ???

  • Appears similar to concurrent programming problems (synchronization, semaphores, etc.)
    BUT: data not main-memory variables

  • Appears similar to file system concurrent access
    BUT: want to control at smaller granularity

  • Also database may be distributed, replicated, etc.



    A DBMS is a Software System

    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    
    

    Major DBMS Products and Freeware

    (Fewer vendors every year...)
  • All are "relational" (or "object-relational") database systems at their core.

  • All commercial vendors also provide XML support, either built-in or built-on-top.


    People


    Less Traditional Database Applications



  • Data Models

    Terminology

    Relational Model versus XML: Fundamental Differences

    1. Relations: Schema must be fixed in advance
      XML: Does not require predefined, fixed schema

    2. Relations: Rigid flat table structure
      XML: Flexible hierarchical structure (also arbitrary graphs)

    3. Relations: Easy to understand, simple query language
      XML: Can be harder to understand, more complex query language (if any)

    Example Applications

    1. University records: students, courses, grades, etc.
      Relations or XML? Why?
      
      
      
      
      
      
    2. University Web site: news, academics, admissions, events, research, etc.
      Relations or XML? Why?
      
      
      
      
      
      
    3. Family tree.
      Relations or XML? Why?
      
      
      
      
      
      
    Application domain we use for example schemas and instances:

    The Relational Model

    Database = set of relations (or tables), each with a distinct name

    (Example: Student, Campus)

    
    
    
    
    
    
    
    Each relation has a set of attributes (or columns), with a distinct name within its relation.

    (Example: label Student, Campus)

    
    
    
    
    
    
    
    Each tuple (or row) in a relation has a value for each attribute.

    (Example: some tuples for Student, Campus relations)

    Each attribute has a type
    (or domain).
    
        ID:char(9), name:char(25), GPA:float, age:integer
    We'll use atomic (indivisible) types only for now.


    schema = complete description of structure of relations in database: relation names, attribute names, types, etc.
    instance = actual contents (tuples) of relations

    Specifying the schema (on paper):

    Naming conflicts:
        Student(ID, name, GPA, age)
        Campus(name, enrollment)      <- duplicate use of "name" okay
        Apply(ID, Campus)             <- "ID" okay, "Campus" not okay
    
    Null values: (Unknown age example)
    
    
    
    
    
    
    
    
    
    
    
    
    Keys: A key for a relation is a set of attributes such that no two tuples can have the same values for all of their key attributes. (Example: keys for Student, Campus)
    
    
    
    
    


    XML

    If you're interested in reading about the XML standard in detail, visit:
    http://www.w3.org/XML/
    
    It will keep you busy for days. A very short document linked from that site and the course schedule ("XML in 10 points") is required reading.


    XML = Extensible Markup Language Like SGML and HTML, basic XML consists of three things:
    1. Tagged elements, which may be nested within one another
    2. Attributes on elements
    3. Text

    In HTML, tags denote formatting: <Title>, <I>, <Table>, etc.

    In XML, tags denote meaning of data: <Student>, <Book_Title>, etc.

    XML data can be formatted using CSS (Cascading Style Sheets) or XSL (Extensible Stylesheet Language) to translate XML to HTML.

    Well-formed XML

    A well-formed XML document is any XML document that follows the basic rules: single root element, matched tags, unique attribute names, etc.





    Example: bookstore data
       <?xml version="1.0" standalone="yes"?>
       <Bookstore>
          <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
             <Title>A First Course in Database Systems</Title>
             <Authors>
                <Author>
                   <First_Name>Jeffrey</First_Name>
                   <Last_Name>Ullman</Last_Name>
                </Author>
                <Author>
                   <First_Name>Jennifer</First_Name>
                   <Last_Name>Widom</Last_Name>
                </Author>
             </Authors>
          </Book>
          <Book ISBN="ISBN-0-13-031995-3" Price="$75">
             <Title>Database Systems: The Complete Book</Title>
             <Authors>
                <Author>
                   <First_Name>Hector</First_Name>
                   <Last_Name>Garcia-Molina</Last_Name>
                </Author>
                <Author>
                   <First_Name>Jeffrey</First_Name>
                   <Last_Name>Ullman</Last_Name>
                </Author>
                <Author>
                   <First_Name>Jennifer</First_Name>
                   <Last_Name>Widom</Last_Name>
                </Author>
             </Authors>
             <Remark>
             Amazon.com says: Buy this book bundled with "A First Course,"
             it's a great deal!
             </Remark>
          </Book>
       </Bookstore>
    
    A well-formed XML document can contain regular data (as above) or very irregular data.


    Valid XML

    It's possible to define a kind of schema for XML data, called a Document Type Descriptor (DTD).

    A DTD is a grammar that describes the legal attributes of tagged elements and the legal ordering and nesting of the elements.

    
    
    
    
    
    
    
    

    Example DTD

       <!ELEMENT Bookstore (Book | Magazine)*>
       <!ELEMENT Book (Title, Authors, Remark?)>
       <!ATTLIST Book ISBN CDATA #REQUIRED
                    Price CDATA #REQUIRED
                    Edition CDATA #IMPLIED>
       <!ELEMENT Magazine (Title)>
       <!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED> 
       <!ELEMENT Title (#PCDATA)>
       <!ELEMENT Authors (Author+)>
       <!ELEMENT Remark (#PCDATA)>
       <!ELEMENT Author (First_Name, Last_Name)>
       <!ELEMENT First_Name (#PCDATA)>
       <!ELEMENT Last_Name (#PCDATA)>
    
    The DTD is specified at the top of the document or in a separate file referenced at the top of the document. In both cases use standalone="no".

    Question: What are the benefits of using a DTD?

    
    
    
    
    
    
    
    
    
    
    Question: What are the benefits of not using a DTD?
    
    
    
    
    
    
    


    ID and IDREF(S) Attributes

    Element pointers: assign a special ID attribute to an element, then point to that element with a special IDREF or IDREFS attribute in another element.

    Example: reorganized bookstore

       <?xml version="1.0" standalone="no"?>
       <!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
       <Bookstore>
          <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd" Authors="JU JW">
             <Title>A First Course in Database Systems</Title>
          </Book>
          <Book ISBN="ISBN-0-13-031995-3" Price="$75" Authors="HG JU JW">
             <Title>Database Systems: The Complete Book</Title>
             <Remark>
                Amazon.com says: Buy this book bundled with
                <BookRef book="ISBN-0-13-035300-0" />,
                It's a great deal!
             </Remark>
          </Book>
          <Author Ident="HG">
             <First_Name>Hector</First_Name>
             <Last_Name>Garcia-Molina</Last_Name>
          </Author>
          <Author Ident="JU">
             <First_Name>Jeffrey</First_Name>
             <Last_Name>Ullman</Last_Name>
          </Author>
          <Author Ident="JW">
             <First_Name>Jennifer</First_Name>
             <Last_Name>Widom</Last_Name>
          </Author>
       </Bookstore>
    
    DTD for this data:
       <!ELEMENT Bookstore (Book*, Author*)>
       <!ELEMENT Book (Title, Remark?)>
       <!ATTLIST Book ISBN ID #REQUIRED
                    Price CDATA #REQUIRED
                    Edition CDATA #IMPLIED
                    Authors IDREFS #REQUIRED>
       <!ELEMENT Title (#PCDATA)>
       <!ELEMENT Remark (#PCDATA | BookRef)*>
       <!ELEMENT BookRef EMPTY>
       <!ATTLIST BookRef book IDREF #REQUIRED>
       <!ELEMENT Author (First_Name, Last_Name)>
       <!ATTLIST Author Ident ID #REQUIRED>
       <!ELEMENT First_Name (#PCDATA)>
       <!ELEMENT Last_Name (#PCDATA)>
    

    Digression on (#P)CDATA

    The details are messy (recall comments about clunky XML and official specs above), but overall:

    XML Coverage


    Relational Model versus XML

    (First three are same as earlier)

    1. Relations: Schema must be fixed in advance
      XML: Does not require predefined, fixed schema

    2. Relations: Rigid flat table structure
      XML: Flexible hierarchical structure; graphs

    3. Relations: Easy to understand, simple query language
      XML: Can be harder to understand, more complex query language

    4. Relations: Ordering of data not relevant (tuple ordering or attribute ordering)
      XML: Ordering forced by document format, may or may not be relevant

    5. Relations: Transmission and sharing can be problematic
      XML: Designed for easy representation and exchange

    6. Relations: "Native" data model for all current serious commercial DBMSs
      XML: "Add-on," often implemented on top of relations