=========================================================================
LECTURE NOTES - XML
=========================================================================
If you are interested in reading about the XML standard, please visit:
http://www.w3.org/XML/
It will keep you busy for days. A very short web page linked to that
site ("XML in 10 points") provides a light summary.
XML = Extensible Markup Language
-> A (relatively) new standard for data representation and exchange on
the internet
-> A document format: roughly, more general than HTML and a simplified subset of SGML
-> XML is to data what Java is to programming
Like SGML and HTML, basic XML consists of three things:
(1) Tagged elements, which may be nested within one another
(2) Attributes on elements
(3) Text
In HTML, tags denote formatting:
<B>, <I>, <H1>, etc.
In XML, tags denote the meaning of data: <BOOK>, <AUTHOR>, etc.
(To format XML data, use XSL, the Extensible Stylesheet Language, for
example to translate XML into HTML.)
Well-formed XML
===============
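A well-formed XML document is any XML document that follows the basic
syntax rules: a single root element, start and end tags that match and
nest properly, unique attribute names within each element, quoted
attribute values, etc.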
Ex: bookstore data
A First Course in Database Systems
Jeffrey
Ullman
Jennifer
Widom
Database System Implementation
Hector
Garcia-Molina
Jeffrey
Ullman
Jennifer
Widom
Buy this book bundled with "A First Course", it's a great deal!
A well-formed XML document can contain regular data (as above) or very
irregular data.
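Python's standard ElementTree parser is non-validating: it accepts exactly the well-formed documents and rejects the rest, which makes it a quick way to see the rules above in action. A minimal sketch; the snippets are hypothetical, with tag names borrowed from the bookstore example and made-up values. Note that the "irregular" snippet has an odd shape but is still well-formed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical snippets; tag names follow the bookstore example, values are made up.
samples = {
    "regular": "<BOOK><TITLE>A First Course in Database Systems</TITLE></BOOK>",
    "irregular": "<BOOK><REMARK>No title, odd shape</REMARK><BOOK/></BOOK>",
    "crossed_tags": "<BOOK><TITLE>A First Course</BOOK></TITLE>",  # improper nesting
    "duplicate_attr": '<BOOK PRICE="10" PRICE="20"/>',             # repeated attribute
    "two_roots": "<BOOK/><BOOK/>",                                 # more than one root
}

for name, text in samples.items():
    print(f"{name}: {is_well_formed(text)}")
```

Only the first two samples should print True; a parser like this says nothing about whether the data has the "right" structure, which is the job of a DTD.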
Valid XML
=========
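It is possible to define a "schema" for XML data, called a Document
Type Definition (DTD).
A DTD is a grammar that describes the legal nesting of elements and the
attributes each element may carry. A well-formed document that also
conforms to its DTD is called valid.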
]>
The DTD is specified at the top of the document or in a separate file
referenced at the top of the document. In both cases, use
standalone="no" in the XML declaration, e.g.
<?xml version="1.0" standalone="no"?>
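An element declaration in a DTD, such as (TITLE, AUTHORS, REMARK?), is essentially a regular expression over the sequence of an element's child tags. The sketch below illustrates that idea in Python. The content models in the comments are hypothetical, chosen to fit the bookstore example, and this is not a real DTD validator: it ignores attributes, mixed content, and entities.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, written as Python regexes over child tag names,
# each name followed by a single space.
CONTENT_MODELS = {
    "BOOKSTORE": r"(BOOK )*",              # <!ELEMENT BOOKSTORE (BOOK*)>
    "BOOK": r"TITLE AUTHORS (REMARK )?",   # <!ELEMENT BOOK (TITLE, AUTHORS, REMARK?)>
    "AUTHORS": r"(AUTHOR )+",              # <!ELEMENT AUTHORS (AUTHOR+)>
    "AUTHOR": r"FIRSTNAME LASTNAME ",      # <!ELEMENT AUTHOR (FIRSTNAME, LASTNAME)>
}


def child_sequence(elem) -> str:
    """Encode the element's child tags as 'TAG1 TAG2 ...' with a trailing space."""
    return "".join(child.tag + " " for child in elem)


def conforms(elem) -> bool:
    """Recursively check each element's children against its content model."""
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        # Elements without a model are treated as text-only (#PCDATA): no child elements.
        ok_here = len(elem) == 0
    else:
        ok_here = re.fullmatch(model, child_sequence(elem)) is not None
    return ok_here and all(conforms(child) for child in elem)


valid = ("<BOOKSTORE><BOOK><TITLE>A First Course in Database Systems</TITLE>"
         "<AUTHORS><AUTHOR><FIRSTNAME>Jeffrey</FIRSTNAME><LASTNAME>Ullman</LASTNAME>"
         "</AUTHOR></AUTHORS></BOOK></BOOKSTORE>")
swapped = ("<BOOKSTORE><BOOK><AUTHORS><AUTHOR><FIRSTNAME>Jeffrey</FIRSTNAME>"
           "<LASTNAME>Ullman</LASTNAME></AUTHOR></AUTHORS>"
           "<TITLE>A First Course in Database Systems</TITLE></BOOK></BOOKSTORE>")

print(conforms(ET.fromstring(valid)), conforms(ET.fromstring(swapped)))
```

Both documents are well-formed, but only the first follows the (hypothetical) grammar; in the second, AUTHORS comes before TITLE.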
Q: What are the benefits of using DTDs?
ID and IDREF(S) Attributes
==========================
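Element pointers: give an element a special ID attribute, whose value
must be unique within the document, then point to that element from
another element through a special IDREF attribute (a single ID) or
IDREFS attribute (a whitespace-separated list of IDs). The DTD declares
which attributes have type ID, IDREF, or IDREFS.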
Ex: reorganized bookstore
A First Course in Database Systems
Database System Implementation
Buy this book bundled with
It's a great deal!
Hector
Garcia-Molina
Jeffrey
Ullman
Jennifer
Widom
DTD for this data:
]>
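Following ID/IDREFS pointers amounts to building an index from ID values to elements and then looking up each referenced value. A non-validating parser does not know which attributes the DTD declares as ID or IDREFS, so this Python sketch assumes hypothetical attribute names (ID, AUTHORIDS) and made-up ID values, modeled on the reorganized bookstore. It also enforces the two constraints a validating parser would check: IDs are unique, and every IDREF names an existing ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical data modeled on the reorganized bookstore. The attribute names
# (ID, AUTHORIDS) and the ID values are illustrative, not taken from the notes.
DOC = """<BOOKSTORE>
  <BOOK ID="B1" AUTHORIDS="JU JW"><TITLE>A First Course in Database Systems</TITLE></BOOK>
  <BOOK ID="B2" AUTHORIDS="HGM JU JW"><TITLE>Database System Implementation</TITLE></BOOK>
  <AUTHOR ID="HGM"><FIRSTNAME>Hector</FIRSTNAME><LASTNAME>Garcia-Molina</LASTNAME></AUTHOR>
  <AUTHOR ID="JU"><FIRSTNAME>Jeffrey</FIRSTNAME><LASTNAME>Ullman</LASTNAME></AUTHOR>
  <AUTHOR ID="JW"><FIRSTNAME>Jennifer</FIRSTNAME><LASTNAME>Widom</LASTNAME></AUTHOR>
</BOOKSTORE>"""


def build_id_index(root, id_attr="ID"):
    """Map each ID value to its element; a repeated ID breaks the uniqueness rule."""
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate ID {value!r}")
        index[value] = elem
    return index


def follow_idrefs(elem, attr, index):
    """Resolve a whitespace-separated IDREFS attribute to the elements it points to."""
    refs = elem.get(attr, "").split()
    dangling = [r for r in refs if r not in index]
    if dangling:
        raise ValueError(f"IDREF(S) with no matching ID: {dangling}")
    return [index[r] for r in refs]


root = ET.fromstring(DOC)
index = build_id_index(root)
for book in root.findall("BOOK"):
    authors = follow_idrefs(book, "AUTHORIDS", index)
    print(book.findtext("TITLE"), "->", [a.findtext("LASTNAME") for a in authors])
```

Storing each author once and pointing to it is what lets the reorganized bookstore avoid repeating Ullman and Widom under both books.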
Querying XML
============
-> XML turns the Web into one big database
Several languages have been proposed for querying XML data.
- We developed one at Stanford called "Lorel", based on OQL.
- There is a recently proposed standard called XQuery.
- There is also a simpler standard (part of XQuery, XSL, and XPointer)
called XPath.
- All languages are based on navigating through the structure
of the XML document.
Ex: Find the titles of books costing < $60 where Ullman is an author
(based on the first XML data)
In Lorel:
  SELECT b.TITLE
  FROM BOOKSTORE.BOOK b
  WHERE b.@PRICE < $60 AND b.AUTHORS.AUTHOR.LASTNAME = "Ullman"
In XPath:
  BOOKSTORE/BOOK[@PRICE < 60 and AUTHORS/AUTHOR/LASTNAME = "Ullman"]/TITLE
In XQuery:
  for $b in BOOKSTORE/BOOK
  where $b/@PRICE < 60 and $b/AUTHORS/AUTHOR/LASTNAME = "Ullman"
  return $b/TITLE
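The same navigation can be written by hand in Python. ElementTree supports only a limited subset of XPath (no numeric comparisons such as @PRICE < 60), so this sketch uses it for the paths and does the filtering in Python. The document is hypothetical and the PRICE values are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore in the shape the queries assume. The PRICE values are
# invented for illustration and stored as plain numbers.
DOC = """<BOOKSTORE>
  <BOOK PRICE="75">
    <TITLE>A First Course in Database Systems</TITLE>
    <AUTHORS>
      <AUTHOR><FIRSTNAME>Jeffrey</FIRSTNAME><LASTNAME>Ullman</LASTNAME></AUTHOR>
      <AUTHOR><FIRSTNAME>Jennifer</FIRSTNAME><LASTNAME>Widom</LASTNAME></AUTHOR>
    </AUTHORS>
  </BOOK>
  <BOOK PRICE="50">
    <TITLE>Database System Implementation</TITLE>
    <AUTHORS>
      <AUTHOR><FIRSTNAME>Hector</FIRSTNAME><LASTNAME>Garcia-Molina</LASTNAME></AUTHOR>
      <AUTHOR><FIRSTNAME>Jeffrey</FIRSTNAME><LASTNAME>Ullman</LASTNAME></AUTHOR>
      <AUTHOR><FIRSTNAME>Jennifer</FIRSTNAME><LASTNAME>Widom</LASTNAME></AUTHOR>
    </AUTHORS>
  </BOOK>
</BOOKSTORE>"""


def cheap_titles_by(root, last_name, max_price):
    """Titles of books with PRICE < max_price and some author with the given LASTNAME."""
    titles = []
    for book in root.findall("BOOK"):              # BOOKSTORE/BOOK (root is BOOKSTORE)
        price = float(book.get("PRICE"))            # @PRICE
        last_names = [e.text for e in book.findall("AUTHORS/AUTHOR/LASTNAME")]
        # Like XPath's '=', the author test is existential: any matching LASTNAME counts.
        if price < max_price and last_name in last_names:
            titles.append(book.findtext("TITLE"))   # .../TITLE
    return titles


print(cheap_titles_by(ET.fromstring(DOC), "Ullman", 60))
```

The existential reading matters: a book with three authors satisfies LASTNAME = "Ullman" as long as one of them is Ullman.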
XML query languages often include "wildcards" and regular-expression
operators for cases where the exact structure of the data is unknown.
Ex: Find all titles anywhere in the bookstore that contain "XML".
In Lorel:
  SELECT t
  FROM BOOKSTORE.#.TITLE t
  WHERE t LIKE "%XML%"
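In ElementTree's path syntax, ".//TITLE" matches TITLE elements at any depth, playing roughly the role of the # wildcard in BOOKSTORE.#.TITLE (or // in XPath). A minimal Python sketch on a hypothetical, deliberately irregular document, with made-up titles; the substring test stands in for LIKE "%XML%".

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately irregular data: TITLE elements sit at different
# depths, and the titles are made up for illustration.
DOC = """<BOOKSTORE>
  <BOOK><TITLE>A First Course in Database Systems</TITLE></BOOK>
  <MAGAZINE><ISSUE><TITLE>XML Weekly</TITLE></ISSUE></MAGAZINE>
  <BOOK><TITLE>Notes on XML</TITLE></BOOK>
</BOOKSTORE>"""


def titles_containing(root, needle):
    """Text of every TITLE below `root`, at any depth, that contains `needle`."""
    # ".//TITLE" searches all descendants in document order, whatever path leads to them.
    return [t.text for t in root.iterfind(".//TITLE") if t.text and needle in t.text]


print(titles_containing(ET.fromstring(DOC), "XML"))
```

The query never mentions MAGAZINE or ISSUE, yet still finds the nested title, which is the point of a wildcard when the structure is not known in advance.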
Million (billion actually) dollar question
==========================================
* Will people store their data in XML, or only use XML as a transport
format for data stored in conventional database systems?