CS145 Lecture Notes (5) -- XML Programming: XPath, SAX, DOM

XML DTD and sample data for examples

   <!ELEMENT Bookstore (Book | Magazine)*>
   <!ELEMENT Book (Title, Authors, Remark?)>
   <!ATTLIST Book ISBN CDATA #REQUIRED
             Price CDATA #REQUIRED
             Edition CDATA #IMPLIED>
   <!ELEMENT Magazine (Title)>
   <!ELEMENT Title (#PCDATA)>
   <!ELEMENT Authors (Author+)>
   <!ELEMENT Remark (#PCDATA)>
   <!ELEMENT Author (First_Name, Last_Name)>
   <!ELEMENT First_Name (#PCDATA)>
   <!ELEMENT Last_Name (#PCDATA)>

   <?xml version="1.0" standalone="no"?>
   <!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
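   <Bookstore>
      <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
         <Title>A First Course in Database Systems</Title>
         <Authors>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
      </Book>
      <Book ISBN="ISBN-0-13-031995-3" Price="$75">
         <Title>Database Systems: The Complete Book</Title>
         <Authors>
            <Author>
               <First_Name>Hector</First_Name>
               <Last_Name>Garcia-Molina</Last_Name>
            </Author>
            <Author>
               <First_Name>Jeffrey</First_Name>
               <Last_Name>Ullman</Last_Name>
            </Author>
            <Author>
               <First_Name>Jennifer</First_Name>
               <Last_Name>Widom</Last_Name>
            </Author>
         </Authors>
         <Remark>
         Amazon.com says: Buy this book bundled with "A First Course,"
         it's a great deal!
         </Remark>
      </Book>
   </Bookstore>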


Think of XML as a tree (or directory) structure.

XPath specifies path expressions that match XML data by navigating down (and occasionally up or across) the tree.

Basic constructs (very incomplete list):

/ root element, or separator between steps in path
* matches any one element name
@X matches attribute X of the current element
// matches any descendant of the current element
[C] evaluates condition C on the current element
[N] picks the Nth matching element
contains(s1,s2) returns TRUE if string s1 contains string s2
name() returns tag of the current element
parent:: matches the parent of the current element
following-sibling:: matches all siblings after the current node
descendant:: matches any descendant of the current element
self:: matches the current element
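
The constructs above can be tried from Java. The following is a minimal sketch, assuming the JDK's built-in javax.xml.xpath package (which implements XPath 1.0). The class name XPathBasics, the helpers author and select, and the magazine title are made up for illustration. The DOCTYPE line is left out so that no bookstore.dtd file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBasics {
    // Builds one Author element (hypothetical helper that keeps SAMPLE short).
    static String author(String first, String last) {
        return "<Author><First_Name>" + first + "</First_Name>"
             + "<Last_Name>" + last + "</Last_Name></Author>";
    }

    // Cut-down bookstore: the two books follow the sample data, the magazine is invented.
    static final String SAMPLE =
        "<Bookstore>"
      + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
      + "<Title>A First Course in Database Systems</Title>"
      + "<Authors>" + author("Jeffrey", "Ullman") + author("Jennifer", "Widom") + "</Authors>"
      + "</Book>"
      + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
      + "<Title>Database Systems: The Complete Book</Title>"
      + "<Authors>" + author("Hector", "Garcia-Molina") + author("Jeffrey", "Ullman")
      + author("Jennifer", "Widom") + "</Authors>"
      + "<Remark>it's a great deal!</Remark>"
      + "</Book>"
      + "<Magazine><Title>Database Monthly</Title></Magazine>"
      + "</Bookstore>";

    // Evaluates expr against xml and returns the text of every matching node.
    static List<String> select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "/Bookstore/Book/Title",           // steps separated by /
            "/Bookstore/*/Title",              // * matches Book or Magazine
            "/Bookstore/Book/@ISBN",           // @X selects an attribute
            "//Last_Name",                     // // searches at any depth
            "//Author[2]/Last_Name",           // [N] picks the 2nd Author under each Authors
            "/Bookstore/Book[Remark]/Title",   // [C] keeps books that have a Remark
            "//*[name() = 'Magazine']/Title"   // name() tests the tag
        };
        for (String expr : exprs) {
            System.out.println(expr + " -> " + select(SAMPLE, expr));
        }
    }
}
```

Note that [2] is applied per parent: //Author[2] returns the second Author inside each Authors element, not the second Author in the whole document.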


(Example: all book titles)

(Example: all book or magazine titles)

(Example: all ISBN numbers)

(Example: all books costing < $70)

(Example: all ISBN numbers of books costing < $70)

(Example: all books containing a remark)

(Example: all titles of books costing < $70 where "Ullman" is an author)

(Example: same query using //)
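
One detail matters for the price queries: the sample data stores Price as "$65", and in XPath 1.0 a string with a leading "$" converts to NaN, so a direct comparison such as @Price < 70 matches nothing. The sketch below shows one possible workaround, stripping the "$" with substring-after before comparing. It assumes the JDK's javax.xml.xpath package. The names PriceAndAuthor, UNDER_70, and select are made up, and First_Name is omitted from the data for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceAndAuthor {
    // Condition on a Book: drop the leading "$" so the price compares as a number.
    static final String UNDER_70 = "number(substring-after(@Price, '$')) < 70";

    // Same two books as the sample data, reduced to what the queries need.
    static final String SAMPLE =
        "<Bookstore>"
      + "<Book ISBN='ISBN-0-13-035300-0' Price='$65'>"
      + "<Title>A First Course in Database Systems</Title>"
      + "<Authors><Author><Last_Name>Ullman</Last_Name></Author>"
      + "<Author><Last_Name>Widom</Last_Name></Author></Authors></Book>"
      + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
      + "<Title>Database Systems: The Complete Book</Title>"
      + "<Authors><Author><Last_Name>Garcia-Molina</Last_Name></Author>"
      + "<Author><Last_Name>Ullman</Last_Name></Author>"
      + "<Author><Last_Name>Widom</Last_Name></Author></Authors></Book>"
      + "</Bookstore>";

    // Evaluates expr against xml and returns the text of every matching node.
    static List<String> select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Direct comparison: "$65" is not a number, so no book qualifies.
        System.out.println(select(SAMPLE, "/Bookstore/Book[@Price < 70]/@ISBN"));
        // ISBNs of books under $70.
        System.out.println(select(SAMPLE, "/Bookstore/Book[" + UNDER_70 + "]/@ISBN"));
        // Titles of books under $70 with Ullman as an author, full path.
        System.out.println(select(SAMPLE,
            "/Bookstore/Book[" + UNDER_70 + " and Authors/Author/Last_Name = 'Ullman']/Title"));
        // Same query using //.
        System.out.println(select(SAMPLE,
            "//Book[" + UNDER_70 + " and .//Last_Name = 'Ullman']/Title"));
    }
}
```

The comparison Authors/Author/Last_Name = 'Ullman' is existential: it holds if any one of the book's last names equals "Ullman".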

(Example: all second authors anywhere)

(Example: all author last names anywhere)

(Example: all books whose title contains one of its author's last names)

(Example: all magazines where there is a book of the same title)

(Example: all books where there is a different book of the same title)
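
The two "same title" queries behave like joins. A possible formulation is sketched below, assuming the JDK's javax.xml.xpath package and that all Book elements are siblings under Bookstore, as the DTD requires. The data, the ISBN values B1 to B3, and the names TitleJoins and select are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TitleJoins {
    // Invented data: B1 and B3 share a title, and one magazine shares B2's title.
    // Authors are omitted because these queries only look at Title.
    static final String SAMPLE =
        "<Bookstore>"
      + "<Book ISBN='B1'><Title>Databases</Title></Book>"
      + "<Book ISBN='B2'><Title>Networks</Title></Book>"
      + "<Book ISBN='B3'><Title>Databases</Title></Book>"
      + "<Magazine><Title>Networks</Title></Magazine>"
      + "<Magazine><Title>Gadgets</Title></Magazine>"
      + "</Bookstore>";

    // Magazines whose title equals the title of some book.
    static final String MAGAZINE_MATCHES_BOOK =
        "/Bookstore/Magazine[Title = /Bookstore/Book/Title]/Title";

    // Books whose title equals the title of a different book. The sibling axes
    // exclude the book itself, which would otherwise always match its own title.
    static final String BOOK_MATCHES_OTHER_BOOK =
        "/Bookstore/Book[Title = following-sibling::Book/Title"
      + " or Title = preceding-sibling::Book/Title]/@ISBN";

    // Evaluates expr against xml and returns the text of every matching node.
    static List<String> select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(select(SAMPLE, MAGAZINE_MATCHES_BOOK));
        System.out.println(select(SAMPLE, BOOK_MATCHES_OTHER_BOOK));
    }
}
```

Writing the second query as Book[Title = /Bookstore/Book/Title] would return every book, since each book's title trivially equals its own.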

(Example: all elements whose parent tag is not "Book")

For the next two examples, modify the DTD to contain Remark* instead of Remark?

(Example: all books where a Remark includes "great")

(Example: all books where all Remarks include "great")
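
With several Remarks per book, "some Remark" and "all Remarks" need different expressions. XPath 1.0 has no universal quantifier, so "all" is usually written as "there is no Remark that fails the test". The sketch below assumes the JDK's javax.xml.xpath package. The data, the ISBN values, and the names RemarkQuantifiers and select are invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemarkQuantifiers {
    // Invented data: B1 has two "great" remarks, B2 has one of two, B3 has none at all.
    static final String SAMPLE =
        "<Bookstore>"
      + "<Book ISBN='B1'><Remark>a great deal</Remark><Remark>great for beginners</Remark></Book>"
      + "<Book ISBN='B2'><Remark>too long</Remark><Remark>great reference</Remark></Book>"
      + "<Book ISBN='B3'></Book>"
      + "</Bookstore>";

    // Pitfall: contains() converts the Remark node set to the text of its FIRST node only.
    static final String FIRST_REMARK_ONLY = "/Bookstore/Book[contains(Remark, 'great')]/@ISBN";

    // Some Remark includes "great": test each Remark separately with a nested condition.
    static final String SOME_REMARK = "/Bookstore/Book[Remark[contains(., 'great')]]/@ISBN";

    // All Remarks include "great": no Remark exists that lacks it.
    static final String ALL_REMARKS =
        "/Bookstore/Book[not(Remark[not(contains(., 'great'))])]/@ISBN";

    // Evaluates expr against xml and returns the text of every matching node.
    static List<String> select(String xml, String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("first remark only: " + select(SAMPLE, FIRST_REMARK_ONLY));
        System.out.println("some remark:       " + select(SAMPLE, SOME_REMARK));
        System.out.println("all remarks:       " + select(SAMPLE, ALL_REMARKS));
    }
}
```

The "all" query also returns B3, which has no Remarks, because the condition is vacuously true. Adding "Remark and" in front of not(...) excludes such books.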


(Example: count all words in an XML document)

// Uses org.w3c.dom.Document, Node, and NodeList.
Document d = parser.getDocument();
int numWords = countWordsInNode(d);

  // Recursively counts the words in all text nodes at or below node.
  // countWordsInString is a helper (not shown) that counts the words in one string.
  public static int countWordsInNode(Node node) {
    int numWords = 0;
    if (node.hasChildNodes()) {
      NodeList children = node.getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        numWords += countWordsInNode(children.item(i));
      }
    }

    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
      String s = node.getNodeValue();
      numWords += countWordsInString(s);
    }
    return numWords;
  }
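
A self-contained version of the word count might look like the sketch below. It swaps the parser object used above for the JDK's javax.xml.parsers.DocumentBuilder, and it supplies one possible countWordsInString that treats a word as a run of non-whitespace characters. Both choices, and the names WordCount and countWords, are assumptions rather than part of the original example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WordCount {
    // Parses xml into a DOM tree and counts the words in its text nodes.
    static int countWords(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return countWordsInNode(d);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Same recursion as above: visit the children first, then the node itself.
    static int countWordsInNode(Node node) {
        int numWords = 0;
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            numWords += countWordsInNode(children.item(i));
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            numWords += countWordsInString(node.getNodeValue());
        }
        return numWords;
    }

    // Assumed definition of a word: a maximal run of non-whitespace characters.
    static int countWordsInString(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String xml = "<Book Price='$65'><Title>A First Course in Database Systems</Title></Book>";
        System.out.println(countWords(xml));
    }
}
```

Attribute values such as Price are not counted, because attributes are not child nodes in the DOM. Whitespace-only text nodes between elements contribute zero words.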

(Pseudocode Example: get all ISBNs)
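
The notes do not say which API the pseudocode is meant for, so the sketch below shows one way to collect the ISBNs with each: DOM builds the whole tree and then looks up the Book elements, while SAX reacts to each start tag as the parser streams through the document. It assumes the JDK's javax.xml.parsers package. The names IsbnLister, isbnsWithDom, and isbnsWithSax are made up.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class IsbnLister {
    // DOM: parse everything into a tree, then read ISBN off every Book element.
    static List<String> isbnsWithDom(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList books = d.getElementsByTagName("Book");
            List<String> isbns = new ArrayList<>();
            for (int i = 0; i < books.getLength(); i++) {
                isbns.add(((Element) books.item(i)).getAttribute("ISBN"));
            }
            return isbns;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // SAX: no tree is built; the handler is called back once per start tag.
    static List<String> isbnsWithSax(String xml) {
        List<String> isbns = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("Book".equals(qName)) {
                    isbns.add(atts.getValue("ISBN"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return isbns;
    }

    public static void main(String[] args) {
        String xml =
            "<Bookstore>"
          + "<Book ISBN='ISBN-0-13-035300-0' Price='$65'><Title>A First Course in Database Systems</Title></Book>"
          + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'><Title>Database Systems: The Complete Book</Title></Book>"
          + "</Bookstore>";
        System.out.println(isbnsWithDom(xml));
        System.out.println(isbnsWithSax(xml));
    }
}
```

The SAX version never holds more than the current tag in memory, which matters for large documents. The DOM version is simpler when the tree is needed for other queries anyway.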