<!ELEMENT Bookstore (Book | Magazine)*>
<!ELEMENT Book (Title, Authors, Remark?)>
<!ATTLIST Book ISBN CDATA #REQUIRED
Price CDATA #REQUIRED
Edition CDATA #IMPLIED>
<!ELEMENT Magazine (Title)>
<!ATTLIST Magazine Month CDATA #REQUIRED Year CDATA #REQUIRED>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Authors (Author+)>
<!ELEMENT Remark (#PCDATA)>
<!ELEMENT Author (First_Name, Last_Name)>
<!ELEMENT First_Name (#PCDATA)>
<!ELEMENT Last_Name (#PCDATA)>
<?xml version="1.0" standalone="no"?>
<!DOCTYPE Bookstore SYSTEM "bookstore.dtd">
<Bookstore>
  <Book ISBN="ISBN-0-13-035300-0" Price="$65" Edition="2nd">
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author>
        <First_Name>Jeffrey</First_Name>
        <Last_Name>Ullman</Last_Name>
      </Author>
      <Author>
        <First_Name>Jennifer</First_Name>
        <Last_Name>Widom</Last_Name>
      </Author>
    </Authors>
  </Book>
  <Book ISBN="ISBN-0-13-031995-3" Price="$75">
    <Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author>
        <First_Name>Hector</First_Name>
        <Last_Name>Garcia-Molina</Last_Name>
      </Author>
      <Author>
        <First_Name>Jeffrey</First_Name>
        <Last_Name>Ullman</Last_Name>
      </Author>
      <Author>
        <First_Name>Jennifer</First_Name>
        <Last_Name>Widom</Last_Name>
      </Author>
    </Authors>
    <Remark>
      Amazon.com says: Buy this book bundled with "A First Course,"
      it's a great deal!
    </Remark>
  </Book>
</Bookstore>
XPath specifies path expressions that match XML data by navigating down (and occasionally up or across) the tree.
Basic constructs (very incomplete list):
| Construct | Meaning |
|---|---|
| / | root element, or separator between steps in path |
| * | matches any one element name |
| @X | matches attribute X of the current element |
| // | matches descendants of the current element at any depth (shorthand for /descendant-or-self::node()/) |
| [C] | evaluates condition C on the current element |
| [N] | picks the Nth matching element |
| contains(s1,s2) | returns TRUE if string s1 contains string s2 |
| name() | returns tag of the current element |
| parent:: | matches the parent of the current element |
| following-sibling:: | matches all siblings after the current node |
| descendant:: | matches any descendant of the current element |
| self:: | matches the current element |
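To see a few of these constructs in action, here is a minimal Java sketch that evaluates XPath 1.0 expressions with the standard `javax.xml.xpath` API. It is only an illustration under some assumptions: the class name `XPathConstructs` and the helper `select` are made up for this example, and the data is just the first Book of the bookstore document, with the DOCTYPE line left out so that no `bookstore.dtd` file is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathConstructs {
    // Assumption: only the first Book of the bookstore document, no DOCTYPE.
    static final String XML =
          "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title>"
        + "<Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors>"
        + "</Book>"
        + "</Bookstore>";

    // Hypothetical helper: evaluates an XPath expression against XML and
    // returns the text content of every matched node.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = {
            "/Bookstore/Book/Title",                              // "/" as root and step separator
            "/Bookstore/Book/@ISBN",                              // "@X": attribute of the current element
            "//Author[2]/Last_Name",                              // "//" and "[N]"
            "//*[name() = 'Title']",                              // "*" and name()
            "//Last_Name[. = 'Widom']/parent::Author/First_Name", // "[C]" and parent::
            "//Author[1]/following-sibling::Author/Last_Name",    // following-sibling::
            "/Bookstore/descendant::Last_Name"                    // descendant::
        };
        for (String expr : exprs) {
            System.out.println(expr + " -> " + select(expr));
        }
    }
}
```

Each expression is re-parsed against the document for simplicity; a longer program would parse once and reuse the `Document`.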
(Example: all book titles)
(Example: all book or magazine titles)
(Example: all ISBN numbers)
(Example: all books costing < $70)
(Example: all ISBN numbers of books costing < $70)
(Example: all books containing a remark)
(Example: all titles of books costing < $70 where "Ullman" is an author)
(Example: same query using //)
(Example: all second authors anywhere)
(Example: all author last names anywhere)
(Example: all books whose title contains one of its author's last names)
(Example: all magazines where there is a book of the same title)
(Example: all books where there is a different book of the same title)
(Example: all elements whose parent tag is not "Book")
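The expressions for the examples above are not recorded in these notes. As a hedged sketch, here is one possible XPath 1.0 formulation for several of them, run against the two-book document from earlier. The class name `BookstoreQueries`, the constant names, and the `select` helper are assumptions made for this illustration. Because Price values are written as "$65", the sketch strips the "$" with `substring-after` before comparing; a document with plain numeric prices could compare `@Price < 70` directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookstoreQueries {
    // The two books of the bookstore document, without the DOCTYPE line.
    static final String XML =
          "<Bookstore>"
        + "<Book ISBN='ISBN-0-13-035300-0' Price='$65' Edition='2nd'>"
        + "<Title>A First Course in Database Systems</Title>"
        + "<Authors>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors></Book>"
        + "<Book ISBN='ISBN-0-13-031995-3' Price='$75'>"
        + "<Title>Database Systems: The Complete Book</Title>"
        + "<Authors>"
        + "<Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>"
        + "<Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>"
        + "<Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>"
        + "</Authors>"
        + "<Remark>Amazon.com says: Buy this book bundled with \"A First Course,\" "
        + "it's a great deal!</Remark>"
        + "</Book>"
        + "</Bookstore>";

    // Assumption: prices look like "$65", so drop the "$" before comparing.
    static final String CHEAP = "substring-after(@Price, '$') < 70";

    static final String BOOK_TITLES = "/Bookstore/Book/Title";
    static final String ALL_TITLES = "/Bookstore/*/Title";   // Book or Magazine
    static final String ISBNS = "/Bookstore/Book/@ISBN";
    static final String CHEAP_ISBNS = "/Bookstore/Book[" + CHEAP + "]/@ISBN";
    static final String WITH_REMARK = "/Bookstore/Book[Remark]/Title";
    static final String CHEAP_ULLMAN =
        "/Bookstore/Book[" + CHEAP + " and Authors/Author/Last_Name = 'Ullman']/Title";
    static final String CHEAP_ULLMAN_SLASHES =
        "//Book[" + CHEAP + " and .//Last_Name = 'Ullman']/Title";
    static final String SECOND_AUTHORS = "//Authors/Author[2]/Last_Name";
    // Tests every Last_Name of the book against that book's own Title.
    static final String TITLE_NAMES_AUTHOR =
        "//Book[Authors/Author/Last_Name[contains(../../../Title, .)]]/Title";

    // Hypothetical helper: text content of every node matched by expr.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        String[] exprs = { BOOK_TITLES, ALL_TITLES, ISBNS, CHEAP_ISBNS, WITH_REMARK,
                           CHEAP_ULLMAN, CHEAP_ULLMAN_SLASHES, SECOND_AUTHORS,
                           TITLE_NAMES_AUTHOR };
        for (String expr : exprs) {
            System.out.println(expr + " -> " + select(expr));
        }
    }
}
```

Note that the simpler-looking `//Book[contains(Title, Authors/Author/Last_Name)]` would only test the first author in XPath 1.0, because a node set passed to `contains` is converted to the string value of its first node.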
For the next two examples, modify the DTD so that Book allows Remark* (any number of remarks) instead of Remark?
(Example: all books where a Remark includes "great")
(Example: all books where all Remarks include "great")
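The difference between "a Remark" and "all Remarks" is existential versus universal quantification. XPath 1.0 has no "for all" construct, so one common way to express it is a double negation: no Remark exists that fails the condition. The sketch below illustrates this under stated assumptions: the three books are hypothetical and stripped down to Title and Remark (so they would not validate against the DTD), and the class name `RemarkQueries` and its members are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RemarkQueries {
    // Hypothetical data: one book with no remarks, one where every remark
    // mentions "great", and one where only the second remark does.
    static final String XML =
          "<Bookstore>"
        + "<Book><Title>NoRemarks</Title></Book>"
        + "<Book><Title>AllGreat</Title>"
        + "<Remark>a great deal</Remark><Remark>great read</Remark></Book>"
        + "<Book><Title>Mixed</Title>"
        + "<Remark>too long</Remark><Remark>great examples</Remark></Book>"
        + "</Bookstore>";

    // Existential: at least one Remark contains "great".
    static final String SOME_GREAT =
        "//Book[Remark[contains(., 'great')]]/Title";

    // Universal via double negation: no Remark lacks "great".
    // Books without any Remark satisfy this vacuously.
    static final String ALL_GREAT =
        "//Book[not(Remark[not(contains(., 'great'))])]/Title";

    // Pitfall: in XPath 1.0 this looks only at the FIRST Remark of each book.
    static final String FIRST_ONLY =
        "//Book[contains(Remark, 'great')]/Title";

    // Hypothetical helper: text content of every node matched by expr.
    static List<String> select(String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("cannot evaluate " + expr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println("some:  " + select(SOME_GREAT));
        System.out.println("all:   " + select(ALL_GREAT));
        System.out.println("first: " + select(FIRST_ONLY));
    }
}
```

If books without remarks should be excluded from the "all Remarks" result, adding `Remark and` in front of the `not(...)` inside the predicate is one way to do it.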
...
Document d = parser.getDocument();
int numWords = countWordsInNode(d);
...
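// Recursively counts the words in all text nodes at or below the given node.
// countWordsInString is not shown in this excerpt.
public static int countWordsInNode(Node node) {
    int numWords = 0;
    // First add up the counts of all child subtrees.
    if (node.hasChildNodes()) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            numWords += countWordsInNode(children.item(i));
        }
    }
    // Only text nodes contribute words of their own.
    int type = node.getNodeType();
    if (type == Node.TEXT_NODE) {
        String s = node.getNodeValue();
        numWords += countWordsInString(s);
    }
    return numWords;
}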
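The fragment above leaves out the parser setup and `countWordsInString`. A self-contained version might look like the following sketch. It makes two assumptions that are not in the notes: the document is parsed with the standard JAXP `DocumentBuilder` rather than whatever `parser` object the elided code uses, and `countWordsInString` simply counts whitespace-separated tokens. The class name `WordCount` and the method `countWordsInXml` are likewise made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class WordCount {
    // Assumed definition: a word is a maximal run of non-whitespace characters.
    static int countWordsInString(String s) {
        String trimmed = s.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Same traversal as in the notes: recurse into children, then count
    // the words of the node itself if it is a text node.
    static int countWordsInNode(Node node) {
        int numWords = 0;
        if (node.hasChildNodes()) {
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                numWords += countWordsInNode(children.item(i));
            }
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            numWords += countWordsInString(node.getNodeValue());
        }
        return numWords;
    }

    // Hypothetical convenience wrapper: parse an XML string and count its words.
    static int countWordsInXml(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return countWordsInNode(d);
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Author><First_Name>Jeffrey</First_Name> "
                   + "<Last_Name>Ullman</Last_Name></Author>";
        System.out.println(countWordsInXml(xml));
    }
}
```

Attributes are not child nodes in the DOM, so attribute values such as ISBN or Price are not counted by this traversal; whitespace-only text nodes between elements contribute zero words.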