SAX (Simple API for XML)

Building and traversing a large DOM tree can waste time and memory, because the entire document must be held in memory before any of it can be processed.

For this reason, a second standard API exists for event-based XML parsers.

A SAX parser traverses an XML document without building a tree.

Every time a SAX parser enters an element node, it calls the startElement method of a given event handler. When it exits the node, it calls the handler's endElement method, and when it encounters a text node, it calls the handler's characters method.

Java provides a DefaultHandler class (in the org.xml.sax.helpers package) that implements these and the other handler methods as empty operations.

Programmers simply extend DefaultHandler and override only the methods they need.
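As an illustration, here is a minimal handler that overrides only startElement, endElement and characters and records each call, so the order of events for a small document becomes visible. The class names EventTraceHandler and EventTraceDemo are hypothetical, not part of the original code; the parser is obtained through the standard JAXP SAXParserFactory, and whitespace-only text is skipped to keep the trace readable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: records every callback of interest.
// All other DefaultHandler methods keep their empty default behavior.
class EventTraceHandler extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        if (!text.trim().isEmpty()) { // skip whitespace-only text
            events.add("text " + text);
        }
    }
}

class EventTraceDemo {
    public static void main(String[] args) throws Exception {
        String xml = "<member><firstName>Homer</firstName></member>";
        EventTraceHandler handler = new EventTraceHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        // prints [start member, start firstName, text Homer, end firstName, end member]
        System.out.println(handler.events);
    }
}
```

The trace shows that the events arrive strictly in document order and that nothing is kept once a callback returns; any state the program needs must be stored by the handler itself.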

Here's the basic design:

Using the SAX Parser

The static visit method in SAXUtils.java demonstrates how an XML file is parsed using a SAX parser.
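SAXUtils.java itself is not reproduced here. A minimal sketch of what such a visit method might look like, assuming it takes a file name and a DefaultHandler (as the test drivers in this section suggest) and wraps JAXP's checked exceptions in a RuntimeException (an assumption, not necessarily what the original does):

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the actual SAXUtils.java may differ.
class SAXUtils {
    // Parses the named file and forwards every parsing event to handler.
    // No tree is built: each node is reported once, in document order.
    public static void visit(String fileName, DefaultHandler handler) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser parser = factory.newSAXParser();
            parser.parse(new File(fileName), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: checked exceptions are wrapped so that callers
            // such as TestSAX.main need no throws clause.
            throw new RuntimeException("could not parse " + fileName, e);
        }
    }
}
```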

Each time the parser enters an element node, it calls the handler's startElement method; when it exits the element node, it calls the handler's endElement method.

When a text node is encountered, the handler's characters method is called.

These methods are no-ops in the default handler, but they may be selectively overridden in subclasses.
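When overriding characters, keep in mind that the SAX contract allows a parser to deliver the text of a single node in several chunks, so a handler usually buffers the chunks and reads them only when the element ends. The hypothetical TextCollector below sketches that pattern, mapping each leaf element name to its text (a later element with the same name overwrites an earlier one).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example: collects the text of leaf elements,
// e.g. firstName -> Homer. Text chunks are buffered because
// characters may be called more than once for one text node.
class TextCollector extends DefaultHandler {
    final Map<String, String> texts = new LinkedHashMap<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // discard text seen before this element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // possibly only part of the text
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffer.toString().trim();
        if (!text.isEmpty()) {
            texts.put(qName, text); // whole text is available only now
        }
        buffer.setLength(0);
    }
}
```

Calling the callbacks by hand with a split chunk shows why buffering matters: characters("Ho") followed by characters("mer") still yields firstName -> Homer.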

A Pretty Print Handler

Pretty printing an XML document shouldn't require building a DOM tree. This is a job for SAX.

The file PrettyPrintHandler.java shows how DefaultHandler is typically extended.
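PrettyPrintHandler.java is not reproduced here. The following is a sketch of one way to produce output in the format shown in this section, inferred from that output: one '.' per nesting level, each attribute printed as `name = value` one level below its element, and text printed one level below its element. The PrintStream constructor parameter is an addition for testability, not something the original is known to have.

```java
import java.io.PrintStream;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the actual PrettyPrintHandler.java may differ.
class PrettyPrintHandler extends DefaultHandler {
    private final PrintStream out;
    private int depth = 0;                             // current nesting level
    private final StringBuilder text = new StringBuilder();

    public PrettyPrintHandler() { this(System.out); }
    public PrettyPrintHandler(PrintStream out) { this.out = out; }

    private static String dots(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) sb.append('.');
        return sb.toString();
    }

    // Prints buffered non-blank text one level below the current element.
    private void flushText() {
        String t = text.toString().trim();
        if (!t.isEmpty()) {
            out.println(dots(depth + 1) + t);
        }
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        depth++;
        out.println(dots(depth) + qName);
        // attributes are listed one level below their element
        for (int i = 0; i < atts.getLength(); i++) {
            out.println(dots(depth + 1) + atts.getQName(i) + " = " + atts.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        depth--;
    }
}
```

Only a counter and a small text buffer are kept, so memory use stays roughly constant regardless of the document's size.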

Here's a simple test driver:

public class TestSAX {
   public static void main(String[] args) {
      SAXUtils.visit("org1.xml", new PrettyPrintHandler());
   }
}

Recall that org1.xml represents the members of an organization and their dependants. (See org0.xml for a formatted version.)

Here's the output produced:

.org
..member
...id = p1
...gender = male
...lastName
....Simpson
...firstName
....Homer
...dob
....1952-07-04
...spouse
....id = p2
...child
....id = p3
...child
....id = p4
...child
....id = p5
..dependant
...id = p2
...gender = female
...lastName
....Simpson
...firstName
....Marge
...dob
....1955-10-20
...sponsor
....id = p1
..dependant
...id = p3
...gender = female
...lastName
....Simpson
...firstName
....Lisa
...dob
....1985-06-22
...sponsor
....id = p1
..dependant
...id = p4
...gender = female
...lastName
....Simpson
...firstName
....Maggie
...dob
....1988-11-11
...sponsor
....id = p1
..dependant
...id = p5
...gender = male
...lastName
....Simpson
...firstName
....Bart
...dob
....1983-01-01
...sponsor
....id = p1

Finding Sponsors

In this example we generate a table that displays all dependants and their sponsors.

Here's the test harness:

public class TestSAX {
   public static void main(String[] args) {
      DependantHandler handler = new DependantHandler();
      SAXUtils.visit("org1.xml", handler);
      handler.displaySponsors();
   }
}

Here's the output it produces:

Maggie Simpson sponsor = Homer Simpson
Marge Simpson sponsor = Homer Simpson
Bart Simpson sponsor = Homer Simpson
Lisa Simpson sponsor = Homer Simpson

The handler builds two tables: one maps member ids to member names (first and last), and the other maps dependant names (first and last) to the ids of their sponsors. Joining the two tables on the sponsor id yields a table that maps each dependant's name to the name of that dependant's sponsor:

   // member id -> member name (first and last)
   private Map<String, String> members = new Hashtable<String, String>();
   // dependant name (first and last) -> sponsor's member id
   private Map<String, String> dependants = new Hashtable<String, String>();

   // merge tables: dependant name -> sponsor name
   public Map<String, String> getSponsors() {
      Map<String, String> result = new Hashtable<String, String>();
      for (String dn : dependants.keySet()) {
         String sponsorName = members.get(dependants.get(dn));
         // Hashtable rejects null values, so skip IDREFs with no matching member
         if (sponsorName != null) {
            result.put(dn, sponsorName);
         }
      }
      return result;
   }

This example tests the limits of SAX because it has to build two tables that can grow as large as the document itself. As we shall see, these tables stand in for the backtracking that DOM would allow: with the whole tree in memory, a sponsor's IDREF could be resolved by going back to the matching member element and reading its name.
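For comparison, a hypothetical DOM version (class DomSponsors, not part of the original code) shows the kind of backtracking meant here: with the whole tree in memory, each sponsor IDREF is resolved by querying the document again through the standard javax.xml.xpath API, so no lookup tables are needed, at the cost of keeping every node in memory. Element and attribute names are assumed from the pretty-printed output.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical DOM counterpart of the SAX handler, for comparison only.
class DomSponsors {
    // dependant name -> sponsor name, resolved by navigating the tree
    static Map<String, String> sponsors(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Map<String, String> result = new LinkedHashMap<>();
        NodeList deps = (NodeList) xpath.evaluate("//dependant", doc, XPathConstants.NODESET);
        for (int i = 0; i < deps.getLength(); i++) {
            Element dep = (Element) deps.item(i);
            String name = xpath.evaluate("firstName", dep) + " " + xpath.evaluate("lastName", dep);
            String id = xpath.evaluate("sponsor/@id", dep);
            // Backtrack: find the member whose id matches the IDREF
            // (ids are XML names, so they cannot contain quotes).
            Element sponsor = (Element) xpath.evaluate(
                    "//member[@id='" + id + "']", doc, XPathConstants.NODE);
            if (sponsor != null) {
                result.put(name, xpath.evaluate("firstName", sponsor) + " "
                        + xpath.evaluate("lastName", sponsor));
            }
        }
        return result;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```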

Here's the complete implementation:

DependantHandler.java
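Since the file itself is not shown, here is a sketch of how such a handler might fill the two tables, assuming the element names visible in the pretty-printed output (member, dependant, firstName, lastName, and a sponsor element carrying an id attribute). The field declarations and getSponsors follow the excerpt above, with a guard for IDREFs that match no member; the event-handling methods and displaySponsors are reconstructions, not the author's code.

```java
import java.util.Hashtable;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; the actual DependantHandler.java may differ.
class DependantHandler extends DefaultHandler {
    private Map<String, String> members = new Hashtable<String, String>();    // member id -> name
    private Map<String, String> dependants = new Hashtable<String, String>(); // dependant name -> sponsor id

    private final StringBuilder text = new StringBuilder(); // buffered character data
    private String id, firstName, lastName, sponsorId;       // person being read

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("member") || qName.equals("dependant")) {
            id = atts.getValue("id");
            firstName = lastName = sponsorId = null;
        } else if (qName.equals("sponsor")) {
            sponsorId = atts.getValue("id"); // IDREF to a member
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("firstName")) {
            firstName = text.toString().trim();
        } else if (qName.equals("lastName")) {
            lastName = text.toString().trim();
        } else if (qName.equals("member") && id != null) {
            members.put(id, firstName + " " + lastName);
        } else if (qName.equals("dependant") && sponsorId != null) {
            dependants.put(firstName + " " + lastName, sponsorId);
        }
        text.setLength(0);
    }

    // merge tables: dependant name -> sponsor name
    public Map<String, String> getSponsors() {
        Map<String, String> result = new Hashtable<String, String>();
        for (Map.Entry<String, String> entry : dependants.entrySet()) {
            String sponsorName = members.get(entry.getValue());
            if (sponsorName != null) { // Hashtable rejects null values
                result.put(entry.getKey(), sponsorName);
            }
        }
        return result;
    }

    public void displaySponsors() {
        for (Map.Entry<String, String> entry : getSponsors().entrySet()) {
            System.out.println(entry.getKey() + " sponsor = " + entry.getValue());
        }
    }
}
```

Because the tables are Hashtables, the order in which displaySponsors lists the dependants is not defined, which is consistent with the unordered output shown earlier.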