8. Navigating XML Documents

The XSL Tree Model

Before an XML document is actually transformed by a style sheet, it is first parsed into a tree-like structure. The actual transformation is performed on this tree.

The format of an XSL tree is specified partly by the XSLT standard and partly by the XPath standard, and is similar to the tree structure defined by the DOM standard.

A node in an XSL tree has the form:

There are seven types of nodes:

Root
Element
Text
Attribute
Comment
Processing Instruction (PI)
Namespace

We can regard nodes as objects and types as classes. Here's a UML diagram showing the "is-a" and "has-a" relationships between these classes:

Every node has a name of type String:

The name of a root, comment, or text node is the empty string.

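The name of an element node is its tag, qualified by its namespace URI.

The name of an attribute node is the attribute name, qualified by its namespace URI.

The name of a namespace node is the prefix that the declaration binds to the namespace URI.

The name of a PI is its target.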
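As a rough illustration of qualified element names, the sketch below uses Python's standard xml.etree.ElementTree module. That module is an assumption of this example and is not an XSL tree: it writes a qualified name as {namespace-URI}local-name and keeps attributes in a dictionary instead of as nodes. The tiny document is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# A cut-down document whose outermost element declares the prefix "java".
doc = ET.fromstring(
    '<examples xmlns:java="lang">'
    '<java:example number="1.0"/>'
    '</examples>'
)
example = doc[0]

# The prefix is replaced by the namespace URI it is bound to.
print(doc.tag)         # examples
print(example.tag)     # {lang}example

# The namespace declaration is not reported as an attribute.
print(doc.attrib)      # {}
print(example.attrib)  # {'number': '1.0'}
```

The second name shows why the prefix alone is not the name: two documents may bind different prefixes to the same URI and still mean the same element.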

Every node has a value of type String:

The value of a comment is the comment without its delimiters.

The value of a text node is the text.

The value of a PI is the data part of the PI, if any.

The value of an attribute is its attribute value.

The value of an element or root node is the concatenation of the values of all of its text and element children. (We denote this value by *.)
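The rule for element values can be sketched in a few lines of Python. This is only an illustration using the standard xml.etree.ElementTree module (an assumption of this example, not part of XSLT); the helper name string_value and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

def string_value(element):
    """Concatenate every piece of text under element, in document order."""
    return "".join(element.itertext())

doc = ET.fromstring("<a>one<b>two<c>three</c></b>four</a>")

print(string_value(doc))            # onetwothreefour
print(string_value(doc.find("b")))  # twothree
```

Because the value of each element child is itself a concatenation, the result is simply all of the text below the node, read from left to right.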

For example, the following XML file contains a style sheet processing instruction followed by its root element:

<?xml version = "1.0"?>
<?xml-stylesheet type = "text/xsl" href = "outline.xsl"?>
<examples xmlns:java = "lang">
   <!-- control structures -->
   <java:example number = "1.0">
      <description>
         The while loop.
      </description>
      <![CDATA[
         while(CONDITION) STATEMENT
      ]]>
   </java:example>
</examples>

Here is the corresponding XSL tree:

Note 1: The namespace declaration of the root element (i.e., <examples>, not to be confused with the root node) is considered to be a namespace node, not an attribute node.

Note 2: The value of a PI of the form:

<?print Hello?>

would be Hello, while its name would be print.

Note 3: The <?xml ... ?> declaration is not in the tree. In particular, it is not considered to be a PI node, even though it looks like one.

Note 4: Not all of the information that's in the original XML document is in the tree. For example, the fact that one of the text nodes was written as a CDATA section does not show up in the tree. Only core information items are guaranteed to appear, as opposed to non-core and lexical items. (See the XML Information Set specification.)
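Some of these points can be checked with Python's standard library. The sketch below is only an approximation: xml.dom.minidom builds a DOM tree, which is similar to an XSL tree but not the same thing, and the one-line documents are made up for the illustration.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

doc = minidom.parseString('<?xml version="1.0"?><?print Hello?><doc/>')

# The XML declaration produces no node; the PI and the element do.
top_level = [node.nodeName for node in doc.childNodes]
print(top_level)             # ['print', 'doc']

# A PI node exposes its target (its name) and its data (its value).
pi = doc.childNodes[0]
print(pi.target, pi.data)    # print Hello

# A CDATA section is just another way of writing character data.
elem = ET.fromstring("<e><![CDATA[a < b]]></e>")
print(elem.text)             # a < b
```

ElementTree reports the CDATA content as ordinary text, which mirrors the loss of lexical information described in Note 4.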

XPath Expressions

Recall that an XSLT processor evaluates XPath expressions relative to the context supplied by the source tree.

value = processor.eval(exp, context);

There are three types of expressions:

<Expression> ::= <Operation> | <Location> | <Primary>

Operations involve standard infix binary and prefix unary operators (+, and, =, -, etc.).

Primary expressions are literals (numbers, strings), variable references, function calls, etc.

The value of an XPath expression can be a number, a Boolean, a string, or a node set. For example, assume pi is a variable whose value is 3.1416. Then the values of the following expressions:

$pi * 5
$pi &lt; 5 and true()
$pi != 5 or not(nuts) and ($pi + 3) = -2

are 15.708, true, and true, respectively.

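Note 1: nuts is not a string literal (that would be written 'nuts'). It is a relative location that selects the child elements named nuts. A node set converts to false when it is empty, so not(nuts) is true whenever the context node has no such children.

Note 2: For the same reason, true must be written as a function call, true(). A bare true would be read as a location selecting child elements named true.

Note 3: We can't write "$pi < 5" because the expression appears inside an attribute value in the style sheet, where < must be escaped as &lt;.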

Note 4: Expressions may contain calls to standard functions.
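The three sample expressions can be imitated in Python to make the conversions explicit. This is an emulation under stated assumptions, not an XPath evaluator: the context document and the helper node_set are hypothetical, and the only XPath 1.0 rules relied on are that a bare name such as nuts selects child elements and that a node set converts to true exactly when it is non-empty. Python's and/or happen to bind in the same relative order as XPath's.

```python
import math
import xml.etree.ElementTree as ET

pi = 3.1416
context = ET.fromstring("<doc><item/></doc>")   # hypothetical context node

def node_set(name):
    """Child elements of the context node with the given name."""
    return context.findall(name)

# $pi * 5
first = pi * 5

# $pi < 5 and true()
second = pi < 5 and True

# $pi != 5 or not(nuts) and ($pi + 3) = -2
# "and" binds tighter than "or", so the first comparison decides the result.
third = pi != 5 or (not bool(node_set("nuts"))) and (pi + 3) == -2

print(round(first, 3), second, third)   # 15.708 True True
```

Here node_set("nuts") is empty, so not(nuts) would be true for this context, while node_set("item") is non-empty and would convert to true.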

The value of a location is a source tree node set.

Absolute Locations

A location is a sequence of steps:

/STEP/STEP/STEP/etc

A location describes a set of nodes in the source tree. The simplest type of location describes a path from the root node.

For example, assume the source document has the form:

<A>
   <B> <C prop = "p1"> c1 </C> <C prop = "p2"> c2 </C> </B>
   <B> <D> d1 </D> <C prop = "p3"> c3 </C> </B>
</A>

Here's a template we can use to evaluate XPath expressions:

<xsl:template match = "/">

   <xsl:variable name = "path" select = "/" />

   <html>
      <head> <title> XSL Tests </title> </head>
      <body>

         <xsl:for-each select = "$path">
            Name = <xsl:value-of select = "name(.)" /> <br />
            Value = <xsl:value-of select = "." /> <br /> <br />
         </xsl:for-each>
      </body>
   </html>

</xsl:template>

Here is the output produced when path = "/"

Name =
Value = c1 c2 d1 c3

The root node has no name, and its value is the concatenation of the values of all of its children.

Here is the output produced when path = "/A"

Name = A
Value = c1 c2 d1 c3

Here is the output produced when path = "/A/B"

Name = B
Value = c1 c2

Name = B
Value = d1 c3

Here's the output when path = "/A/B/C"

Name = C
Value = c1

Name = C
Value = c2

Name = C
Value = c3

We can also include attributes in our path. For example, here's the output when path = "/A/B/C/@prop"

Name = prop
Value = p1

Name = prop
Value = p2

Name = prop
Value = p3
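The element paths above can be reproduced outside an XSLT processor. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports only a small subset of XPath. ElementTree has no root node, so the sketch wraps the document element in a stand-in element; the names ROOT, value, and evaluate are hypothetical helpers, and values are shown with whitespace collapsed.

```python
import xml.etree.ElementTree as ET

SOURCE = """<A>
   <B> <C prop = "p1"> c1 </C> <C prop = "p2"> c2 </C> </B>
   <B> <D> d1 </D> <C prop = "p3"> c3 </C> </B>
</A>"""

# Stand-in for the root node: an extra element that contains <A>.
ROOT = ET.Element("#root")
ROOT.append(ET.fromstring(SOURCE))

def value(node):
    """String value of a node, with runs of whitespace collapsed."""
    return " ".join("".join(node.itertext()).split())

def evaluate(path):
    """Return (name, value) pairs for a simple absolute element path."""
    if path == "/":
        return [("", value(ROOT))]
    return [(n.tag, value(n)) for n in ROOT.findall(path.lstrip("/"))]

for path in ("/", "/A", "/A/B", "/A/B/C"):
    print(path, evaluate(path))

# findall cannot select attributes, so read prop from each C element.
props = [c.get("prop") for c in ROOT.findall("A/B/C")]
print(props)   # ['p1', 'p2', 'p3']
```

Each step narrows or fans out the node set produced by the previous step, which is why /A/B yields two nodes and /A/B/C yields three.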

Using Predicates

Predicates can be used to filter out unwanted nodes. Only nodes that pass the test specified by the predicate will be included.

Here's the output when path = "/A/B[2]"

Name = B
Value = d1 c3

When path = "/A/B/C[@prop = 'p3']"

Name = C
Value = c3

Predicates may contain function calls. Here's the output when path = "/A/B/C[contains(., 'c1')]"

Name = C
Value = c1

Note that the call to contains is needed because the value of the node includes the whitespace around the text. The value is " c1 ", not "c1", so the test . = 'c1' would fail.
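The same three filters can be tried in Python. This is a sketch under stated assumptions: the standard xml.etree.ElementTree module understands positional and attribute predicates but has no contains function, so that test is emulated with a substring check. Paths are written relative to A because ElementTree starts at the document element.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("""<A>
   <B> <C prop = "p1"> c1 </C> <C prop = "p2"> c2 </C> </B>
   <B> <D> d1 </D> <C prop = "p3"> c3 </C> </B>
</A>""")

def value(node):
    """String value of a node, whitespace included."""
    return "".join(node.itertext())

# /A/B[2]: positions count from 1.
second_b = [value(b).split() for b in a.findall("B[2]")]
print(second_b)    # [['d1', 'c3']]

# /A/B/C[@prop = 'p3']
by_prop = [value(c) for c in a.findall("B/C[@prop='p3']")]
print(by_prop)     # [' c3 ']

# /A/B/C[contains(., 'c1')], emulated with a substring test.
all_c = a.findall("B/C")
containing = [c.get("prop") for c in all_c if "c1" in value(c)]
print(containing)  # ['p1']

# An exact comparison finds nothing: the value is ' c1 ', not 'c1'.
exact = [c.get("prop") for c in all_c if value(c) == "c1"]
print(exact)       # []
```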

Relative Locations

If we modify our template so that it matches B nodes, then we can use relative path expressions. These are paths that don't begin with a slash.

<xsl:template match = "B">

   <xsl:variable name = "path" select = "C" />

   <html>
      <head> <title> XSL Tests </title> </head>
      <body>

         <xsl:for-each select = "$path">
            Name = <xsl:value-of select = "name(.)" /> <br />
            Value = <xsl:value-of select = "." /> <br /> <br />
         </xsl:for-each>
      </body>
   </html>

</xsl:template>

Here's the output produced when path = "C"

Name = C
Value = c1

Name = C
Value = c2

Name = C
Value = c3

We can use . and .. in path expressions. Here's the output when path = ".."

Name = A
Value = c1 c2 d1 c3

Name = A
Value = c1 c2 d1 c3

A is printed twice because the pattern "B" matches two nodes, and the template runs once for each of them.
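Relative evaluation can be imitated by making each B element the starting point in turn. The sketch below uses Python's standard xml.etree.ElementTree module as an assumption; its elements do not record their parents, so a lookup table (the hypothetical parent_of) stands in for the .. step.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("""<A>
   <B> <C prop = "p1"> c1 </C> <C prop = "p2"> c2 </C> </B>
   <B> <D> d1 </D> <C prop = "p3"> c3 </C> </B>
</A>""")

# Map every element to its parent so that ".." can be looked up.
parent_of = {child: parent for parent in a.iter() for child in parent}

relative_c = []
parents = []
for b in a.findall("B"):          # each B is the context node in turn
    relative_c.extend(c.text.strip() for c in b.findall("C"))   # path "C"
    parents.append(parent_of[b].tag)                            # path ".."

print(relative_c)   # ['c1', 'c2', 'c3']
print(parents)      # ['A', 'A']
```

The parent shows up once per context node, which matches the repeated output for "..".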

Specifying the Axis

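The general format of a step is shown below. The axis defaults to child when it is omitted, and the predicate is optional: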

AXIS::TEST[PRED]
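To make the three parts of a step concrete, here is a minimal Python sketch that splits a single step into its axis, node test, and predicate. It is an illustration, not a full XPath parser: the function name parse_step is hypothetical, at most one predicate is handled, and only the common abbreviations (., .., and @) are expanded.

```python
import re

# AXIS:: is optional, TEST is required, [PRED] is optional.
STEP = re.compile(
    r"^(?:(?P<axis>[a-z-]+)::)?(?P<test>[^\[\]]+)(?:\[(?P<pred>.*)\])?$"
)

def parse_step(step):
    """Split one location step into (axis, test, predicate-or-None)."""
    if step == ".":
        return ("self", "node()", None)
    if step == "..":
        return ("parent", "node()", None)
    if step.startswith("@"):
        step = "attribute::" + step[1:]
    match = STEP.match(step)
    if match is None:
        raise ValueError("not a valid step: " + step)
    # A step without an explicit axis uses the child axis.
    return (match["axis"] or "child", match["test"], match["pred"])

print(parse_step("C[@prop = 'p3']"))      # ('child', 'C', "@prop = 'p3'")
print(parse_step("ancestor-or-self::*"))  # ('ancestor-or-self', '*', None)
print(parse_step("@*"))                   # ('attribute', '*', None)
```

Seen this way, a path such as /A/B/C is shorthand for /child::A/child::B/child::C.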

For the remainder of the section we will use the following XML file:

<?xml version = "1.0"?>
<A>
   <B> <B1> B1 </B1> <B2> B2 </B2> </B>
   <X prop1 = "p" prop2 = "q">
     <C prop1 = "p1"> <C1> C1 </C1> <C2> C2 </C2> </C>
     <D prop2 = "p2"> <D1> D1 </D1> <D2> D2 </D2> </D>
   </X>
   <E> <E1> E1 </E1> <E2> E2 </E2> </E>
</A>

Here is a sketch of the tree, showing the current node:

Our style sheet makes the shaded node, X, our current node, then examines various axes:

<xsl:template match = "/">
   <html>
      <head> <title> XSL Tests </title> </head>
      <body>
         <xsl:apply-templates select = "A/X" />
      </body>
   </html>
</xsl:template>

<xsl:template match="X">
   <xsl:for-each select = "self::*">
      Name = <xsl:value-of select = "name(.)" /> <br />
      Value = <xsl:value-of select = "." /> <br /> <br />
   </xsl:for-each>
</xsl:template>

Here are the results obtained by substituting each axis in turn for self in the select attribute:

child

Name = C
Value = C1 C2

Name = D
Value = D1 D2

self (.)

Name = X
Value = C1 C2 D1 D2

parent (..)

Name = A
Value = B1 B2 C1 C2 D1 D2 E1 E2

ancestor

Name = A
Value = B1 B2 C1 C2 D1 D2 E1 E2

ancestor-or-self

Name = A
Value = B1 B2 C1 C2 D1 D2 E1 E2

Name = X
Value = C1 C2 D1 D2

descendant-or-self (//)

Name = X
Value = C1 C2 D1 D2

Name = C
Value = C1 C2

Name = C1
Value = C1

Name = C2
Value = C2

Name = D
Value = D1 D2

Name = D1
Value = D1

Name = D2
Value = D2

descendant

Name = C
Value = C1 C2

Name = C1
Value = C1

Name = C2
Value = C2

Name = D
Value = D1 D2

Name = D1
Value = D1

Name = D2
Value = D2

attribute (@*)

Name = prop1
Value = p

Name = prop2
Value = q
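The element axes listed above can be approximated with ordinary tree walking. The following is a sketch in Python using the standard xml.etree.ElementTree module, which is an assumption of this example; the table AXES and the helpers ancestors and names are hypothetical. Only element nodes are visited, which corresponds to the node test * used in the style sheet, and results are returned in document order.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<A>
   <B> <B1> B1 </B1> <B2> B2 </B2> </B>
   <X prop1 = "p" prop2 = "q">
     <C prop1 = "p1"> <C1> C1 </C1> <C2> C2 </C2> </C>
     <D prop2 = "p2"> <D1> D1 </D1> <D2> D2 </D2> </D>
   </X>
   <E> <E1> E1 </E1> <E2> E2 </E2> </E>
</A>""")

# Elements do not know their parents, so record them once.
PARENT = {child: parent for parent in root.iter() for child in parent}

def ancestors(node):
    """Element ancestors of node, nearest first."""
    found = []
    while node in PARENT:
        node = PARENT[node]
        found.append(node)
    return found

AXES = {
    "child": lambda n: list(n),
    "self": lambda n: [n],
    "parent": lambda n: ancestors(n)[:1],
    "ancestor": lambda n: ancestors(n)[::-1],
    "ancestor-or-self": lambda n: ancestors(n)[::-1] + [n],
    "descendant": lambda n: list(n.iter())[1:],   # iter() yields n first
    "descendant-or-self": lambda n: list(n.iter()),
}

def names(axis, node):
    """Names of the elements found along the given axis."""
    return [element.tag for element in AXES[axis](node)]

x = root.find("X")                 # the current node
for axis in AXES:
    print(axis, names(axis, x))

# The attribute axis: ElementTree keeps attributes in a dictionary.
print(x.attrib)                    # {'prop1': 'p', 'prop2': 'q'}
```

The remaining axes of XPath 1.0 (following, preceding, and their sibling variants) could be added to the table in the same style.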