4. Linking XML Documents

A "Review" of Graph Theory

A labeled directed graph (digraph) is a network of nodes connected by arcs. Any type of object can be a node. An arc is simply an arrow that connects a source node to a destination node. Both nodes and arcs can be labeled. A label can be regarded as the name of the node or arc, or as information associated with it. Digraphs can be drawn as diagrams. The following digraph consists of five nodes, R1 through R5, and seven arcs, A1 through A7:

Digraphs are useful for representing binary relationships between objects. For example, two binary relationships over the Person domain are:

X is the child of Y
X is the spouse of Y

If persons are represented as nodes, then arcs can represent instances of the child and spouse relationships between them. Here's an example worked out for the Simpsons domain:
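
One way to make the idea concrete is to store a labeled digraph as a list of (source, label, destination) triples. The following Python sketch is only an illustration: the Digraph class and its method names are hypothetical, and the family facts are assumed from the show rather than read off the diagram.

```python
class Digraph:
    """A labeled digraph stored as (source, label, destination) triples."""

    def __init__(self):
        self.nodes = set()
        self.arcs = []

    def add_arc(self, source, label, destination):
        # Adding an arc also registers both of its end nodes.
        self.nodes.update((source, destination))
        self.arcs.append((source, label, destination))

    def destinations(self, source, label):
        # All nodes reached from `source` by an arc carrying `label`.
        return [d for (s, l, d) in self.arcs if s == source and l == label]


family = Digraph()
# "X is the child of Y": one arc from each child to each parent (assumed facts).
for child in ("Bart", "Lisa", "Maggie"):
    for parent in ("Homer", "Marge"):
        family.add_arc(child, "child", parent)
# "X is the spouse of Y" holds in both directions, so it needs two arcs.
family.add_arc("Homer", "spouse", "Marge")
family.add_arc("Marge", "spouse", "Homer")

print(family.destinations("Bart", "child"))
```

Note that the symmetric spouse relationship is stored as two separate arcs, one in each direction.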

As another example, consider the binary relationship over the domain of cities:

City X is connected to City Y

Here's a digraph that connects a few California cities. This digraph employs the somewhat risky shortcut of representing two arcs pointing in opposite directions as a single bi-directional arc:

(There's an entire discipline in mathematics called graph theory; remember the Four-Color Problem? Graphs are also an important family of data structures in computer science; remember the Traveling Salesman Problem?)

Representing digraphs in XML: Extended Links

XLink is a W3C specification that extends XML with linking. It allows XML authors to create simple links from one XML document to another. A simple link is very similar to an HTML hyperlink.

XLink also allows authors to create XML elements called extended links. An extended link is an XML element that describes a resource digraph. The nodes of a resource digraph are web resources. Anything that has a URI is a web resource: namespaces, XML documents, HTML documents, text files, databases, web applications, web services, etc. Even elements within an XML document can be resources (if we extend the URI with an XPointer fragment, which can carry an XPath expression).

XLink is not an XML vocabulary with elements, attributes, and a DTD. Instead, XLink is a collection of universal attributes that can be added to the elements of any XML vocabulary.

Example: Computer Networks

For example, if we want to declare that an XML element

<network>
   <!-- descriptions of the computers go here -->
</network>

describes a resource digraph, then we only need to add the XLink type attribute with value set to "extended":

<network xlink:type = "extended">
   <!-- descriptions of the computers go here -->
</network>

Of course, the xlink namespace prefix must be declared on the element itself or on one of its ancestors, typically the root element of the XML document:

<?xml version = "1.0"?>
<networks xmlns:xlink = "http://www.w3.org/1999/xlink">
   <!-- descriptions of the networks go here -->
</networks>

Let's describe the following computer network, in which three computers (C1, C2, and C3) are connected by four wires (W1, W2, W3, and W4) as shown:

We might represent this network using a network element containing sub-elements that describe the computers and the wires:

<network xlink:type = "extended">
   <!-- computers: -->
   <computer name = "C1">
      <!-- more info here -->
   </computer>
   <computer name = "C2"> 
      <!-- more info here -->
   </computer>
   <computer name = "C3">
      <!-- more info here -->
   </computer>
   <!-- wires: -->
   <wire name = "W1">
      <!-- more info here -->
   </wire>
   <wire name = "W2">
      <!-- more info here -->
   </wire>
   <wire name = "W3">
      <!-- more info here -->
   </wire>
   <wire name = "W4">
      <!-- more info here -->
   </wire>
</network>

An XLink processor will recognize the computers as resources if we give them XLink type attributes with the value "resource" or "locator". An element with XLink type "locator" identifies a remote resource: it is simply a URI reference to the actual resource. An element with XLink type "resource" is a local resource: the element within the extended link is the resource itself.

For example, assume XML documents that describe computers C1 and C2 have the URIs:

http://www.demo.com/computer/c1.xml
http://www.demo.com/computer/c2.xml

Sadly, there is no such description of C3. Instead, the only information available about C3 is contained in the corresponding network/computer element. In other words, c1.xml and c2.xml are remote resources, while the computer element named C3 is a local resource.

Each resource node, local or remote, should have an XLink label so that an XLink processor can identify it. (An XLink processor wouldn't know to treat the name attribute as the label.) A locator node must also supply the URI of the remote resource:

<network xlink:type = "extended">
   <!-- computers: -->
   <computer name = "C1"
      xlink:type = "locator"
      xlink:href = "http://www.demo.com/computer/c1.xml"
      xlink:label = "C1">
      <!-- more info here -->
   </computer>
   <computer name = "C2"
      xlink:type = "locator"
      xlink:href = "http://www.demo.com/computer/c2.xml"
      xlink:label = "C2">
      <!-- more info here -->
   </computer>
   <computer name = "C3"
      xlink:type = "resource"
      xlink:label = "C3">
      <!-- more info here -->
   </computer>
   <!-- wires: -->
   <wire name = "W1">
      <!-- more info here -->
   </wire>
   <wire name = "W2">
      <!-- more info here -->
   </wire>
   <wire name = "W3">
      <!-- more info here -->
   </wire>
   <wire name = "W4">
      <!-- more info here -->
   </wire>
</network>

Similarly, an XLink processor won't recognize a wire as an arc unless we set the wire's XLink type attribute to "arc". The from and to attributes give the labels of the arc's source and destination resources:

<network xlink:type = "extended">
   <!-- computers: -->
   <computer name = "C1"
      xlink:type = "locator"
      xlink:href = "http://www.demo.com/computer/c1.xml"
      xlink:label = "C1">
      <!-- more info here -->
   </computer>
   <computer name = "C2"
      xlink:type = "locator"
      xlink:href = "http://www.demo.com/computer/c2.xml"
      xlink:label = "C2">
      <!-- more info here -->
   </computer>
   <computer name = "C3"
      xlink:type = "resource"
      xlink:label = "C3">
      <!-- more info here -->
   </computer>
   <!-- wires: -->
   <wire name = "W1"
      xlink:type = "arc"
      xlink:from = "C1"
      xlink:to = "C2">
      <!-- more info here -->
   </wire>
   <wire name = "W2"
      xlink:type = "arc"
      xlink:from = "C2"
      xlink:to = "C1">
      <!-- more info here -->
   </wire>
   <wire name = "W3"
      xlink:type = "arc"
      xlink:from = "C2"
      xlink:to = "C3">
      <!-- more info here -->
   </wire>
   <wire name = "W4"
      xlink:type = "arc"
      xlink:from = "C3"
      xlink:to = "C2">
      <!-- more info here -->
   </wire>
</network>

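Note: Arcs can't carry an xlink:label attribute in XLink. An arc can instead be described with the xlink:arcrole and xlink:title attributes.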
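
To see how a program might read such an extended link as a digraph, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the xlink namespace is declared on the network element itself and abbreviates the computers and wires to empty elements. The function name read_extended_link is hypothetical, and a real XLink processor does considerably more than this.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Abbreviated copy of the network link (namespace declared locally: an assumption).
NETWORK = """<network xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="extended">
   <computer name="C1" xlink:type="locator"
      xlink:href="http://www.demo.com/computer/c1.xml" xlink:label="C1"/>
   <computer name="C2" xlink:type="locator"
      xlink:href="http://www.demo.com/computer/c2.xml" xlink:label="C2"/>
   <computer name="C3" xlink:type="resource" xlink:label="C3"/>
   <wire name="W1" xlink:type="arc" xlink:from="C1" xlink:to="C2"/>
   <wire name="W2" xlink:type="arc" xlink:from="C2" xlink:to="C1"/>
   <wire name="W3" xlink:type="arc" xlink:from="C2" xlink:to="C3"/>
   <wire name="W4" xlink:type="arc" xlink:from="C3" xlink:to="C2"/>
</network>"""


def xlink(name):
    # ElementTree spells a namespaced attribute as "{namespace-uri}local-name".
    return "{%s}%s" % (XLINK, name)


def read_extended_link(link):
    """Return ({label: xlink type}, [(from label, to label), ...]) for one link."""
    resources = {}
    arcs = []
    for child in link:
        kind = child.get(xlink("type"))
        if kind in ("locator", "resource"):
            resources[child.get(xlink("label"))] = kind
        elif kind == "arc":
            arcs.append((child.get(xlink("from")), child.get(xlink("to"))))
    return resources, arcs


resources, arcs = read_extended_link(ET.fromstring(NETWORK))
print(resources)
print(arcs)
```

The name attributes are ignored here on purpose: only the XLink attributes determine which children are nodes and which are arcs.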

Terminology

Our terminology differs somewhat from XLink terminology. What we call a resource digraph, XLink calls an extended link. The resources in a resource digraph are called participants.

The source node of an arc is called a starting resource. The destination node is called an ending resource.

An outbound arc leads from a local resource to a remote resource. An inbound arc leads from a remote resource to a local resource. A third-party arc connects two remote resources.

A document containing a collection of third-party and inbound arcs is called a linkbase.
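
The three kinds of arc can be told apart from the XLink types of their end points alone. The sketch below applies the definitions above to the computer network example. The function name classify_arc is hypothetical, and the two tables are copied by hand from the network listing rather than parsed.

```python
# XLink type of each labeled resource in the network example.
KINDS = {"C1": "locator", "C2": "locator", "C3": "resource"}

# Each wire as (starting resource label, ending resource label).
ARCS = {
    "W1": ("C1", "C2"),
    "W2": ("C2", "C1"),
    "W3": ("C2", "C3"),
    "W4": ("C3", "C2"),
}


def classify_arc(start_kind, end_kind):
    """Name an arc from the types of its starting and ending resources."""
    if start_kind == "resource" and end_kind == "locator":
        return "outbound"      # local -> remote
    if start_kind == "locator" and end_kind == "resource":
        return "inbound"       # remote -> local
    if start_kind == "locator" and end_kind == "locator":
        return "third-party"   # remote -> remote
    return None                # local -> local: not named in the text above


classified = {
    wire: classify_arc(KINDS[start], KINDS[end])
    for wire, (start, end) in ARCS.items()
}
print(classified)
```

Under these definitions W1 and W2 are third-party arcs, W3 is inbound, and W4 is outbound, because C3 is the only local resource.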

XLink Attribute Patterns

The following table shows which attributes are required (R) or optional (O) for each XLink type. A dash means the attribute does not apply to that type:

           simple   extended   locator   arc   resource   title
type       R        R          R         R     R          R
href       O        -          R         -     -          -
role       O        O          O         -     O          -
arcrole    O        -          -         O     -          -
title      O        O          O         O     O          -
show       O        -          -         O     -          -
actuate    O        -          -         O     -          -
label      -        -          O         -     O          -
from       -        -          -         O     -          -
to         -        -          -         O     -          -

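Semantic Attributes: role, arcrole, title

The role attribute can be used on resources and links, and the arcrole attribute on simple links and arcs. The value of each is a URI that identifies some property of the resource, link, or arc. The title attribute holds a human-readable description.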

Behavior Attributes: Show and Actuate

Behavior attributes are optional for simple links and arcs. They specify the action to be performed when an arc is traversed.

xlink:show = "XXXX"
xlink:actuate = "YYYY"

XXXX = new (open in new window)
    | replace
    | embed
    | other
    | none
 
YYYY = onRequest (traverse on click)
    | onLoad (traverse this arc on load)
    | other
    | none
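
The value lists above are small enough to check mechanically. The following sketch assumes only the values listed above; the function name check_behavior and the wording of its messages are hypothetical.

```python
# Allowed values, copied from the lists above. The comparison is case-sensitive.
SHOW_VALUES = {"new", "replace", "embed", "other", "none"}
ACTUATE_VALUES = {"onRequest", "onLoad", "other", "none"}


def check_behavior(show=None, actuate=None):
    """Return a list of problems; an empty list means both values are acceptable."""
    problems = []
    # Both attributes are optional, so None (attribute absent) is always fine.
    if show is not None and show not in SHOW_VALUES:
        problems.append("bad xlink:show value: %r" % show)
    if actuate is not None and actuate not in ACTUATE_VALUES:
        problems.append("bad xlink:actuate value: %r" % actuate)
    return problems


print(check_behavior(show="new", actuate="onRequest"))
print(check_behavior(show="popup", actuate="onload"))
```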

Simple Links

A simple link links an element within a document (called a local resource) to another document (called the remote resource).

Remote Resource: sjsu.xml

Assume the following XML document has been defined:

<?xml version = "1.0"?>
<about
   xmlns:xlink = "http://www.w3.org/1999/xlink">
   <name> San Jose State University </name>
   <description> A state university </description>
   <location>
      Downtown San Jose
   </location>
</about>

Local Resource: pearce.xml

A local resource is any element that contains XLink attributes linking it to a remote resource:

<?xml version = "1.0"?>
<about xmlns:xlink = "http://www.w3.org/1999/xlink">
   <name> Jon Pearce </name>
   <description> Professor </description>
   <location
      xlink:type = "simple"
      xlink:href = "sjsu.xml">
      San Jose State University
   </location>
</about>
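
A program that wants to follow simple links first has to find them. As a minimal sketch, the Python below walks a copy of pearce.xml and lists every element whose XLink type is "simple" together with its href. The function name simple_links is hypothetical, and fetching the remote resource is left out.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Copy of pearce.xml with the attribute spacing tightened.
PEARCE = """<about xmlns:xlink="http://www.w3.org/1999/xlink">
   <name> Jon Pearce </name>
   <description> Professor </description>
   <location xlink:type="simple" xlink:href="sjsu.xml">
      San Jose State University
   </location>
</about>"""


def simple_links(root):
    """Return (element tag, href) for every simple link at or below root."""
    links = []
    for element in root.iter():
        if element.get("{%s}type" % XLINK) == "simple":
            links.append((element.tag, element.get("{%s}href" % XLINK)))
    return links


links = simple_links(ET.fromstring(PEARCE))
print(links)
```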

Example: Object Serialization

Example: Concept Maps

XPointer

XPointer is used to reference elements within a remote XML document. It combines the document's URI with an XPath expression, which is appended to the URI as a fragment identifier:

<?xml version = "1.0"?>
<about xmlns:xlink = "http://www.w3.org/1999/xlink">
   <name> Jon Pearce </name>
   <description> Professor </description>
   <location
      xlink:type = "simple"
      xlink:href = "sjsu.xml#xpointer(/about/location)">
      San Jose State University
   </location>
</about>
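
To illustrate how such an href breaks down, the sketch below splits it into the document URI and the path inside xpointer(...), then applies the path to a copy of sjsu.xml. This is not an XPointer implementation: the helper names split_href and select are hypothetical, and select understands only plain child steps such as /about/location.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Copy of the remote resource sjsu.xml (namespace declaration omitted).
SJSU = """<about>
   <name> San Jose State University </name>
   <description> A state university </description>
   <location>
      Downtown San Jose
   </location>
</about>"""


def split_href(href):
    """Split an href into (document URI, path inside xpointer(...) or None)."""
    document, fragment = urldefrag(href)
    match = re.fullmatch(r"xpointer\((.*)\)", fragment)
    return document, (match.group(1) if match else None)


def select(root, path):
    """Follow a path made only of child steps, starting at the document root."""
    steps = path.strip("/").split("/")
    if steps[0] != root.tag:
        return None  # the first step must name the root element
    if len(steps) == 1:
        return root
    return root.find("/".join(steps[1:]))


document, path = split_href("sjsu.xml#xpointer(/about/location)")
target = select(ET.fromstring(SJSU), path)
print(document, path, target.text.strip())
```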

XInclude

An include element is an xi:include element, in the XInclude namespace http://www.w3.org/2001/XInclude, whose href attribute references a remote document. An XInclude processor replaces the element with the remote document. The xpointer attribute can be used to include only selected elements of the remote document.

<?xml version = "1.0"?>
<about xmlns:xlink = "http://www.w3.org/1999/xlink">
   <name> Jon Pearce </name>
   <description> Professor </description>
   <location
      xlink:type = "simple"
      xlink:href = "sjsu.xml">
      San Jose State University
   </location>
   <xi:include
      xmlns:xi = "http://www.w3.org/2001/XInclude"
      href = "resume.xml"
      parse = "xml"
   />
</about>
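
Python's standard library has a small XInclude processor, xml.etree.ElementInclude, which can be used to watch the replacement happen. It recognizes the xi:include element form in the namespace http://www.w3.org/2001/XInclude, so the sketch below uses that form. The contents of resume.xml are invented for the illustration, and the loader function stands in for a real file or network fetch.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Stand-in for the remote documents; the resume content is hypothetical.
DOCS = {"resume.xml": "<resume><job> Professor </job></resume>"}

ABOUT = """<about xmlns:xi="http://www.w3.org/2001/XInclude">
   <name> Jon Pearce </name>
   <xi:include href="resume.xml" parse="xml"/>
</about>"""


def loader(href, parse, encoding=None):
    # ElementInclude calls this for each include element it finds.
    if parse == "xml":
        return ET.fromstring(DOCS[href])
    return DOCS[href]


root = ET.fromstring(ABOUT)
ElementInclude.include(root, loader=loader)  # modifies the tree in place

print([child.tag for child in root])
```

After the call, the include element is gone and the root element of resume.xml sits in its place.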

XBase

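The xml:base attribute, defined by the XML Base specification, sets the base URI against which relative references in an element and its descendants are resolved. This lets us change the base of a reference to a remote document without editing the reference itself. In the following example, the relative reference sjsu.xml resolves to http://sjsu.edu/sjsu.xml: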

<?xml version = "1.0"?>
<about xmlns:xlink = "http://www.w3.org/1999/xlink">
   <name> Jon Pearce </name>
   <description> Professor </description>
   <location
      xml:base = "http://sjsu.edu/"
      xlink:type = "simple"
      xlink:href = "sjsu.xml">
      San Jose State University
   </location>
</about>
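
The resolution step can be sketched with urljoin from Python's standard library. The function below walks the tree, carries the current base URI down from parent to child, and resolves every xlink:href against it. The function name resolved_links is hypothetical, and the sketch assumes the document itself has no base URI of its own.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# The xml prefix is bound to this namespace in every XML document.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

ABOUT = """<about xmlns:xlink="http://www.w3.org/1999/xlink">
   <name> Jon Pearce </name>
   <description> Professor </description>
   <location xml:base="http://sjsu.edu/"
      xlink:type="simple" xlink:href="sjsu.xml">
      San Jose State University
   </location>
</about>"""


def resolved_links(element, base=""):
    """Return every xlink:href at or below element, resolved against xml:base."""
    # An xml:base on this element is itself resolved against the inherited base.
    base = urljoin(base, element.get(XML_BASE, ""))
    links = []
    href = element.get(XLINK_HREF)
    if href is not None:
        links.append(urljoin(base, href))
    for child in element:
        links.extend(resolved_links(child, base))
    return links


print(resolved_links(ET.fromstring(ABOUT)))
```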