The specification for XHTML can be found at:
http://www.w3.org/TR/2002/REC-xhtml1-20020801/
XHTML is an XML version of HTML.
Like HTML documents, XHTML documents have an .html or .htm suffix. The prelude contains an XML declaration and a DTD declaration. The <html> root element of an XHTML document must contain the XHTML namespace declaration:
<?xml version="1.0"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0
Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns = "http://www.w3.org/1999/xhtml">
<head><title> This is my
first XHTML document </title><head>
<body>
<p> This is my first XHTML
document </p>
</body>
</html>
XHTML elements are the same as HTML elements: tables, forms, images, text, scripts, and of course, links. (Of course these elements belong to the XHTML namespace.)
However, like all XML documents, an XHTML document must be well formed. This means XHTML documents have restrictions that HTML documents don't need to worry about. Here's a list of these restrictions:
docs must be well-formed
element and attribute names must be lower case
attribute values must be quoted
non-empty elements must have end tags
start tag of an empty element must end with />
MathML is an XML language for marking up mathematical expressions. The specification for mathML can be found at:
http://www.w3c.org/TR/2003/REC-MathML2-20031021/
A MathML expression is an element of the form:
<math xmlns="http://www.w3.org/1998/Math/MathML">
<!-- expression components go here
-->
</math>
MathML has three types of elements: presentation, content, and interface. Presentation elements allow us to represent mathematical expressions symbolically, while content elements allow us to represent expressions logically. Most processors don't fully support content elements, so we will concentrate on presentation elements.
The primitive MathML expressions are numbers <mn>, identifiers <mi>, and operator symbols <mo>.
New MathML expressions can be built from existing expressions by sequencing and nesting. For example, the expression "x + 22 = 100" is simply a sequence of primitive expressions:
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mi>x</mi>
<mo>+</mo>
<mn>22</mn>
<mo>=</mo>
<mn>100</mn>
</math>
More complex MathML expressions include roots, fractions, superscripted expressions, subscripted expressions, under-scripted expressions, matrices, and expressions bracketed by parentheses.
Here's a markup for the Pythagorean theorem:
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mi>c</mi>
<mo>=</mo>
<msqrt>
<msup>
<mi>x</mi>
<mn>2</mn>
</msup>
<mo>+</mo>
<msup>
<mi>y</mi>
<mn>2</mn>
</msup>
</msqrt>
</math>
If we attempt to embed a MathML expression in an ordinary HTML document, the browser simply ignores the MathML tags it doesn't understand. In other words, it displays the text:
c = x2 + y2
On the other hand, if we embed a MathML expression in an XHTML document, then a browser equipped with the right plugin (for example, MathPlayer from:
http://www.dessci.com/en/products/mathplayer/)
or a browser with a native ability to understand MathML such as Amaya from:
can properly display both the XHTML elements and the MathML elements.
Here's an example of embedding MathML in XHTML:
<html xmlns = "http://www.w3.org/1999/xhtml">
<!-- content goes here -->
<head><title> My First
XHTML document </title></head>
<body>
<p> Here's the Pythagorean
theorem.</p>
<p>
<math
xmlns="http://www.w3.org/1998/Math/MathML">
<mi>c</mi>
<mo>=</mo>
<msqrt>
<msup>
<mi>x</mi>
<mn>2</mn>
</msup>
<mo>+</mo>
<msup>
<mi>y</mi>
<mn>2</mn>
</msup>
</msqrt>
</math>
</p>
</body>
</html>
Here is a screen shot of Amaya:
SVG is an XML markup language for standard vector graphics. The specification for SVG can be found at:
http://www.w3c.org/TR/2003/REC-SVG11-20030114/
Of course we can embed SVG elements in XHTML documents. Enabled browsers can then display the graphical elements. Here are a few examples of SVG elements:
<svg xmlns="http://www.w3.org/2000/svg"
version="1.0" height="74px">
<rect stroke="black"
fill="none" y="0px" x="1px"
width="173px"
height="74px"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg"
version="1.0">
<ellipse stroke="black"
fill="none" cy="45px" cx="63px"
rx="63px"
ry="45px"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg"
version="1.0">
<line stroke="black"
y1="15px" x1="12px" x2="168px"/>
<line stroke="black"
y1="15px" x1="205px" x2="6px"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg"
version="1.0">
<polyline stroke="black"
fill="none" points="0,15 38,0 89,14 133,0
241,15 129,15 174,0 174,0
174,0"
transform="translate(34,0)"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg"
version="1.0">
<path stroke="black"
fill="none" d="M 26,1 C 25,3 72,14 88,15 C
114,16 174,0 201,0 C 224,0 277,13
301,15 C 301,15 301,15 301,15"/>
</svg>
<svg xmlns="http://www.w3.org/2000/svg"
version="1.0" height="69px">
<g>
<rect stroke="black"
fill="none" y="0px" x="8px"
width="120px"
height="57px"/>
<circle stroke="black"
fill="none" cy="56px"
cx="83px"r="12px"/>
</g>
</svg>
Here's a screen shot of Amaya displaying an XHTML document containing the above SVG elements:
Syndication is a common XML application, and RSS is a public XML syndication language.
News agencies such as Reuters, Associated Press (AP), and United Press International (UPI) provide RSS files on their web sites called feeds or channels. An RSS feed is simply a list of news items. Each item might have a title, a description or teaser, and a link to the actual article:
<item>
<title>Earth
Invaded</title>
<link>http://news.example.com/2004/12/17/invasion</link>
<description>The earth was
attacked by an invasion fleet
from halfway across the galaxy;
luckily, a fatal
miscalculation of scale resulted in the
entire armada
being eaten by a small
dog.</description>
</item>
A
web portal such as Yahoo contains a box titled Today's Top Stories. In this box
we see the titles, descriptions, and links for late-breaking stories from
Reuters, AP, and UPI. The contents of this box are generated by an aggregator.
Of
course the aggregator controls how items from the various feeds are displayed,
while the news agencies control the actual content of their stories. Users can
get stories from many different sources without having to actually visit many
different sites. Users can even customize aggregators to only display stories
that are on topics that are of interest: foreign affairs, economy, and Pamela
Anderson, for example.
As
another example, Acme Inc. provides RSS feeds for product announcements and for
job openings. An employment agency might aggregate job opening feeds from Acme
and several other employers, while a retail outlet for bear traps might
aggregate product announcement feeds from Acme and several other bear trap
providers.
A
weblog (blog) is a public online diary. A blog author might provide an RSS feed
that lists juicy excerpts. An aggregator publishes these feeds so interested
readers can follow along.
There
are several versions of RSS. RSS 0.9x stands for Really Simple Syndication. It
was developed by Netscape and is now in the hands of UserLand Software (http://rss.userland.com/).
RSS
1.0 is maintained by an ad-hoc group of interested people (http://web.resource.org/rss/1.0/).
It incorporates RDF (Resource Description Framework) to markup its meta data.
It's a more complicated language that many regard as distinct from RSS 0.9x. In
fact, RSS 1.0 stands for RDF Site Summary.
My
information about RSS comes from a good online tutorial located at http://www.mnot.net/rss/tutorial/.
Here's
a simple example of an RSS 0.91 feed:
<?xml
version="1.0"?>
<rss version="0.91">
<channel>
<title>Example
Channel</title>
<link>http://example.com/</link>
<description>My example
channel</description>
<item>
<title>News for September
the Second</title>
<link>http://example.com/2002/09/01</link>
<description>other things
happened today</description>
</item>
<item>
<title>News for September
the First</title>
<link>http://example.com/2002/09/02</link>
</item>
</channel>
</rss>
Parsing XML documents in a Java or C++ program is pretty easy to do. For this reason XML can be used to parameterize how a program should configure itself when it starts up. An XML configuration document might describe user preferences and GUI layout.
Ant (http://ant.apache.org/) is a build tool from the Apache Foundation that replaces the old make utility from Unix. Recall that make searched for a makefile that described dependencies between the different files of a program. It automatically recompiled all files that changed since the last build. It also recompiled all of the files that depended on these files. Finally, it linked together all indicated object files into an executable. Ant performs a similar function, except the makefiles are XML documents. Here's an example:
<project name="MyProject" default="dist"
basedir=".">
<description>
simple example build file
</description>
<!-- set global properties for this
build -->
<property name="src"
location="src"/>
<property name="build"
location="build"/>
<property name="dist" location="dist"/>
<target name="init">
<!-- Create the time stamp -->
<tstamp/>
<!-- Create the build directory
structure used by compile -->
<mkdir
dir="${build}"/>
</target>
<target name="compile"
depends="init"
description="compile the
source " >
<!-- Compile the java code from
${src} into ${build} -->
<javac srcdir="${src}"
destdir="${build}"/>
</target>
<target name="dist"
depends="compile"
description="generate the
distribution" >
<!-- Create the distribution
directory -->
<mkdir
dir="${dist}/lib"/>
<!-- Put everything in ${build}
into the MyProject-${DSTAMP}.jar file -->
<jar
jarfile="${dist}/lib/MyProject-${DSTAMP}.jar"
basedir="${build}"/>
</target>
<target name="clean"
description="clean up"
>
<!-- Delete the ${build} and
${dist} directory trees -->
<delete
dir="${build}"/>
<delete
dir="${dist}"/>
</target>
</project>
ODP is a reference architecture for systems constructed from distributed, heterogeneous components. Components communicate by passing messages. A component has one or more unique identifiers. A component is either a service requestor, a service provider, or both. A service provider component implements one or more interfaces. Associations between interfaces and the identifiers of implementing components are maintained by registries. A registry is like the yellow pages section of a phone book. Using a registry, a client component can dynamically locate and interact with the provider components it needs. This is called discovery. (See http://www.joaquin.net/ODP/).
There are several notable implementations of the ODP architecture. The best known is the Object Management Group's (http://www.omg.org/) Object Management Architecture or OMA. The OMA reference architecture envisions layers of applications, services, and facilities, all written in different Object-Oriented languages. Remote method invocation is provided by an object request broker or ORB. (In fact OMA better known as CORBA-- Common Object Request Broker Architecture).
Microsoft's COM (Component Object Model) architecture implemented ODP for programs running on the same Windows platform. This enabled the popular Object Linking and Embedding (OLE) technology. The DCOM extension allowed OLE to reach across networks.
Java RMI implemented ODP for cooperating Java programs.
The Web Services Architecture (WSA) is the latest implementation.
WSA is a reference architecture, similar to ODP and OMA. The details can be found at:
http://www.w3.org/2002/ws/arch/
Web Service = service
interface
WSD = web services
descriptor = interface description (supports discovery)
Agent = component =
service implementor (provider agent and/or requestor agent)
The original
motivation envisioned new applications built out of distributed components, but
now the tendency is to make existing applications partially available on the
web by incorporating agents, thus creating a web-enabled application. The
ultimate goal of web-enabling an application is low-cost integration of
existing applications.
SOAP = Simple Object Access Protocol (Messages)
WSDL = Web Services Description Language
UDDI = Universal Description, Discover, and Integration
.NET
J2EE
J2EE Support
Apache
IBM
Eclipse
BEA
Sun
See http://java.sun.com/xml/docs.html
http://www.w3c.org/TR/2003/REC-soap12-part0-20030624/
<?xml version='1.0'
?>
<env:Envelope
xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<!-- Headers are optional.
Any XML can go in a header.
Headers contain header blocks.
Header blocks describe context information.
-->
</env:Header>
<env:Body>
<!-- any XML goes here -->
</env:Body>
<env:Envelope>
<?xml version='1.0'
?>
<env:Envelope
xmlns:env="http://www.w3.org/2003/05/soap-envelope"
xmlns =
"http://www.demo.org/memo">
<env:Header>
<from>Dr. Morris
Fishbine</from>
<to>HR</to>
<date>2001-11-29</date>
</env:Header>
<env:Body>
<employee>
<name>Joe
Smith</name>
<ssn>555-55-1234</ssn>
<dept>R&D</dept>
</employee>
</env:Body>
</env:Envelope>
Although SOAP
stands for "Simple Object Access Protocol" implying that it's
original application was remote method invocation, SOAP can be used to
represent ordinary messages. In this case the sender agent simply encapsulates
a message in the body of a SOAP envelope and sends it to a receiver agent. The
message itself may be transported using RPC, email, or HTTP. In the simplest
model sessions betwenn a sender and receiver are stateless, but more complicated
patterns can be implemented.
For example:
<?xml version='1.0'
?>
<env:Envelope
xmlns:env="http://www.w3.org/2003/05/soap-envelope"
xmlns =
"http://www.demo.org/memo">
<env:Header>
<from>Dr. Morris
Fishbine</from>
<to>HR</to>
<date>2001-11-29</date>
</env:Header>
<env:Body>
<employee>
<name>Joe
Smith</name>
<ssn>555-55-1234</ssn>
<dept>R&D</dept>
</employee>
</env:Body>
</env:Envelope>
<?xml version='1.0'
?>
<env:Envelope
xmlns:env="http://www.w3.org/2003/05/soap-envelope"
xmlns =
"http://www.demo.org/memo">
<env:Header>
<from>Dr. Morris
Fishbine</from>
<to>HR</to>
<date>2001-11-29</date>
</env:Header>
<env:Body>
<method>dot-product</method>
<vector xc = "3" yc
= "4" zc = "2"/>
<vector xc = "1" yc
= "8" zc = "-7.6"/>
</method>
</env:Body>
</env:Envelope>
<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope"
xmlns:rpc='http://www.w3.org/2003/05/soap-rpc'>
<env:Body>
<env:Fault>
<env:Code>
<env:Value>env:Sender</env:Value>
<env:Subcode>
<env:Value>rpc:BadArguments</env:Value>
</env:Subcode>
</env:Code>
<env:Reason>
<env:Text
xml:lang="en-US">Processing error</env:Text>
<env:Text
xml:lang="cs">Chyba zpracování</env:Text>
</env:Reason>
<env:Detail>
<e:myFaultDetails>
<message>vectors must have
same length</message>
<errorcode>999</e:errorcode>
</myFaultDetails>
</env:Detail>
</env:Fault>
</env:Body>
</env:Envelope>
XML can be used
as a language-independent object persistence mechanism:
<object-repository>
<object class = "Person"
OID = "ID505">
<field name = "name"
type = "string">Howard Johnson</field>
<field name = "address"
type = "Address" ref = "ID509"/>
</object>
<object class = "Address"
OID = "ID505">
<field name = "bldg"
type = "int">123</field>
<field name = "street"
type = "string">Sesame St</field>
<field name = "city"
type = "string">NYC</field>
<field name = "state"
type = "string">NY</field>
<field name = "zip"
type = "string">10010</field>
</object>
<!-- etc. -->
</object-repository>
Various schemes
can be found including SOAP, XLINK, and RDF/OWL.
See http://www.xml.org/xml/registry.jsp for DTDs and Schemas for other domain-specific languages.
Of course there are standard XML languages for describing XML languages and documents. These include schemas, style sheets, link bases, and ontologies and will be discussed elsewhere.