3. Defining XML Languages

Document Type Declarations

The prolog of an XML document may contain a document type declaration, which introduces the document's DTD (document type definition). A DTD specifies the syntax of an XML language. The format of a document type declaration is:

<!DOCTYPE name external-id? [internal-decs]?>

The name of a DTD must match the name of the root element of the document. An internal declaration describes the format of an element, the names and value types of element attributes, an entity, or a notation. An external id specifies the name of a .dtd file where more declarations can be found.

If an XML document includes a DTD, then validating parsers can determine not only that the document is well formed, but also that it is valid: the syntax of the document conforms to the syntax specified by the DTD.

Element Declarations

An element declaration has the form:

<!ELEMENT name content>

The content can be EMPTY (no content), ANY (content syntax unchecked), mixed (content is a mixture of text and elements), or elements (content consists of other child elements):

content ::= EMPTY | ANY | mixed | elements

In the last case the child elements may form a sequence or a choice. Optional quantifiers allow us to specify if the content is optional, or iterated:

elements ::= ((choice | sequence), quantifier?)

A choice is a sequence of two or more content particles separated by "|" representing "or":

choice ::= (cp (| cp)+)

A sequence is a sequence of one or more content particles separated by "," representing "and":

sequence ::= (cp (, cp)*)

A content particle is a name, choice, or sequence followed by an optional quantifier:

cp ::= ((name | choice | sequence), quantifier?)

The quantifiers are:

? = 0 or 1 occurrences
+ = 1 or more occurrences
* = 0 or more occurrences

Examples

Suppose we want to represent our books, records, CDs and DVDs in a library element. The following declaration requires that all books are listed before all records:

<!ELEMENT library (book*, record*, cd*, dvd*)>

By contrast, the following declaration allows us to list books, records, CDs, and DVDs in any order:

<!ELEMENT library (book | record | cd | dvd)*>
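The difference between the two content models can be checked with a validating parser. The following is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name LibraryOrderDemo, the helper names, and the EMPTY declarations for the four item elements are assumptions made for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class LibraryOrderDemo {
   // Assumption: every item is declared as an empty element.
   static final String ITEMS =
      "<!ELEMENT book EMPTY><!ELEMENT record EMPTY>"
      + "<!ELEMENT cd EMPTY><!ELEMENT dvd EMPTY>";

   // Builds a document whose internal DTD uses the given content model.
   static String document(String model, String body) {
      return "<!DOCTYPE library [<!ELEMENT library " + model + ">"
         + ITEMS + "]><library>" + body + "</library>";
   }

   // Returns true if the document is valid against its internal DTD.
   static boolean isValid(String xml) throws Exception {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true); // turn on DTD validation
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.setErrorHandler(new DefaultHandler() {
         @Override
         public void error(SAXParseException e) throws SAXException {
            throw e; // treat validity errors as failures
         }
      });
      try {
         parser.parse(new InputSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      String body = "<record/><book/>"; // a record listed before a book
      System.out.println("sequence model: "
         + isValid(document("(book*, record*, cd*, dvd*)", body)));
      System.out.println("choice model:   "
         + isValid(document("(book | record | cd | dvd)*", body)));
   }
}
```

The sequence model should reject a record that precedes a book, while the choice model should accept it.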

Here are a few more declarations. See if you can figure out examples of elements that conform to these patterns:

<!ELEMENT pet (cat | dog | bird)>
<!ELEMENT person (name, (phone | email)?, address?)>
<!ELEMENT list (node, list?)>
<!ELEMENT tree (node, tree?, tree?)>
<!ELEMENT actor (name, agent, performance*)>

Mixed content means the content is a mixture of elements and text. The format of a mixed content declaration is:

mixed ::= (#PCDATA) | (#PCDATA (| name)*)*

The degenerate case is an element with pure text as content:

<name> Bob Smith </name>

We would declare elements of this type as:

<!ELEMENT name (#PCDATA)>

The more difficult case is when both text and elements are in the content. For example, a memo element might contain to, from, and date elements as well as the text of the note:

<memo>
   Dear <to> Mr. Smith </to>, Please be advised that I have
   left for India.
   Sincerely, <from> Mr. Doright </from>
</memo>

Elements of this type would be declared using:

<!ELEMENT memo (#PCDATA | to | from)*>

Note that the #PCDATA keyword must come first. Note also that a mixed content declaration cannot specify that there is only one to element and one from element, or that the from element follows the to element.
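The limitation of mixed content declarations can be observed with a validating parser. This is a minimal sketch, assuming the JAXP parser bundled with the JDK; the class name MemoMixedDemo and the (#PCDATA) declarations for the to and from elements are assumptions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class MemoMixedDemo {
   // Internal DTD: the memo declaration from the text plus two assumed
   // text-only declarations for its child elements.
   static final String DTD =
      "<!DOCTYPE memo [<!ELEMENT memo (#PCDATA | to | from)*>"
      + "<!ELEMENT to (#PCDATA)><!ELEMENT from (#PCDATA)>]>";

   // Returns true if a memo with the given content is valid.
   static boolean isValid(String content) throws Exception {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true); // turn on DTD validation
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.setErrorHandler(new DefaultHandler() {
         @Override
         public void error(SAXParseException e) throws SAXException {
            throw e; // treat validity errors as failures
         }
      });
      String xml = DTD + "<memo>" + content + "</memo>";
      try {
         parser.parse(new InputSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      // The intended shape: one to element, then one from element.
      System.out.println(isValid(
         "Dear <to>Mr. Smith</to>, I have left. <from>Mr. Doright</from>"));
      // Two from elements ahead of the to element: still valid.
      System.out.println(isValid(
         "<from>A</from><from>B</from> wrote to <to>C</to>"));
      // An undeclared element is the kind of error the DTD does catch.
      System.out.println(isValid("<cc>Ms. Jones</cc>"));
   }
}
```

The first two memos should both validate, which shows that neither the order nor the number of child elements is constrained; only the third should be rejected.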

Attribute List Declarations

An attribute list declaration gives the name of the element followed by a list of attribute definitions:

<!ATTLIST name attdef*>

An attribute definition consists of the name of the attribute followed by the type of the attribute value, followed by the default declaration for that attribute:

attdef ::= (name, attType, defaults)

Attribute Types

There are three attribute types:

attType ::= CDATA | token | enumerated

CDATA is any quoted string.

There are seven token types:

token ::=
   ID | IDREF | IDREFS | ENTITY | ENTITIES | NMTOKEN | NMTOKENS

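An NMTOKEN is any sequence of letters, digits, hyphens, periods, colons, or underscores. An NMTOKENS value is a whitespace-separated list of NMTOKEN values.

An ENTITY value is the name of an unparsed entity declared in the DTD, not an entity reference such as &lt;. An ENTITIES value is a whitespace-separated list of such names.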

ID and IDREF are names, too. However, a particular ID may only occur once in the document as the value of the corresponding attribute. The value of an IDREF must be the value of an ID that occurs in the document.

An enumeration is a choice of NMTOKEN values. For example:

<!ATTLIST person gender (MALE | FEMALE) #IMPLIED>

Default Attribute Values

Attribute defaults allow us to assign a default value to an attribute:

<!ATTLIST person title CDATA "Mr.">

We can fix the default attribute:

<!ATTLIST processor version CDATA #FIXED "1.0">

An implied attribute has no default value:

<!ATTLIST person title CDATA #IMPLIED>

If an attribute has a default value or if it is implied, then its appearance in XML documents is optional. If it is required, then it must be specified:

<!ATTLIST person title CDATA #REQUIRED>
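The effect of a default value can be observed from a program. The sketch below assumes the JAXP DOM parser bundled with the JDK; the class name AttributeDefaultDemo, the helper name, and the EMPTY declaration for person are assumptions. It parses a person element that omits the title attribute and reads the attribute back from the tree:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeDefaultDemo {
   // Parses the element under an internal DTD containing the given
   // attribute list declaration and returns the root element.
   static Element root(String attlist, String element) throws Exception {
      String xml = "<!DOCTYPE person [<!ELEMENT person EMPTY>"
         + attlist + "]>" + element;
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
   }

   public static void main(String[] args) throws Exception {
      // Default declared: the parser supplies the missing attribute.
      Element defaulted =
         root("<!ATTLIST person title CDATA 'Mr.'>", "<person/>");
      System.out.println(defaulted.getAttribute("title"));

      // An explicit value overrides the default.
      Element explicit =
         root("<!ATTLIST person title CDATA 'Mr.'>", "<person title='Dr.'/>");
      System.out.println(explicit.getAttribute("title"));

      // Implied: no value is supplied when the attribute is omitted.
      Element implied =
         root("<!ATTLIST person title CDATA #IMPLIED>", "<person/>");
      System.out.println(implied.hasAttribute("title"));
   }
}
```

The first element should report the default title, the second its explicit title, and the third should report that it has no title attribute at all.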

Example

For example, a professor invents a local XML language called CourseML. A CourseML document records the grades for each student in the course. Here's a sample document:

<course>
   <title> XML Programming </title>
   <semester> Spring 2004 </semester>
   <exams>
      <exam exam-id = "midterm1" date = "2004-02-12" total = "100"/>
      <exam exam-id = "midterm2" date = "2004-03-15" total = "100"/>
      <exam exam-id = "final" date = "2004-05-20" total = "200"/>
   </exams>
   <students>
      <student name = "Joe Smith" student-id = "A123" grade = "A">
         <score exam-ref = "midterm1" total = "90"/>
         <score exam-ref = "midterm2" total = "95"/>
         <score exam-ref = "final" total = "182"/>
      </student>
      <student name = "Mary Jones" student-id = "A124" grade = "B">
         <score exam-ref = "midterm1" total = "90"/>
         <score exam-ref = "midterm2" total = "78"/>
         <score exam-ref = "final" total = "155"/>
      </student>
      <student name = "Stu Glop" student-id = "A125" grade = "C">
         <score exam-ref = "midterm1" total = "70"/>
         <score exam-ref = "midterm2" total = "70"/>
         <score exam-ref = "final" total = "140"/>
      </student>
      <!-- etc. -->
   </students>
</course>

The professor creates a file called course.dtd where he places a DTD for CourseML:

<!ELEMENT course (title, semester, exams, students)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT semester (#PCDATA)>
<!ELEMENT exams (exam+)>
<!ELEMENT students (student+)>
<!ELEMENT student (score+)>
<!ELEMENT exam EMPTY>
<!ELEMENT score EMPTY>

<!ATTLIST exam
   exam-id ID #REQUIRED
   date CDATA #IMPLIED
   total CDATA "100">

<!ATTLIST student
   name CDATA #REQUIRED
   student-id ID #REQUIRED
   grade (A | B | C | D | F) "F">

<!ATTLIST score
   exam-ref IDREF #REQUIRED
   total CDATA #REQUIRED>

The professor also adds a document type declaration to the prolog of every CourseML document. There are two ways to do this. Either the DTD can be declared externally:

<?xml version = "1.0" standalone = "no"?>
<!DOCTYPE course SYSTEM "course.dtd">

Or the DTD can be declared internally:

<?xml version = "1.0" standalone = "yes"?>
<!DOCTYPE course [
   <!ELEMENT course (title, semester, exams, students)>
   <!ELEMENT title (#PCDATA)>
   <!ELEMENT semester (#PCDATA)>
   <!ELEMENT exams (exam+)>
   <!ELEMENT students (student+)>
   <!ELEMENT student (score+)>
   <!ELEMENT exam EMPTY>
   <!ELEMENT score EMPTY>
   <!ATTLIST exam
      exam-id ID #REQUIRED
      date CDATA #IMPLIED
      total CDATA "100">
   <!ATTLIST student
      name CDATA #REQUIRED
      student-id ID #REQUIRED
      grade (A | B | C | D | F) "F">
   <!ATTLIST score
      exam-ref IDREF #REQUIRED
      total CDATA #REQUIRED>
]>
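Because exam-id is declared as an ID and exam-ref as an IDREF, a validating parser checks the cross references between scores and exams. The following sketch assumes the JAXP parser bundled with the JDK; the class name CourseIdDemo and its helpers are assumptions, and the DTD is a trimmed-down version of the CourseML DTD that omits title, semester, and most attributes:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class CourseIdDemo {
   // Trimmed-down CourseML DTD (an assumption made to keep the sketch short).
   static final String DTD =
      "<!DOCTYPE course ["
      + "<!ELEMENT course (exams, students)>"
      + "<!ELEMENT exams (exam+)><!ELEMENT students (student+)>"
      + "<!ELEMENT student (score+)>"
      + "<!ELEMENT exam EMPTY><!ELEMENT score EMPTY>"
      + "<!ATTLIST exam exam-id ID #REQUIRED>"
      + "<!ATTLIST student student-id ID #REQUIRED>"
      + "<!ATTLIST score exam-ref IDREF #REQUIRED total CDATA #REQUIRED>]>";

   // Builds a course with two exams and one score that refers to an exam.
   static String document(String secondExamId, String examRef) {
      return DTD + "<course><exams><exam exam-id='midterm1'/>"
         + "<exam exam-id='" + secondExamId + "'/></exams>"
         + "<students><student student-id='A123'>"
         + "<score exam-ref='" + examRef + "' total='90'/>"
         + "</student></students></course>";
   }

   // Returns true if the document is valid against its internal DTD.
   static boolean isValid(String xml) throws Exception {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setValidating(true); // turn on DTD validation
      DocumentBuilder parser = factory.newDocumentBuilder();
      parser.setErrorHandler(new DefaultHandler() {
         @Override
         public void error(SAXParseException e) throws SAXException {
            throw e; // treat validity errors as failures
         }
      });
      try {
         parser.parse(new InputSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(isValid(document("final", "final")));       // consistent
      System.out.println(isValid(document("midterm1", "midterm1"))); // duplicate ID
      System.out.println(isValid(document("final", "quiz9")));       // dangling IDREF
   }
}
```

Only the first document should validate: the second repeats an ID value, and the third refers to an ID that does not occur in the document.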

Entity Declarations

Entities can also be declared in DTDs. This is useful for creating abbreviations of often used strings. For example:

<?xml version = "1.0" standalone = "no"?>

<!DOCTYPE course SYSTEM "course.dtd"
   [<!ENTITY mt1 "midterm1">
    <!ENTITY mt2 "midterm2">
]
>

<course>
   <title> XML Programming </title>
   <semester> Spring 2004 </semester>
   <exams>
      <exam exam-id = "&mt1;" date = "2004-02-12" total = "100"/>
      <exam exam-id = "&mt2;" date = "2004-03-15" total = "100"/>
      <exam exam-id = "final" date = "2004-05-20" total = "200"/>
   </exams>
   <students>
      <student name = "Joe Smith" student-id = "A123" grade = "A">
         <score exam-ref = "&mt1;" total = "90"/>
         <score exam-ref = "&mt2;" total = "95"/>
         <score exam-ref = "final" total = "182"/>
      </student>
      <student name = "Mary Jones" student-id = "A124" grade = "B">
         <score exam-ref = "&mt1;" total = "90"/>
         <score exam-ref = "&mt2;" total = "78"/>
         <score exam-ref = "final" total = "155"/>
      </student>
      <student name = "Stu Glop" student-id = "A125" grade = "C">
         <score exam-ref = "&mt1;" total = "70"/>
         <score exam-ref = "&mt2;" total = "70"/>
         <score exam-ref = "final" total = "140"/>
      </student>
      <!-- etc. -->
   </students>
</course>
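A parser replaces each entity reference with the entity's declared text before the application sees the value. The sketch below assumes the JAXP DOM parser bundled with the JDK; the class name EntityDemo, its helper, and the one-element document with a CDATA exam-id attribute are assumptions chosen to keep the example small:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EntityDemo {
   // Parses a single exam element whose exam-id is written as given
   // and returns the attribute value seen by the application.
   static String examId(String written) throws Exception {
      String xml =
         "<!DOCTYPE exam [<!ELEMENT exam EMPTY>"
         + "<!ATTLIST exam exam-id CDATA #REQUIRED>"
         + "<!ENTITY mt1 'midterm1'>"
         + "<!ENTITY mt2 'midterm2'>]>"
         + "<exam exam-id='" + written + "'/>";
      Element exam = DocumentBuilderFactory.newInstance().newDocumentBuilder()
         .parse(new InputSource(new StringReader(xml)))
         .getDocumentElement();
      return exam.getAttribute("exam-id");
   }

   public static void main(String[] args) throws Exception {
      System.out.println(examId("&mt1;"));  // entity reference
      System.out.println(examId("&mt2;"));  // entity reference
      System.out.println(examId("final"));  // literal text
   }
}
```

The first two calls should return the expanded strings midterm1 and midterm2, so code that reads the tree never sees the abbreviations.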

Schemas

Although DTD stands for Document Type Definition, what's contained in a DTD is simply a grammar describing a language rather than a type declaration. For example, we might declare a test score element as follows:

<!ELEMENT testScore (#PCDATA)>

Although the intention is that the content of a test score element be numerical, there is nothing wrong with the following valid element:

<testScore> Hello World </testScore>

Using a schema to specify an XML vocabulary gets us much closer to a true type declaration. Unlike DTDs, schemas are XML documents. Thus, the tools we use for manipulating XML documents can also be used to manipulate XML schemas.

A file containing an XML schema usually has an xsd extension. Here's the basic structure of such a file:

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <!-- global declarations go here -->
</xsd:schema>

Global declarations declare the elements that appear in documents that instantiate the schema:

<xsd:element name = "tag1" type = "Tag1Type"/>
<xsd:element name = "tag2" type = "Tag2Type"/>
<xsd:element name = "tag3" type = "Tag3Type"/>

An element declaration usually specifies the name and type of the element. The type may be a built-in type such as:

xsd:string ::= <char>*
xsd:boolean ::= false | true | 0 | 1
xsd:decimal ::= (+ | -)?<digit>+(.<digit>*)?
xsd:double ::= IEEE double precision float
xsd:float ::= IEEE single precision float
xsd:integer ::= (+|-)?<digit>+
xsd:duration ::= P<int>Y<int>M<int>DT<int>H<int>M<int>S
xsd:time ::= <hours>:<mins>:<secs>
   <hours>, <mins> ::= <int>, <secs> ::= <decimal>
xsd:date ::= <CCYY>-<MM>-<DD>
xsd:anyURI ::= <URI>
xsd:ID ::= <NCName>
xsd:IDREF ::= <NCName>
xsd:QName ::= (<NCName>:)?<NCName>
xsd:Name ::= <XMLName>

Or it can be a complex type declared by the user. In this case the schema will also contain global type declarations:

<xsd:complexType name = "Tag1Type">
   <!-- tag1 elements and attributes declared here -->
</xsd:complexType>
<xsd:complexType name = "Tag2Type">
   <!-- tag2 elements and attributes declared here -->
</xsd:complexType>
<xsd:complexType name = "Tag3Type">
   <!-- tag3 elements and attributes declared here -->
</xsd:complexType>
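To see the built-in types at work, the following sketch checks text values against a one-element schema. It assumes the javax.xml.validation API bundled with the JDK; the class name BuiltInTypeDemo, the helper name, and the element name "value" are assumptions:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class BuiltInTypeDemo {
   // Returns true if the text is a legal value of the given built-in type.
   static boolean conforms(String type, String text) throws Exception {
      String xsd =
         "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
         + "<xsd:element name='value' type='" + type + "'/>"
         + "</xsd:schema>";
      String xml = "<value>" + text + "</value>";
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false; // the value does not match the type
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(conforms("xsd:integer", "42"));
      System.out.println(conforms("xsd:integer", "Hello World"));
      System.out.println(conforms("xsd:date", "2004-02-12"));
      System.out.println(conforms("xsd:date", "02/12/2004"));
      System.out.println(conforms("xsd:boolean", "true"));
      System.out.println(conforms("xsd:boolean", "yes"));
   }
}
```

Unlike the DTD declaration of testScore, an element typed as xsd:integer should reject the content Hello World.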

Example: Memos

A memo consists of five parts:

date: 2003-07-04
from: Ms. Diane Smith
to: Mr. Bill Jones, Dr. Edna Sach
re: your performance
text: Both of you are doing an excellent job.

We can represent this as an XML document:

<memo>
   <date> 2003-07-04 </date>
   <from title = "Ms">
         <name> Diane Smith </name>
   </from>
   <to>
         <name> Bill Jones </name>
   </to>
   <to title = "Dr">
         <name> Edna Sach </name>
   </to>
   <re> Your performance </re>
   <text> Both of you are doing an excellent job. </text>
</memo>

The schema for memos is in memo.xsd. Here's the format:

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  
   <xsd:element name = "memo" type = "MemoType"/>
  
   <xsd:complexType name = "MemoType">
      <!-- Memo elements go here -->
   </xsd:complexType>
  
   <xsd:complexType name = "PersonType">
      <!-- person elements and attributes go here -->
   </xsd:complexType>
  
</xsd:schema>

Our schema declares a single element called "memo" of type MemoType. It also declares two types: MemoType and PersonType.

An instance of MemoType contains a sequence of at least five elements: a "date" element of type date, a "from" element of type PersonType, one or more "to" elements of type PersonType, a "re" (regarding) element of type string, and a "text" element of type string:

<xsd:complexType name = "MemoType">
   <xsd:sequence>
      <xsd:element name = "date" type = "xsd:date"/>
      <xsd:element name = "from" type = "PersonType"/>
      <xsd:element name = "to"
         maxOccurs = "unbounded" type = "PersonType"/>
      <xsd:element name = "re" type = "xsd:string"/>
      <xsd:element name = "text" type = "xsd:string"/>
   </xsd:sequence>
</xsd:complexType>

An instance of PersonType contains a single name element of type string and an optional title attribute with a (sexist) default value of "Mr":

<xsd:complexType name = "PersonType">
   <xsd:sequence>
      <xsd:element name = "name" type = "xsd:string"/>
   </xsd:sequence>
   <xsd:attribute name = "title" use = "optional" default = "Mr"/>
</xsd:complexType>
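Putting the two type declarations together, a memo can be checked against the schema from a program. This is a sketch only, assuming the javax.xml.validation API bundled with the JDK; the class name MemoSchemaDemo and its helpers are assumptions, and the schema is embedded as a string rather than read from memo.xsd:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class MemoSchemaDemo {
   // The memo schema, assembled from the declarations in the text.
   static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='memo' type='MemoType'/>"
      + "<xsd:complexType name='MemoType'><xsd:sequence>"
      + "<xsd:element name='date' type='xsd:date'/>"
      + "<xsd:element name='from' type='PersonType'/>"
      + "<xsd:element name='to' maxOccurs='unbounded' type='PersonType'/>"
      + "<xsd:element name='re' type='xsd:string'/>"
      + "<xsd:element name='text' type='xsd:string'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:complexType name='PersonType'><xsd:sequence>"
      + "<xsd:element name='name' type='xsd:string'/></xsd:sequence>"
      + "<xsd:attribute name='title' use='optional' default='Mr'/>"
      + "</xsd:complexType></xsd:schema>";

   // Builds a memo with the given date text and list of to elements.
   static String memo(String date, String recipients) {
      return "<memo><date>" + date + "</date>"
         + "<from title='Ms'><name>Diane Smith</name></from>"
         + recipients
         + "<re>Your performance</re>"
         + "<text>Both of you are doing an excellent job.</text></memo>";
   }

   // Returns true if the memo is valid against the schema.
   static boolean isValid(String xml) throws Exception {
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      String two = "<to><name>Bill Jones</name></to>"
         + "<to title='Dr'><name>Edna Sach</name></to>";
      System.out.println(isValid(memo("2003-07-04", two))); // the sample memo
      System.out.println(isValid(memo("2003-07-04", "")));  // no recipients
      System.out.println(isValid(memo("July 4", two)));     // not an xsd:date
   }
}
```

Only the first memo should validate: the second lacks the required "to" element, and the third has a date that is not in CCYY-MM-DD form.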

Occurrence Constraints for Elements and Attributes

An element declaration may specify the minimum and maximum number of occurrences. The default value for both is 1. The maximum number of occurrences may be set to "unbounded" to indicate that there is no upper limit. A default value (content) may also be specified:

<xsd:element name = "tag" type = "TYPE"
   minOccurs = "N"
   maxOccurs = "M"
   default = "J"/>

where:

0 <= N <= M <= "unbounded"

For example, assume the following declaration is made:

<xsd:element name = "nums">
   <xsd:complexType>
      <xsd:sequence>
         <xsd:element name = "num" type = "xsd:integer"
            minOccurs = "0" maxOccurs = "3" default = "42"/>
      </xsd:sequence>
   </xsd:complexType>
</xsd:element>

Notice that the type of an element can be declared globally or locally. In the example above we declare the type of nums by an anonymous local declaration. Alternatively, we could have used a global declaration:

<xsd:element name = "nums" type = "NumsType"/>
<xsd:complexType name = "NumsType">
   <xsd:sequence>
      <xsd:element name = "num" type = "xsd:integer"
         minOccurs = "0" maxOccurs = "3" default = "42"/>
   </xsd:sequence>
</xsd:complexType>

Here are a few sample nums elements:

<nums/>

<nums>
   <num> 93 </num>
   <num> 18 </num>
</nums>

<nums>
   <num> 93 </num>
   <num/>
   <num/>
</nums>

The last element is equivalent to:

<nums>
   <num> 93 </num>
   <num> 42 </num>
   <num> 42 </num>
</nums>

Note that the following declaration is inconsistent, because minOccurs exceeds maxOccurs, which defaults to 1:

<xsd:element name = "num" type = "xsd:integer"
   minOccurs = "2" default = "42"/>

Instead of a default value, we can specify that a fixed value should be used:

<xsd:element name = "num" type = "xsd:integer"
   minOccurs = "0" maxOccurs = "3" fixed = "42"/>

The element:

<nums>
   <num> 42 </num>
   <num/>
   <num/>
</nums>

is equivalent to:

<nums>
   <num> 42 </num>
   <num> 42 </num>
   <num> 42 </num>
</nums>

while the element:

<nums>
   <num> 93 </num>
   <num> 42 </num>
   <num> 42 </num>
</nums>

is illegal. The only allowable value for num is 42.
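The occurrence limits and the default and fixed values can be tried out with a schema validator. The following sketch assumes the javax.xml.validation API bundled with the JDK; the class name NumsDemo and its helper are assumptions. The extra attribute (default or fixed) is passed in as a string so that both variants of the num declaration can be tested:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NumsDemo {
   // Validates a nums element against a schema whose num declaration
   // carries the given extra attribute, e.g. default='42' or fixed='42'.
   static boolean isValid(String extra, String xml) throws Exception {
      String xsd =
         "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
         + "<xsd:element name='nums'><xsd:complexType><xsd:sequence>"
         + "<xsd:element name='num' type='xsd:integer' "
         + "minOccurs='0' maxOccurs='3' " + extra + "/>"
         + "</xsd:sequence></xsd:complexType></xsd:element>"
         + "</xsd:schema>";
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      String def = "default='42'";
      String fix = "fixed='42'";
      // Between zero and three num elements are allowed.
      System.out.println(isValid(def, "<nums/>"));
      System.out.println(isValid(def, "<nums><num>93</num><num/><num/></nums>"));
      System.out.println(isValid(def,
         "<nums><num>1</num><num>2</num><num>3</num><num>4</num></nums>"));
      // With a fixed value, 42 (or an empty element) is the only legal content.
      System.out.println(isValid(fix, "<nums><num>42</num><num/></nums>"));
      System.out.println(isValid(fix, "<nums><num>93</num></nums>"));
   }
}
```

The third and fifth documents should be rejected: one has four num elements, and the other supplies a value other than the fixed one.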

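We can use the "use" attribute to specify whether an attribute is optional, required, or prohibited. We can also use the default attribute to set a default value. The declarations below are abbreviated: in a complete schema, an element with text content and an attribute needs a complex type with simple content, as described under Creating Complex Types from Simple Types. For example: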

<xsd:element name = "person" type = "xsd:string">
   <xsd:attribute name = "age" use = "optional" default = "21"/>
</xsd:element>

Here are a few sample elements:

<person age = "35"> Bob Dobbs </person>
<person> Betty Boop </person>

The last element is equivalent to:

<person age = "21"> Betty Boop </person>

Let's change the declaration to:

<xsd:element name = "person" type = "xsd:string">
   <xsd:attribute name = "age" use = "optional" fixed = "21"/>
</xsd:element>

Now the element:

<person age = "35"> Bob Dobbs </person>

is illegal, because it specifies an age attribute different from 21. If we change optional to required:

<xsd:element name = "person" type = "xsd:string">
   <xsd:attribute name = "age" use = "required" fixed = "21"/>
</xsd:element>

Now both of the following elements are illegal:

<person age = "35"> Bob Dobbs </person>
<person> Betty Boop </person>

Creating Complex Types from Simple Types

It's easy to declare an element with text content such as:

<person> Bob Jones </person>

This is just:

<xsd:element name = "person" type = "xsd:string"/>

But what happens if the element also has an attribute:

<person age = "21"> Bob Jones </person>

Unfortunately, we'll need to declare a complex type:

<xsd:element name = "person" type = "PersonType"/>

The PersonType uses a simpleContent element that extends the simple type string by adding an attribute:

<xsd:complexType name = "PersonType">
   <xsd:simpleContent>
      <xsd:extension base = "xsd:string">
         <xsd:attribute name = "age" type = "xsd:positiveInteger"/>
      </xsd:extension>
   </xsd:simpleContent>
</xsd:complexType>
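The PersonType declaration can be exercised as follows. This sketch assumes the javax.xml.validation API bundled with the JDK; the class name PersonAgeDemo and its helper are assumptions:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PersonAgeDemo {
   // A string-valued element extended with an age attribute.
   static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='person' type='PersonType'/>"
      + "<xsd:complexType name='PersonType'><xsd:simpleContent>"
      + "<xsd:extension base='xsd:string'>"
      + "<xsd:attribute name='age' type='xsd:positiveInteger'/>"
      + "</xsd:extension></xsd:simpleContent></xsd:complexType>"
      + "</xsd:schema>";

   // Returns true if the person element is valid against the schema.
   static boolean isValid(String xml) throws Exception {
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(isValid("<person age='21'> Bob Jones </person>"));
      System.out.println(isValid("<person> Bob Jones </person>"));
      System.out.println(isValid("<person age='0'> Bob Jones </person>"));
      System.out.println(isValid("<person age='21'><first>Bob</first></person>"));
   }
}
```

The first two elements should validate, since the attribute is optional by default. The third should fail because 0 is not a positive integer, and the fourth because simple content cannot contain child elements.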

Mixed Content

A mixed content element contains a mixture of text and elements:

<letterBody>
<salutation>Dear Mr.<name>Robert Smith</name>.</salutation>
Your order of <quantity>1</quantity> <productName>Baby
Monitor</productName> shipped from our warehouse on
<shipDate>1999-05-21</shipDate>. ....
</letterBody>

We declare a mixed complex type by setting the mixed attribute to true:

<xsd:element name="letterBody">
 <xsd:complexType mixed="true">
  <xsd:sequence>
   <xsd:element name="salutation">
    <xsd:complexType mixed="true">
     <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
     </xsd:sequence>
    </xsd:complexType>
   </xsd:element>
   <xsd:element name="quantity"    type="xsd:positiveInteger"/>
   <xsd:element name="productName" type="xsd:string"/>
   <xsd:element name="shipDate"    type="xsd:date" minOccurs="0"/>
   <!-- etc. -->
  </xsd:sequence>
 </xsd:complexType>
</xsd:element>

Empty Content

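<internationalPrice currency="EUR" value="423.46"/>

An element with attributes but no content, such as the one above, can be declared as a complex type that restricts anyType:
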
<xsd:element name="internationalPrice">
 <xsd:complexType>
  <xsd:complexContent>
   <xsd:restriction base="xsd:anyType">
    <xsd:attribute name="currency" type="xsd:string"/>
    <xsd:attribute name="value"    type="xsd:decimal"/>
   </xsd:restriction>
  </xsd:complexContent>
 </xsd:complexType>
</xsd:element>

Shorthand notation:

<xsd:element name = "internationalPrice" type = "PriceType"/>

<xsd:complexType name = "PriceType">
   <xsd:attribute name="currency" type="xsd:string"/>
   <xsd:attribute name="value"    type="xsd:decimal"/>
</xsd:complexType>


Example: Polynomials

A polynomial consists of a sequence of monomials. A monomial consists of a floating point coefficient, variable, and an integer exponent. If the variable isn't specified, then the exponent is understood to be 0. If the exponent isn't specified, then its value is understood to be 1. For example, to represent the polynomial:

3.1x^2 + 5x - 9

we would write:

<polynomial>
   <monomial>
      <coeff> 3.1 </coeff>
      <variable exp = "2"> x </variable>
   </monomial>
   <monomial>
      <coeff> 5.0 </coeff>
      <variable> x </variable>
   </monomial>
   <monomial>
      <coeff> -9.0 </coeff>
   </monomial>
</polynomial>

The schema for polynomials will be in a file called poly.xsd. Here's the layout of this file:

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

   <xsd:element name = "polynomial" type = "PolyType"/>

   <xsd:complexType name = "PolyType">
      <xsd:sequence>
         <xsd:element name = "monomial" 
            maxOccurs = "unbounded" type = "MonoType"/>
      </xsd:sequence>
   </xsd:complexType>
  
   <xsd:complexType name = "MonoType">
      <!-- coeff & variable declarations go here -->
   </xsd:complexType>

   <xsd:complexType name = "VarType">
      <!-- exponent attribute declared here -->
   </xsd:complexType>
  
</xsd:schema>

Our schema consists of four global declarations: the declaration of an element called "polynomial" of type PolyType; the declaration of PolyType, which is a sequence of one or more elements called "monomial" of type MonoType; and the declarations of MonoType and VarType.

An instance of MonoType consists of a coefficient of type double, followed by 0 or 1 variable elements:

   <xsd:complexType name = "MonoType">
      <xsd:sequence>
         <xsd:element name = "coeff" type = "xsd:double"/>
         <xsd:element name = "variable"
            minOccurs = "0" type = "VarType"/>
      </xsd:sequence>
   </xsd:complexType>

A VarType is a complex type that extends the simple type string by adding an optional "exp" attribute with default value 1:

<xsd:complexType name = "VarType">
   <xsd:simpleContent>
      <xsd:extension base = "xsd:string">
         <xsd:attribute name = "exp" use = "optional"
            default = "1" type = "xsd:positiveInteger"/>
      </xsd:extension>
   </xsd:simpleContent>
</xsd:complexType>
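With all four declarations in place, the sample polynomial can be checked against the schema. This is a minimal sketch, assuming the javax.xml.validation API bundled with the JDK; the class name PolynomialDemo and its helpers are assumptions, and the schema is embedded as a string rather than read from poly.xsd:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class PolynomialDemo {
   // The polynomial schema, assembled from the declarations in the text.
   static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='polynomial' type='PolyType'/>"
      + "<xsd:complexType name='PolyType'><xsd:sequence>"
      + "<xsd:element name='monomial' maxOccurs='unbounded' type='MonoType'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:complexType name='MonoType'><xsd:sequence>"
      + "<xsd:element name='coeff' type='xsd:double'/>"
      + "<xsd:element name='variable' minOccurs='0' type='VarType'/>"
      + "</xsd:sequence></xsd:complexType>"
      + "<xsd:complexType name='VarType'><xsd:simpleContent>"
      + "<xsd:extension base='xsd:string'>"
      + "<xsd:attribute name='exp' use='optional' default='1' "
      + "type='xsd:positiveInteger'/>"
      + "</xsd:extension></xsd:simpleContent></xsd:complexType>"
      + "</xsd:schema>";

   // Builds one monomial; pass null for the variable or exponent to omit it.
   static String monomial(String coeff, String variable, String exp) {
      String var = "";
      if (variable != null) {
         String attr = (exp == null) ? "" : " exp='" + exp + "'";
         var = "<variable" + attr + ">" + variable + "</variable>";
      }
      return "<monomial><coeff>" + coeff + "</coeff>" + var + "</monomial>";
   }

   // Returns true if a polynomial with the given monomials is valid.
   static boolean isValid(String monomials) throws Exception {
      String xml = "<polynomial>" + monomials + "</polynomial>";
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      // 3.1x^2 + 5x - 9
      System.out.println(isValid(monomial("3.1", "x", "2")
         + monomial("5.0", "x", null) + monomial("-9.0", null, null)));
      System.out.println(isValid(""));                        // no monomials
      System.out.println(isValid(monomial("2.0", "x", "0"))); // exp must be positive
   }
}
```

Only the first polynomial should validate. Note that, because exp is a positiveInteger, a constant term has to be written without a variable element rather than with an exponent of 0.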

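From this example we can see that both elements and attributes may be declared optional and may have default values. However, there is a noteworthy difference: an attribute's default value is used when the attribute is absent, whereas an element's default value is used only when the element is present but empty.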

Derived Types

The XML Schema types can be simple or complex:

<type> ::= <simple> | <complex>

A simple type can be primitive or derived:

<simple> ::= <primitive> | <derived>

Examples of primitive types include:

<primitive> ::=
   double | float | decimal | boolean | time | date | string | etc

New types can be derived from existing types by restriction, union, or list.

Restriction by Facets

<xsd:simpleType name = "ScoreType">
   <xsd:restriction base = "xsd:integer">
      <xsd:minInclusive value = "0"/>
      <xsd:maxInclusive value = "100"/>
   </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name = "IdentifierType">
   <xsd:restriction base = "xsd:string">
      <xsd:pattern value = "[a-z]{6}"/>
   </xsd:restriction>
</xsd:simpleType>
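The two restricted types above can be tested against sample values. The sketch below assumes the javax.xml.validation API bundled with the JDK; the class name FacetDemo, its helper, and the element names "score" and "id" are assumptions introduced so that there is something to validate:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class FacetDemo {
   // ScoreType and IdentifierType from the text, plus two assumed
   // elements (score and id) that use them.
   static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='score' type='ScoreType'/>"
      + "<xsd:element name='id' type='IdentifierType'/>"
      + "<xsd:simpleType name='ScoreType'>"
      + "<xsd:restriction base='xsd:integer'>"
      + "<xsd:minInclusive value='0'/><xsd:maxInclusive value='100'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "<xsd:simpleType name='IdentifierType'>"
      + "<xsd:restriction base='xsd:string'>"
      + "<xsd:pattern value='[a-z]{6}'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "</xsd:schema>";

   // Returns true if <element>text</element> is valid against the schema.
   static boolean isValid(String element, String text) throws Exception {
      String xml = "<" + element + ">" + text + "</" + element + ">";
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(isValid("score", "100"));   // upper bound is inclusive
      System.out.println(isValid("score", "101"));   // out of range
      System.out.println(isValid("id", "abcdef"));   // six lowercase letters
      System.out.println(isValid("id", "ABCDEF"));   // wrong case
      System.out.println(isValid("id", "abc"));      // too short
   }
}
```

A pattern facet must match the entire value, so only identifiers consisting of exactly six lowercase letters should be accepted.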

Restriction by Enumeration

<xsd:simpleType name = "WeekDayType">
   <xsd:restriction base = "xsd:string">
      <xsd:enumeration value = "Monday"/>
      <xsd:enumeration value = "Tuesday"/>
      <xsd:enumeration value = "Wednesday"/>
      <xsd:enumeration value = "Thursday"/>
      <xsd:enumeration value = "Friday"/>
   </xsd:restriction>
</xsd:simpleType>

List Types

<xsd:simpleType name = "TestScoresType">
   <xsd:list itemType = "ScoreType"/>
</xsd:simpleType>

<xsd:element name = "testScores" type = "TestScoresType"/>

The items of a list type are written as whitespace-separated text, not as child elements:

<testScores> 93 85 63 59 </testScores>
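A list type is a simple type, so its items appear as whitespace-separated text inside a single element. The following sketch checks this, assuming the javax.xml.validation API bundled with the JDK; the class name ListTypeDemo and its helper are assumptions:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class ListTypeDemo {
   // ScoreType (0 to 100) and a list type built from it.
   static final String XSD =
      "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
      + "<xsd:element name='testScores' type='TestScoresType'/>"
      + "<xsd:simpleType name='TestScoresType'>"
      + "<xsd:list itemType='ScoreType'/></xsd:simpleType>"
      + "<xsd:simpleType name='ScoreType'>"
      + "<xsd:restriction base='xsd:integer'>"
      + "<xsd:minInclusive value='0'/><xsd:maxInclusive value='100'/>"
      + "</xsd:restriction></xsd:simpleType>"
      + "</xsd:schema>";

   // Returns true if a testScores element with the given content is valid.
   static boolean isValid(String content) throws Exception {
      String xml = "<testScores>" + content + "</testScores>";
      SchemaFactory factory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
      try {
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return true;
      } catch (SAXException e) {
         return false;
      }
   }

   public static void main(String[] args) throws Exception {
      System.out.println(isValid(" 93 85 63 59 "));   // four legal scores
      System.out.println(isValid("93 185"));          // 185 is not a ScoreType
      System.out.println(isValid("<score>93</score><score>85</score>"));
   }
}
```

Only the first element should validate. Each item is checked against ScoreType, and child elements are rejected because the content of a simple type is text only.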

Union Types

<xsd:simpleType name = "SSN-Type">
   <xsd:restriction base = "xsd:string">
      <xsd:pattern value = "\d{3}-\d{2}-\d{4}"/>
   </xsd:restriction>
</xsd:simpleType>

<xsd:simpleType name = "NameOrSSN-Type">
   <xsd:union memberTypes = "xsd:string SSN-Type"/>
</xsd:simpleType>

Example: People

<?xml version = "1.0" ?>
<people>
   <person>
      <name> Bart Simpson </name>
      <gender>male</gender>
      <bday>
         <month> 9 </month>
         <day> 15 </day>
         <year> 1992 </year>
      </bday>
   </person>
   <person>
      <name> Homer Simpson </name>
      <gender>male</gender>
      <bday>
         <month> 3 </month>
         <day> 25 </day>
         <year> 1951 </year>
      </bday>
   </person>
   <person>
      <name> Marge Simpson </name>
      <gender>female</gender>
      <bday>
         <month> 7 </month>
         <day> 20 </day>
         <year> 1952 </year>
      </bday>
   </person>
   <person>
      <name> Lisa Simpson </name>
      <gender>female</gender>
      <bday>
         <month> 5 </month>
         <day> 5 </day>
         <year> 1995 </year>
      </bday>
   </person>
   <person>
      <name> Maggie Simpson </name>
      <gender>female</gender>
      <bday>
         <month> 4 </month>
         <day> 20 </day>
         <year> 1998 </year>
      </bday>
   </person>
</people>

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

   <xsd:element name = "people">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element maxOccurs = "unbounded" ref = "person"/>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>

   <xsd:element name = "person" type = "PersonType"/>

   <xsd:complexType name = "PersonType">
      <xsd:sequence>
         <xsd:element name = "name" type = "xsd:string"/>
         <xsd:element name = "gender" type = "GenderType"/>
         <xsd:element name = "bday" type = "DateType"/>
      </xsd:sequence>
   </xsd:complexType>

   <xsd:simpleType name = "GenderType">
      <xsd:restriction base = "xsd:string">
         <xsd:enumeration value = "male"/>
         <xsd:enumeration value = "female"/>
      </xsd:restriction>
   </xsd:simpleType>

   <xsd:complexType name = "DateType">
      <xsd:sequence>
         <xsd:element name = "month" type = "MonthType"/>
         <xsd:element name = "day" type = "DayType"/>
         <xsd:element name = "year" type = "YearType"/>
      </xsd:sequence>
   </xsd:complexType>

   <xsd:simpleType name = "DayType">
      <xsd:restriction base = "xsd:integer">
         <xsd:minInclusive value = "1"/>
         <xsd:maxInclusive value = "31"/>
      </xsd:restriction>
   </xsd:simpleType>

   <xsd:simpleType name = "MonthType">
      <xsd:restriction base = "xsd:integer">
         <xsd:minInclusive value = "1"/>
         <xsd:maxInclusive value = "12"/>
      </xsd:restriction>
   </xsd:simpleType>

   <xsd:simpleType name = "YearType">
      <xsd:restriction base = "xsd:integer">
         <xsd:minInclusive value = "1900"/>
         <xsd:maxInclusive value = "3000"/>
      </xsd:restriction>
   </xsd:simpleType>

</xsd:schema>


Annotations

<xsd:annotation>
   <xsd:documentation xml:lang = "en">
      (c) 2003, all rights reserved
   </xsd:documentation>
</xsd:annotation>

<xsd:annotation>
   <xsd:appinfo>
      Process with JavaScript
   </xsd:appinfo>
</xsd:annotation>

Parsing and Validating XML in Java

C:\web\xml\syntax>java Validator
usage:
   java Validator xmlFile or
   java Validator xmlFile -dtd or
   java Validator xmlFile -xsd or
   java Validator xmlFile xsdFile

C:\web\xml\syntax>java Validator poly1.xml
poly1.xml is well formed

C:\web\xml\syntax>java Validator poly1.xml -xsd
poly1.xml is well formed
poly1.xml is also valid

C:\web\xml\syntax>java Validator poly1.xml poly.xsd
poly1.xml is well formed
poly1.xml is also valid

Validator.java

// import some JAXP packages:
import javax.xml.parsers.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.w3c.dom.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.dom.*;
import org.w3c.dom.traversal.*;
import java.io.*;

class XMLErrorHandler implements ErrorHandler { ... }
public class Validator { ... }

Validator

public class Validator {

   static final String JAXP_SCHEMA_LANGUAGE =
      "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
   static final String W3C_XML_SCHEMA =
      "http://www.w3.org/2001/XMLSchema";
   static final String JAXP_SCHEMA_SOURCE =
      "http://java.sun.com/xml/jaxp/properties/schemaSource";
   static boolean xsdValidating = false;
   static boolean dtdValidating = false;
   static String xsdFile = "";

   public static Document parse(String xmlFile) throws Exception { ... }
   public static void usage() { ... }
   public static void main(String[] args) { ... }
}

Validator.parse()

public static Document parse(String xmlFile) throws Exception {
   Document doc = null; // result tree
   // obtain the default parser factory:
   DocumentBuilderFactory parserFactory =
      DocumentBuilderFactory.newInstance();
   // make validating, namespace aware parsers:
   parserFactory.setValidating(dtdValidating || xsdValidating);
   parserFactory.setNamespaceAware(true);
   // setup for using schemas:
   if (xsdValidating) {
      parserFactory.setAttribute(
         JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
   }
   if (xsdValidating && xsdFile != null) {
      parserFactory.setAttribute(
         JAXP_SCHEMA_SOURCE, new File(xsdFile));
   }
   // create a parser:
   DocumentBuilder parser =
      parserFactory.newDocumentBuilder();
   parser.setErrorHandler(new XMLErrorHandler());
   // parse an XML file:
   doc = parser.parse(new File(xmlFile));
   // merge adjacent text nodes and remove empty ones:
   doc.normalize();
   return doc;
}

Validator.usage()

public static void usage() {
   System.out.println("usage:");
   System.out.println(" java Validator xmlFile or");
   System.out.println(" java Validator xmlFile -dtd or");
   System.out.println(" java Validator xmlFile -xsd or");
   System.out.println(" java Validator xmlFile xsdFile");
}

Validator.main()

public static void main(String[] args) {
   try {
      if (args.length == 0 || args.length > 2) {
         usage();
         return;
      }

      String xmlFile = args[0];
      if (args.length == 2) {
         if (args[1].equals("-dtd")) {
            dtdValidating = true;
            xsdValidating = false;
            xsdFile = null;
         } else if (args[1].equals("-xsd")) {
            dtdValidating = false;
            xsdValidating = true;
            xsdFile = null;
         } else {
            xsdValidating = true;
            xsdFile = args[1];
         }
      }

      Document doc = parse(args[0]);
      System.out.println(args[0] + " is well formed");
      if (xsdValidating || dtdValidating)
         System.out.println(args[0] + " is also valid");
   } catch (Exception e) {
      System.err.println(e.getMessage());
   }
}

XMLErrorHandler

class XMLErrorHandler implements ErrorHandler {
   public void error(SAXParseException exception)
   throws SAXParseException {
      System.err.println("===> error");
      throw exception;
   }
   public void fatalError(SAXParseException exception)
   throws SAXParseException  {
      System.err.println("===> fatal error");
      throw exception;
   }
   public void warning(SAXParseException exception)
   throws SAXParseException  {
      System.err.println("===> warning");
      throw exception;
   }
}

Programming Notes

XSD Built-in Types

Diagram of built-in type hierarchy

A complex type can bear attributes and element content.

For example:

<xsd:element name = "person" type = "PersonType"/>
<xsd:element name = "gender" type = "xsd:string"/>
<xsd:element name = "comment" type = "xsd:string"/>

A declaration for a complex type has the form:

<xsd:complexType name = "PersonType">
   <!-- element, reference, & attribute declarations go here -->
</xsd:complexType>

Complex Type Declarations

<xsd:complexType name = "XXXX">
   <xsd:simpleContent> ...
   <xsd:complexContent> ...
   <xsd:group> ...
   <xsd:all> ...
   <xsd:choice> ...
   <xsd:sequence> ...