11. XSL Formatting Objects

The Formatting Process

An XSL-FO document is an XML document with a .fo or .fob extension that combines content with markup that formats that content.

When a formatter reads doc.fob, it instantiates a page based on a page master. It then fills the page with content from the page sequence. When this page is filled, the process repeats. The process terminates when there is no more content.

while (page-sequence has content) {
   1. create a new page by instantiating a master
   2. fill the page with content from the page-sequence
}

Formatting Processors

These are the currently available XSL-FO processors. FOP, PassiveTeX, and xmlroff are free, but the commercial products generally produce better output.

FOP

FOP is a Java-based processor available free from the Apache XML Project. FOP can produce usable output, but it is still under development and has some limitations that prevent it from producing production-quality typesetting. Eventually it will be able to produce many forms of output beyond the current PDF.

PassiveTeX

PassiveTeX from Sebastian Rahtz (http://www.tei-c.org.uk/Software/passivetex/) is a free XSL-FO processor based on TeX. It has many of the typesetting strengths of the highly respected TeX typesetting language. But TeX is also big and complicated and can produce a bewildering blizzard of messages as it processes a file. Fortunately it has gotten easier to set up now. It produces useful output, but is also still under development.

XEP

XEP (written in Java) is a commercial product from RenderX (http://www.renderx.com/).

XSL Formatter

A commercial product from Antenna House (http://www.antennahouse.com/).

Unicorn Formatting Objects

A commercial product from Unicorn Enterprises SA (http://www.unicorn-enterprises.com/). For Windows only.

xmlroff

xmlroff (http://xmlroff.sourceforge.net/) is written in C, and uses libxml2 and other GNOME libraries. It is an open source project but is not yet a complete implementation.

Other XSL-FO processors are listed on the W3C's XSL information page.

Document Structure

The FO Root

The FO root element contains a set of page masters followed by one or more page sequences:

<?xml version="1.0" encoding="iso-8859-1"?>

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">

   <fo:layout-master-set>
      <!-- page masters go here -->
   </fo:layout-master-set>

   <fo:page-sequence>
      <!-- document content goes here -->
   </fo:page-sequence>

   <!-- etc. -->

</fo:root>

Page Masters

There is currently only one type of master page, which is a simple (i.e., rectangular) master page. The attributes of this element specify its page size, margin sizes, and name:

<fo:simple-page-master 
   master-name="US-Letter"
   page-height="11in"  
   page-width="8.5in"
   margin-top="0.5in"  
   margin-bottom="0.5in"
   margin-left="0.5in" 
   margin-right="0.5in">
      <!-- page regions go here -->
</fo:simple-page-master>

For example, we might define page masters for each of the different types of pages found in a book:

<fo:layout-master-set>
   <fo:simple-page-master  master-name="tocPage" ...>
      ...
   </fo:simple-page-master>
   <fo:simple-page-master  master-name="firstPage" ...>
      ...
   </fo:simple-page-master>
   <fo:simple-page-master  master-name="leftPage" ...>
      ...
   </fo:simple-page-master>
   <fo:simple-page-master  master-name="rightPage" ...>
      ...
   </fo:simple-page-master>
   <fo:simple-page-master  master-name="indexPage" ...>
      ...
   </fo:simple-page-master>
</fo:layout-master-set>

Page Regions

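A page master specifies up to five regions of a page: region-body, region-before, region-after, region-start, and region-end.

The extent attribute sets the height of the before and after regions and the width of the start and end regions (in the default left-to-right, top-to-bottom writing mode). To avoid overlap with the body region, we commonly give the body margins at least as large as those extents, so the outer regions sit inside the page margins but outside of the body.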

Here is a sample page master with all five regions declared:

<fo:simple-page-master  master-name="typical"
      page-width="8.5in"   page-height="11in"
      margin-top="0.5in"   margin-bottom="0.5in"
      margin-left="0.5in"  margin-right="0.5in">
   <fo:region-body   margin="1.0in"/>
   <fo:region-before extent="1.0in"/>
   <fo:region-after  extent="1.0in"/>
   <fo:region-start  extent="1.0in"/>
   <fo:region-end    extent="1.0in"/>
</fo:simple-page-master>
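
The relationship between the body margins and the region extents can be sketched with a page master that has only a header and a footer. This is a minimal illustration, not a recommended layout: the master name "header-footer" and all of the dimensions are assumptions chosen for the example. The fragment belongs inside fo:layout-master-set.

```xml
<fo:simple-page-master  master-name="header-footer"
      page-width="8.5in"   page-height="11in"
      margin-top="0.5in"   margin-bottom="0.5in"
      margin-left="1.0in"  margin-right="1.0in">
   <!-- body margins reserve exactly the space taken by the
        header and footer, so the regions cannot overlap -->
   <fo:region-body   margin-top="0.75in" margin-bottom="0.5in"/>
   <!-- extent is the height of the header region -->
   <fo:region-before extent="0.75in"/>
   <!-- extent is the height of the footer region -->
   <fo:region-after  extent="0.5in"/>
</fo:simple-page-master>
```

Because region-start and region-end are omitted, the body needs no left or right margin of its own; the page margins alone set its width.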

Page Sequences

After the master set comes one or more page sequences:

<fo:page-sequence master-reference="MASTER-NAME">
   <!-- title element (optional) -->
   <!-- static text elements (optional) -->
   <!-- flow -->
</fo:page-sequence>
     

Title

Titles (like the titles of HTML pages) are optional and don't seem to do much:

<fo:title> Title </fo:title>

Static Text

A static content element holds content that is repeated on each page, such as a header or footer:

<fo:static-content flow-name="xsl-region-before">
   <!-- Header block goes here -->
</fo:static-content>
<fo:static-content flow-name="xsl-region-after">
   <!-- Footer block goes here -->
</fo:static-content>
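
As a hedged illustration of what the header and footer blocks might contain, the following page sequence puts a running title in the header and the current page number in the footer. It assumes the "typical" page master shown earlier; the text and font size are placeholders. The fragment belongs inside fo:root, after the layout master set.

```xml
<fo:page-sequence master-reference="typical">
   <!-- repeated at the top of every page -->
   <fo:static-content flow-name="xsl-region-before">
      <fo:block text-align="center" font-size="9pt">
         XSL Formatting Objects
      </fo:block>
   </fo:static-content>
   <!-- repeated at the bottom of every page;
        fo:page-number is replaced by the current page number -->
   <fo:static-content flow-name="xsl-region-after">
      <fo:block text-align="center" font-size="9pt">
         Page <fo:page-number/>
      </fo:block>
   </fo:static-content>
   <fo:flow flow-name="xsl-region-body">
      <fo:block> Body Text </fo:block>
   </fo:flow>
</fo:page-sequence>
```

Note that the static content elements must come before the flow.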

Flows

A flow contains all of the non-repeating content of the document:

<fo:flow flow-name = "xsl-region-body">
   <!-- content blocks go here -->
</fo:flow>

Block Elements

Paragraphs, lists, and tables are examples of block structures:

<fo:block>
   <!-- content & inline elements go here -->
</fo:block>
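
A list is a slightly richer block structure than a plain paragraph. The sketch below shows a two-item numbered list built from fo:list-block; the distances and the item text are assumptions chosen for illustration. The fragment belongs inside an fo:flow.

```xml
<fo:list-block provisional-distance-between-starts="0.25in"
               provisional-label-separation="0.1in">
   <fo:list-item>
      <!-- the label (number or bullet) of the item -->
      <fo:list-item-label end-indent="label-end()">
         <fo:block>1.</fo:block>
      </fo:list-item-label>
      <!-- the body of the item, indented past the label -->
      <fo:list-item-body start-indent="body-start()">
         <fo:block>create a new page by instantiating a master</fo:block>
      </fo:list-item-body>
   </fo:list-item>
   <fo:list-item>
      <fo:list-item-label end-indent="label-end()">
         <fo:block>2.</fo:block>
      </fo:list-item-label>
      <fo:list-item-body start-indent="body-start()">
         <fo:block>fill the page with content from the page sequence</fo:block>
      </fo:list-item-body>
   </fo:list-item>
</fo:list-block>
```

The label-end() and body-start() functions compute the indents from the two provisional distances on the list block, so the labels and bodies line up without hard-coded offsets.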

Inline Elements

Blocks are automatically divided into lines. An inline element allows us to format part of a line:

<fo:inline font-style="italic">
   <!-- content goes here -->
</fo:inline>
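
To show how blocks and inlines fit together, here is a small sketch of a paragraph in which two phrases are formatted differently from the rest of the line. The fonts and spacing are assumptions. Alignment is a property of the enclosing block rather than of the inline, so text-align is set on fo:block.

```xml
<fo:block font-family="serif" font-size="11pt"
          text-align="start" space-after="6pt">
   An <fo:inline font-style="italic">inline</fo:inline> element
   changes the formatting of
   <fo:inline font-weight="bold">part</fo:inline> of a line.
</fo:block>
```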

Example

In this example we create test.fob, an XSL-FO document that contains the content and formatting of a simple document. A single page master called "typical" specifies all five non-overlapping areas. We use a formatter to generate a PDF file called test.pdf from test.fob:

test.fob

<?xml version="1.0" encoding="iso-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
   <fo:layout-master-set>
      <fo:simple-page-master  master-name="typical"
            page-width="8.5in" page-height="11in"
            margin-top="0.5in" margin-bottom="0.5in"
            margin-left="0.5in"  margin-right="0.5in">
         <fo:region-body   margin="1.0in"/>
         <fo:region-before extent="1.0in"/>
         <fo:region-after  extent="1.0in"/>
         <fo:region-start  extent="1.0in"/>
         <fo:region-end    extent="1.0in"/>
      </fo:simple-page-master>
   </fo:layout-master-set>

   <fo:page-sequence master-reference="typical">
      <fo:static-content flow-name="xsl-region-start">
         <fo:block> Left margin note </fo:block>
      </fo:static-content>

      <fo:static-content flow-name="xsl-region-before">
         <fo:block> Header </fo:block>
      </fo:static-content>

      <fo:static-content flow-name="xsl-region-after">
         <fo:block> Footer </fo:block>
      </fo:static-content>

      <fo:static-content flow-name="xsl-region-end">
         <fo:block> Right margin note </fo:block>
      </fo:static-content>

      <fo:flow flow-name="xsl-region-body">
         <fo:block> Body Text </fo:block>
      </fo:flow>

   </fo:page-sequence>
</fo:root>

Formatting with XEP 3.5

C:\pearce\web\xml\xslfo>xep -fo test.fob
(document (validate [validation OK])
[system-id file:/C%3A/pearce/web/xml/xslfo/test.fob]
(compile (masters (sequence-master [master-name typical]))
(sequence [master-reference typical](static-content [flow-name xsl-region-start]
)(static-content [flow-name xsl-region-before])(static-content [flow-name xsl-region-end])(static-content [flow-name xsl-region-after])(flow [flow-name xsl-region-body])))
(format
(sequence [master-reference typical](flow [page-number 1])
(static-content [page-number 1][region-name xsl-region-start][region-name xsl-region-after][region-name xsl-region-end][region-name xsl-region-before])
))
(generate [output-format pdf][page-number 1]))

Page Sequence Masters

In many situations a multi-page document will use different page masters for different pages. For example, we might define different masters for first pages, even pages, and odd pages. This is done by adding page sequence masters to our layout master set. For example, the following master set defines two page masters called "first" and "subsequent". It then adds a page sequence master called "contents", which is referenced by the page sequence that follows it:

<fo:layout-master-set>
   <fo:simple-page-master master-name="first" ...>
      ...
   </fo:simple-page-master>
   <fo:simple-page-master master-name="subsequent" ... >
      ...
   </fo:simple-page-master>
   <fo:page-sequence-master master-name="contents">
      <fo:repeatable-page-master-reference
         master-reference="first"
         maximum-repeats="1"/>
      <fo:repeatable-page-master-reference
         master-reference="subsequent"/>
   </fo:page-sequence-master>
</fo:layout-master-set>

<fo:page-sequence master-reference="contents">
   ...
</fo:page-sequence>

The page sequence master says:

Use the "first" page master for at most one page (maximum-repeats="1"), then use the "subsequent" master to format any remaining pages.
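
The case of separate masters for first, even, and odd pages can be sketched with fo:repeatable-page-master-alternatives. This is an illustration under assumptions: the master names "firstPage", "leftPage", and "rightPage" are borrowed from the earlier book example, the sequence master name "chapterPages" is hypothetical, and the margins are placeholders.

```xml
<fo:layout-master-set>
   <fo:simple-page-master master-name="firstPage"
         page-width="8.5in"  page-height="11in"
         margin-top="3in"    margin-bottom="0.5in"
         margin-left="1in"   margin-right="1in">
      <fo:region-body/>
   </fo:simple-page-master>
   <!-- even (left-hand) pages: wider margin on the right, toward the binding -->
   <fo:simple-page-master master-name="leftPage"
         page-width="8.5in"   page-height="11in"
         margin-top="0.5in"   margin-bottom="0.5in"
         margin-left="0.75in" margin-right="1.25in">
      <fo:region-body/>
   </fo:simple-page-master>
   <!-- odd (right-hand) pages: wider margin on the left, toward the binding -->
   <fo:simple-page-master master-name="rightPage"
         page-width="8.5in"   page-height="11in"
         margin-top="0.5in"   margin-bottom="0.5in"
         margin-left="1.25in" margin-right="0.75in">
      <fo:region-body/>
   </fo:simple-page-master>
   <fo:page-sequence-master master-name="chapterPages">
      <fo:repeatable-page-master-alternatives>
         <!-- alternatives are tried in order; the first match wins -->
         <fo:conditional-page-master-reference
            master-reference="firstPage" page-position="first"/>
         <fo:conditional-page-master-reference
            master-reference="leftPage" odd-or-even="even"/>
         <fo:conditional-page-master-reference
            master-reference="rightPage" odd-or-even="odd"/>
      </fo:repeatable-page-master-alternatives>
   </fo:page-sequence-master>
</fo:layout-master-set>
```

A page sequence would then select this behavior with master-reference="chapterPages".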

Example

So far our XSL-FO documents have contained both content and formatting. More typically, the content would be contained in a distinct XML document. In this case we create an XSLT document that outputs a new XML document containing both content and formatting. In our example, an XML document called chapters.xml contains our content: the chapters of a book we are writing. Another file called book.xsl will combine the content of chapters.xml with the desired formatting to produce book.fob:

xalan chapters.xml book.xsl book.fob

Finally, we use a formatter to transform book.fob into book.pdf:

xep -fo book.fob

An XML Vocabulary for Books

A schema for an XML document containing an entire book might look something like this:

<book>

   <front-matter>
      <title> ... </title>
      <introduction> ... </introduction>
      <contents> ... </contents>
   </front-matter>

   <chapters>
      <chapter number = "1" title = "Chapter 1">
         <!-- paragraphs, figures, tables,
              footnotes, etc. go here -->
      </chapter>
      <!-- more chapters -->
   </chapters>

   <end-matter>
      <end-notes> ... </end-notes>
      <references>
         <reference>
            <title> ... </title>
            <author> ... </author>
            <publisher> ... </publisher>
            <date> ... </date>
         </reference>
         <!-- more references -->
      </references>
      <index> ...</index>
   </end-matter>

</book>

chapters.xml

We create a document called chapters.xml that instantiates only the chapters element of our book schema:

<chapters>
   <chapter number = "1" title = "Formatting Elements">
      <para> ... </para>
      <!-- many more paragraphs -->
   </chapter>
   <chapter number = "2" title = "Formatting Properties">
      <para> ... </para>
      <!-- many more paragraphs -->
   </chapter>
</chapters>

book.xsl

The main template of our style sheet program declares an fo:root element containing a set of page masters and page sequence masters, as well as a page sequence for each chapter in the source document. A second template will be used to convert individual paragraphs within a chapter into fo:block elements:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">

<xsl:template match="/">
   <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
         <!-- page masters go here -->
      </fo:layout-master-set>
      <xsl:for-each select = "//chapter">
         <!-- page sequence goes here -->
      </xsl:for-each>
   </fo:root>
</xsl:template>

<!-- output content of a paragraph: -->
<xsl:template match="para">
    <fo:block>
      <xsl:value-of select="."/>
   </fo:block>
</xsl:template>

</xsl:stylesheet>
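
The para template above uses xsl:value-of, which copies only the string value of the paragraph, so any markup nested inside a paragraph is flattened to plain text. If paragraphs contained inline markup, one possible variant would use xsl:apply-templates instead. The sketch below assumes a hypothetical emph element that is not part of the book vocabulary described here.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:fo="http://www.w3.org/1999/XSL/Format">

<!-- output a paragraph, letting child elements
     be handled by their own templates -->
<xsl:template match="para">
   <fo:block space-after="6pt">
      <xsl:apply-templates/>
   </fo:block>
</xsl:template>

<!-- hypothetical inline element (an assumption for this sketch) -->
<xsl:template match="emph">
   <fo:inline font-style="italic">
      <xsl:apply-templates/>
   </fo:inline>
</xsl:template>

</xsl:stylesheet>
```

Text nodes inside para and emph are copied by XSLT's built-in template rule, so no separate template is needed for them.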

Master Set

The master set is hard-wired into our style sheet:

<fo:layout-master-set>
   <!-- first page of chapter -->
   <fo:simple-page-master  master-name="first"
         page-width="8.5in"   page-height="11in"
         margin-top="3in"  margin-bottom="0.5in"
         margin-left="0.5in"  margin-right="0.5in">
      <fo:region-body   margin="1.0in"/>
      <fo:region-before extent="3.0in"/>
      <fo:region-after  extent="0.5in"/>
      <fo:region-start  extent="1.0in"/>
      <fo:region-end extent="1.0in"/>
   </fo:simple-page-master>
   <!-- subsequent chapter pages -->
   <fo:simple-page-master  master-name="subsequent"
         page-width="8.5in"   page-height="11in"
         margin-top="0.5in"   margin-bottom="0.5in"
         margin-left="0.5in"  margin-right="0.5in">
      <fo:region-body   margin="1.0in"/>
      <fo:region-before extent="0.5in"/>
      <fo:region-after  extent="0.5in"/>
      <fo:region-start  extent="1.0in"/>
      <fo:region-end    extent="1.0in"/>
   </fo:simple-page-master>
   <!-- sequence master -->
   <fo:page-sequence-master master-name="contents">
      <fo:repeatable-page-master-reference
         master-reference="first"
         maximum-repeats="1"/>
      <fo:repeatable-page-master-reference
         master-reference="subsequent"/>
   </fo:page-sequence-master>


</fo:layout-master-set>

Generating a Page Sequence for Each Chapter

For each chapter in our source document we create a page sequence that formats the first page of the chapter using the "first" page master, and all subsequent pages using the "subsequent" page master:

<xsl:for-each select = "//chapter">
   <fo:page-sequence master-reference="contents">
   <!-- put chapter #, title in header -->
   <fo:static-content flow-name="xsl-region-before">
      <fo:block text-align="center">
         <fo:inline font-style="italic">
            Chapter
            <xsl:value-of select = "@number"/>:
            <xsl:value-of select = "@title"/>
         </fo:inline>
      </fo:block>
   </fo:static-content>
   <!-- put chp title, page, chp # in footer -->
   <fo:static-content flow-name="xsl-region-after">
      <!-- text-align has no effect on fo:inline, so stretchable
           leaders spread the three parts across the line -->
      <fo:block text-align-last="justify">
         <fo:inline font-style="italic">
            Chapter
            <xsl:value-of select = "@number"/>
         </fo:inline>
         <fo:leader leader-pattern="space"/>
         page-<fo:page-number/>
         <fo:leader leader-pattern="space"/>
         <xsl:value-of select = "@title"/>
      </fo:block>
   </fo:static-content>

   <fo:flow flow-name="xsl-region-body">
      <!-- output chapter title & # -->
      <fo:block font-weight="bold" font-size="16pt"
         font-family="Arial, Helvetica, sans-serif"
         text-align="center">
         <fo:inline>
            Chapter
            <xsl:value-of select = "@number"/>:
            <xsl:value-of select = "@title"/>
         </fo:inline>
      </fo:block>
      <!-- output chapter paragraphs -->
      <xsl:apply-templates select="para"/>
   </fo:flow>
   </fo:page-sequence>
</xsl:for-each>

Each page will contain the chapter name and number in the header and footer, as well as the page number in the footer. In addition to actual content from the source document, the first page begins with a formatted title.

Generating the FOB and the Document

Here's the command line that generates book.fob:

C:\pearce\web\xml\xslfo>xalan chapters.xml book.xsl book.fob
C:\pearce\web\xml\xslfo>java org.apache.xalan.xslt.Process -in chapters.xml -xsl book.xsl -out book.fob

Here's the command line that generates book.pdf:

C:\pearce\web\xml\xslfo>xep -fo book.fob

(document (validate [validation OK])
[system-id file:/C%3A/pearce/web/xml/xslfo/book1.fob]
(compile (masters (sequence-master [master-name first])(sequence-master [master-name subsequent])(sequence-master [master-name contents]))
(sequence [master-reference contents](static-content [flow-name xsl-region-before])(static-content [flow-name xsl-region-after])(flow [flow-name xsl-region-body]))
(sequence [master-reference contents](static-content [flow-name xsl-region-before])(static-content [flow-name xsl-region-after])(flow [flow-name xsl-region-body])))
(format
(sequence [master-reference contents](flow [page-number 1][page-number 2])
(static-content [page-number 1][region-name xsl-region-before][region-name xsl-region-after][page-number 2][region-name xsl-region-before][region-name xsl-region-after])
)
(sequence [master-reference contents](flow [page-number 3][page-number 4][page-number 5][page-number 6][page-number 7][page-number 8][page-number 9])
(static-content [page-number 3][region-name xsl-region-before][region-name xsl-region-after][page-number 4][region-name xsl-region-before][region-name xsl-region-after][page-number 5][region-name xsl-region-before][region-name xsl-region-after][page-number 6][region-name xsl-region-before][region-name xsl-region-after][page-number 7][region-name xsl-region-before][region-name xsl-region-after][page-number 8][region-name xsl-region-before][region-name xsl-region-after][page-number 9][region-name xsl-region-before][region-name xsl-region-after])
))
(generate [output-format pdf][page-number 1][page-number 2][page-number 3][page-number 4][page-number 5][page-number 6][page-number 7][page-number 8][page-number 9]))

The Result

Our chapters.xml consisted of two chapters. Each chapter consisted of a long list. Each item in the list was marked as a separate paragraph. Here's the PDF result:

Document Structure (Abstract)