An XSL-FO document is an XML document with a .fo or .fob extension that combines content with markup that formats that content.
When a formatter reads doc.fob, it instantiates a page based on a page master. It then fills the page with content from the page sequence. When this page is filled, the process repeats. The process terminates when there is no more content.
while (page-sequence has content) {
   1. create a new page by instantiating a master
   2. fill the page with content from the page-sequence
}
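The loop above can be modeled in a few lines of Java. This is a toy sketch, not how FOP, XEP, or any real formatter lays out text. It assumes, hypothetically, that content is a list of equally tall lines and that every page instantiated from the master holds a fixed number of them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the formatter's page loop (not how a real formatter lays out text).
// Assumption: content is a list of equally tall lines and every page instantiated
// from the master holds exactly linesPerPage of them.
class PageLoop {
    static List<List<String>> paginate(List<String> content, int linesPerPage) {
        if (linesPerPage <= 0) {
            throw new IllegalArgumentException("linesPerPage must be positive");
        }
        List<List<String>> pages = new ArrayList<>();
        int next = 0;                                  // index of the next unplaced line
        while (next < content.size()) {                // while page-sequence has content
            List<String> page = new ArrayList<>();     // 1. create a new page from the master
            while (page.size() < linesPerPage && next < content.size()) {
                page.add(content.get(next++));         // 2. fill the page with content
            }
            pages.add(page);
        }
        return pages;                                  // no content left: the loop terminates
    }

    public static void main(String[] args) {
        System.out.println(paginate(List.of("a", "b", "c", "d", "e"), 2));
        // prints [[a, b], [c, d], [e]]
    }
}
```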
These are the currently available XSL-FO processors. FOP, PassiveTeX, and xmlroff are free, but the commercial products generally produce better output.
FOP is a Java-based processor available free from the Apache XML Project. FOP can produce usable output, but it is still under development and has some limitations that prevent it from producing production-quality typesetting. It is expected eventually to produce many forms of output beyond the current PDF.
PassiveTeX, from Sebastian Rahtz (http://www.tei-c.org.uk/Software/passivetex/), is a free XSL-FO processor based on TeX.
XEP (written in Java) is a commercial product from RenderX (http://www.renderx.com/).
Antenna House (http://www.antennahouse.com/) offers a commercial processor.
Unicorn Enterprises SA (http://www.unicorn-enterprises.com/) offers a commercial processor for Windows only.
xmlroff (http://xmlroff.sourceforge.net/)
is written in C, and uses libxml2 and other GNOME libraries. It is an open
source project but is not yet a complete implementation.
Other XSL-FO processors are listed on the W3C's XSL information page.
The FO root element contains a set of page masters followed by one or more page sequences:
<?xml version="1.0" encoding="iso-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- page masters go here -->
  </fo:layout-master-set>
  <fo:page-sequence master-reference="MASTER-NAME">
    <!-- document content goes here -->
  </fo:page-sequence>
  <!-- etc. -->
</fo:root>
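The ordering rule (page masters first, then page sequences) is easy to check mechanically. The following Java sketch uses the standard DOM parser. The class name FoSkeletonCheck is hypothetical, and the sketch accepts only the two child elements discussed here. A document that uses other top-level elements allowed by the XSL specification (such as fo:declarations) would therefore be rejected.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Checks the simplified top-level structure described above: fo:root containing
// one fo:layout-master-set followed by one or more fo:page-sequence elements.
// Simplification: other children the XSL specification allows are rejected.
class FoSkeletonCheck {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    static boolean hasValidTopLevel(String foXml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // needed for getNamespaceURI/getLocalName
            root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(foXml)))
                    .getDocumentElement();
        } catch (Exception e) {
            return false;                     // not well-formed XML
        }
        if (!isFo(root, "root")) return false;
        boolean sawMasterSet = false;
        int sequences = 0;
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;  // skip whitespace, comments
            if (isFo(n, "layout-master-set") && !sawMasterSet && sequences == 0) {
                sawMasterSet = true;
            } else if (isFo(n, "page-sequence") && sawMasterSet) {
                sequences++;
            } else {
                return false;                 // unexpected element or wrong order
            }
        }
        return sawMasterSet && sequences > 0;
    }

    private static boolean isFo(Node n, String localName) {
        return FO_NS.equals(n.getNamespaceURI()) && localName.equals(n.getLocalName());
    }

    public static void main(String[] args) {
        String skeleton = "<fo:root xmlns:fo='" + FO_NS + "'>"
                + "<fo:layout-master-set/>"
                + "<fo:page-sequence master-reference='MASTER-NAME'/>"
                + "</fo:root>";
        System.out.println(hasValidTopLevel(skeleton));  // true
    }
}
```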
There is currently only one type of master page, which is a simple (i.e., rectangular) master page. The attributes of this element specify its page size, margin sizes, and name:
<fo:simple-page-master
master-name="US-Letter"
page-height="11in"
page-width="8.5in"
margin-top="0.5in"
margin-bottom="0.5in"
margin-left="0.5in"
margin-right="0.5in">
<!-- page regions go here -->
</fo:simple-page-master>
For example, we might define page masters for each of the different types of pages found in a book:
<fo:layout-master-set>
<fo:simple-page-master master-name="tocPage" ...>
...
</fo:simple-page-master>
<fo:simple-page-master master-name="firstPage" ...>
...
</fo:simple-page-master>
<fo:simple-page-master master-name="leftPage" ...>
...
</fo:simple-page-master>
<fo:simple-page-master master-name="rightPage" ...>
...
</fo:simple-page-master>
<fo:simple-page-master master-name="indexPage" ...>
...
</fo:simple-page-master>
</fo:layout-master-set>
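A page master specifies up to five regions of a page: a body region (fo:region-body) surrounded by four optional outer regions, before, after, start, and end (fo:region-before, fo:region-after, fo:region-start, fo:region-end). In the usual left-to-right, top-to-bottom writing mode, these lie along the top, bottom, left, and right edges.
The extent attribute specifies the heights of the before and after regions and the widths of the start and end regions. To avoid overlap with the body region, we commonly place the outer regions inside the margins of fo:region-body, and therefore outside the body: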
Here is a sample page master with all five regions declared:
<fo:simple-page-master master-name="typical"
page-width="8.5in" page-height="11in"
margin-top="0.5in" margin-bottom="0.5in"
margin-left="0.5in" margin-right="0.5in">
<fo:region-body margin="1.0in"/>
<fo:region-before
extent="1.0in"/>
<fo:region-after extent="1.0in"/>
<fo:region-start extent="1.0in"/>
<fo:region-end extent="1.0in"/>
</fo:simple-page-master>
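The region sizes for the "typical" master follow from simple subtraction. The page margins leave a 7.5in by 10in content rectangle, and the 1in region-body margins leave a 5.5in by 8in body, so the 1in extents fit exactly inside the body's margins. The Java sketch below spells out this arithmetic. The helper names are hypothetical, lengths are in inches, and the default left-to-right, top-to-bottom writing mode is assumed.

```java
// Region arithmetic for the "typical" master (all lengths in inches).
// Assumption: the default lr-tb writing mode, so "before" is the top edge
// and "start" is the left edge.
class RegionLayout {
    // The content rectangle is the page minus the fo:simple-page-master margins.
    static double contentSize(double pageSize, double marginA, double marginB) {
        return pageSize - marginA - marginB;
    }

    // fo:region-body is the content rectangle minus the region-body margins.
    static double bodySize(double contentSize, double bodyMarginA, double bodyMarginB) {
        return contentSize - bodyMarginA - bodyMarginB;
    }

    // An outer region reaches into the body when its extent exceeds the
    // region-body margin on the same side.
    static boolean overlapsBody(double extent, double bodyMarginOnThatSide) {
        return extent > bodyMarginOnThatSide;
    }

    public static void main(String[] args) {
        double contentW = contentSize(8.5, 0.5, 0.5);   // 7.5
        double contentH = contentSize(11.0, 0.5, 0.5);  // 10.0
        System.out.println(bodySize(contentW, 1.0, 1.0) + " x " + bodySize(contentH, 1.0, 1.0));
        // prints 5.5 x 8.0
        System.out.println(overlapsBody(1.0, 1.0));     // false: a 1in extent fits a 1in margin
    }
}
```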
After the master set comes one or more page sequences:
<fo:page-sequence master-reference="MASTER-NAME">
<!-- title element (optional) -->
<!-- static text elements (optional)
-->
<!-- flow -->
</fo:page-sequence>
Titles (like the titles of HTML pages) are optional and don't seem to do much:
<fo:title> Title </fo:title>
Static content, such as a header or footer, is repeated on every page of the sequence:
<fo:static-content flow-name="xsl-region-before">
<!-- Header block goes here -->
</fo:static-content>
<fo:static-content flow-name="xsl-region-after">
<!-- Footer block goes here -->
</fo:static-content>
A flow contains all of the non-repeating content of the document:
<fo:flow flow-name = "xsl-region-body">
<!-- content blocks go here -->
</fo:flow>
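Each flow-name refers to a region by name. Unless a region has an explicit region-name attribute, its name is "xsl-" followed by the element's local name (xsl-region-body, xsl-region-before, and so on). A mismatched name is an easy mistake to make. The following hypothetical Java checker, built on the standard DOM API, lists flow-names that match no declared region. It is deliberately simplified in two ways: it ignores which master each page sequence references, and it would also flag reserved flow names such as xsl-footnote-separator.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Reports flow-name values that match no region declared anywhere in the document.
// Simplifications: does not check which master each page sequence references,
// and flags reserved names such as xsl-footnote-separator as unmatched.
class FlowNameCheck {
    static final String FO = "http://www.w3.org/1999/XSL/Format";
    static final String[] REGIONS =
        {"region-body", "region-before", "region-after", "region-start", "region-end"};

    static Set<String> unmatchedFlowNames(String foXml) {
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(foXml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        Set<String> regionNames = new LinkedHashSet<>();
        for (String region : REGIONS) {
            NodeList list = doc.getElementsByTagNameNS(FO, region);
            for (int i = 0; i < list.getLength(); i++) {
                String name = ((Element) list.item(i)).getAttribute("region-name");
                // With no region-name attribute, the default is "xsl-" + element name.
                regionNames.add(name.isEmpty() ? "xsl-" + region : name);
            }
        }
        Set<String> unmatched = new LinkedHashSet<>();
        for (String tag : new String[] {"static-content", "flow"}) {
            NodeList list = doc.getElementsByTagNameNS(FO, tag);
            for (int i = 0; i < list.getLength(); i++) {
                String flowName = ((Element) list.item(i)).getAttribute("flow-name");
                if (!regionNames.contains(flowName)) unmatched.add(flowName);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        String fo = "<fo:root xmlns:fo='" + FO + "'><fo:layout-master-set>"
            + "<fo:simple-page-master master-name='m'><fo:region-body/></fo:simple-page-master>"
            + "</fo:layout-master-set><fo:page-sequence master-reference='m'>"
            + "<fo:static-content flow-name='xsl-region-before'><fo:block>Header</fo:block></fo:static-content>"
            + "<fo:flow flow-name='xsl-region-body'><fo:block>Body</fo:block></fo:flow>"
            + "</fo:page-sequence></fo:root>";
        System.out.println(unmatchedFlowNames(fo)); // [xsl-region-before]
    }
}
```

Here the master declares only a body region, so the header's static content has nowhere to go.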
Paragraphs, lists, and tables are examples of block structures:
<fo:block>
<!-- content & inline elements
go here -->
</fo:block>
Blocks are automatically divided into lines. An inline element allows us to format part of a line. Alignment properties such as text-align apply to blocks, not to inline elements:
<fo:inline font-style="italic">
<!-- content goes here -->
</fo:inline>
In this example we create test.fob, an XSL-FO document that contains the content and formatting of a simple document. A single page master called "typical" specifies all five non-overlapping areas. We use a formatter to generate a pdf file called test.pdf from test.fob:
<?xml version="1.0"
encoding="iso-8859-1"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<fo:simple-page-master master-name="typical"
page-width="8.5in" page-height="11in"
margin-top="0.5in" margin-bottom="0.5in"
margin-left="0.5in" margin-right="0.5in">
<fo:region-body margin="1.0in"/>
<fo:region-before
extent="1.0in"/>
<fo:region-after extent="1.0in"/>
<fo:region-start extent="1.0in"/>
<fo:region-end extent="1.0in"/>
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence
master-reference="typical">
<fo:static-content
flow-name="xsl-region-start">
<fo:block> Left margin note
</fo:block>
</fo:static-content>
<fo:static-content
flow-name="xsl-region-before">
<fo:block> Header
</fo:block>
</fo:static-content>
<fo:static-content flow-name="xsl-region-after">
<fo:block> Footer
</fo:block>
</fo:static-content>
<fo:static-content
flow-name="xsl-region-end">
<fo:block> Right margin
note </fo:block>
</fo:static-content>
<fo:flow
flow-name="xsl-region-body">
<fo:block> Body Text
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
C:\pearce\web\xml\xslfo>xep -fo test.fob
(document (validate [validation OK])
[system-id file:/C%3A/pearce/web/xml/xslfo/test.fob]
(compile (masters (sequence-master [master-name typical]))
(sequence [master-reference typical](static-content [flow-name xsl-region-start])(static-content [flow-name xsl-region-before])(static-content [flow-name xsl-region-end])(static-content [flow-name xsl-region-after])(flow [flow-name xsl-region-body])))
(format
(sequence [master-reference typical](flow [page-number 1])
(static-content [page-number 1][region-name xsl-region-start][region-name xsl-region-after][region-name xsl-region-end][region-name xsl-region-before])
))
(generate [output-format pdf][page-number 1]))
In many situations a multi-page document will use different page masters for different pages. For example, we might define different masters for first pages, even pages, and odd pages. This is done by adding a page sequence master to the layout master set. The following master set defines two page masters called "first" and "subsequent", then adds a page sequence master called "contents", which is referenced by the page sequence that follows:
<fo:layout-master-set>
<fo:simple-page-master
master-name="first" ...>
...
</fo:simple-page-master>
<fo:simple-page-master
master-name="subsequent" ... >
...
</fo:simple-page-master>
<fo:page-sequence-master
master-name="contents">
<fo:repeatable-page-master-reference
master-reference="first"
maximum-repeats="1"/>
<fo:repeatable-page-master-reference
master-reference="subsequent"/>
</fo:page-sequence-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="contents">
...
</fo:page-sequence>
The page sequence master says: use the "first" page master to format the first n pages, where n is the value of maximum-repeats (here 1), then use the "subsequent" master, whose number of repeats is unlimited by default, to format all remaining pages.
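The selection rule can be sketched as a small simulation. The Java below is an illustration, not a formatter's implementation. The class PageSequenceMaster and its Ref type are hypothetical, and a negative maximumRepeats stands in for the attribute's default value, no-limit. Running out of references before running out of pages is treated as an error.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates how a page sequence master built from
// fo:repeatable-page-master-reference elements picks a page master per page.
// Assumption: maximumRepeats < 0 stands for the default, "no-limit".
class PageSequenceMaster {
    static final class Ref {
        final String masterReference;
        final int maximumRepeats;

        Ref(String masterReference, int maximumRepeats) {
            this.masterReference = masterReference;
            this.maximumRepeats = maximumRepeats;
        }
    }

    // Returns the name of the page master used for each of pageCount pages.
    static List<String> mastersFor(List<Ref> refs, int pageCount) {
        List<String> result = new ArrayList<>();
        int refIndex = 0;   // current repeatable-page-master-reference
        int usedByRef = 0;  // pages already produced by it
        while (result.size() < pageCount) {
            if (refIndex >= refs.size()) {
                throw new IllegalStateException("page sequence master has no pages left");
            }
            Ref ref = refs.get(refIndex);
            if (ref.maximumRepeats >= 0 && usedByRef >= ref.maximumRepeats) {
                refIndex++;     // this reference is used up; move on to the next one
                usedByRef = 0;
                continue;
            }
            result.add(ref.masterReference);
            usedByRef++;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Ref> contents = List.of(new Ref("first", 1), new Ref("subsequent", -1));
        System.out.println(mastersFor(contents, 4));
        // prints [first, subsequent, subsequent, subsequent]
    }
}
```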
So far our XSL-FO documents have contained both content and formatting. More typically, the content is kept in a separate XML document, and we write an XSLT style sheet that transforms it into a new XML document, an XSL-FO document containing both content and formatting. In our example, an XML document called chapters.xml contains our content: the chapters of a book we are writing. A style sheet called book.xsl combines the content of chapters.xml with the desired formatting to produce book.fob:
xalan chapters.xml book.xsl book.fob
Finally, we use a formatter to transform book.fob into book.pdf:
xep -fo book.fob
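The xalan step can also be reproduced from Java through the standard JAXP transformation API. This is a minimal sketch, and the class name RunXslt is hypothetical. Which XSLT engine actually runs depends on what TransformerFactory.newInstance() finds: Xalan if it is on the classpath, otherwise the processor built into the JDK.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Rough Java equivalent of: xalan chapters.xml book.xsl book.fob
// Uses the standard JAXP API; the engine is whatever TransformerFactory finds.
class RunXslt {
    static void transform(Source xml, Source xsl, Result out) {
        try {
            // Compile the style sheet, then apply it to the content document.
            TransformerFactory.newInstance().newTransformer(xsl).transform(xml, out);
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        transform(new StreamSource(new File("chapters.xml")),
                  new StreamSource(new File("book.xsl")),
                  new StreamResult(new File("book.fob")));
    }
}
```

The formatting step (xep -fo book.fob) still runs separately. XEP's own Java API is not covered here.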
A schema for an XML document containing an entire book might look something like this:
<book>
<front-matter>
<title> ... </title>
<introduction> ...
</introduction>
<contents> ...
</contents>
</front-matter>
<chapters>
<chapter number = "1"
title = "Chapter 1">
<!-- paragraphs, figures,
tables,
footnotes, etc. go here -->
</chapter>
<!-- more chapters -->
</chapters>
<end-matter>
<end-notes> ...
</end-notes>
<references>
<reference>
<title> ...
</title>
<author> ...
</author>
<publisher> ...
</publisher>
<date> ... </date>
</reference>
<!-- more references -->
</references>
<index> ...</index>
</end-matter>
</book>
We create a document called chapters.xml that instantiates only the chapters element of our book schema:
<chapters>
<chapter number = "1"
title = "Formatting Elements">
<para> ... </para>
<!-- many more paragraphs -->
</chapter>
<chapter number = "2"
title = "Formatting Properties">
<para> ... </para>
<!-- many more paragraphs -->
</chapter>
</chapters>
The main template of our style sheet program declares an fo:root element containing a set of page masters and page sequence masters, as well as a page sequence for each chapter in the source document. A second template will be used to convert individual paragraphs within a chapter into fo:block elements:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:template match="/">
<fo:root
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<fo:layout-master-set>
<!-- page masters go here -->
</fo:layout-master-set>
<xsl:for-each select =
"//chapter">
<!-- page sequence goes
here -->
</xsl:for-each>
</fo:root>
</xsl:template>
<!-- output content of a paragraph: -->
<xsl:template match="para">
<fo:block>
<xsl:value-of
select="."/>
</fo:block>
</xsl:template>
</xsl:stylesheet>
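For readers more at home in Java than XSLT, the mapping performed by these two templates can be written imperatively with the DOM API. The hypothetical ChaptersToFo class only illustrates what the templates do and is not a replacement for the style sheet. It omits page masters and static content, and it hard-codes the master name "contents", which is defined later in this section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Imperative DOM version of the style sheet skeleton: one fo:page-sequence
// per chapter, one fo:block per para. Page masters and static content omitted.
class ChaptersToFo {
    static final String FO = "http://www.w3.org/1999/XSL/Format";

    static Document toFo(String chaptersXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document src = builder.parse(new InputSource(new StringReader(chaptersXml)));
            Document out = builder.newDocument();

            Element root = out.createElementNS(FO, "fo:root");
            out.appendChild(root);
            root.appendChild(out.createElementNS(FO, "fo:layout-master-set")); // masters omitted

            NodeList chapters = src.getElementsByTagName("chapter");   // like select="//chapter"
            for (int i = 0; i < chapters.getLength(); i++) {
                Element seq = out.createElementNS(FO, "fo:page-sequence");
                seq.setAttribute("master-reference", "contents");
                Element flow = out.createElementNS(FO, "fo:flow");
                flow.setAttribute("flow-name", "xsl-region-body");
                // like <xsl:apply-templates select="para"/>: child para elements only
                for (Node n = chapters.item(i).getFirstChild(); n != null; n = n.getNextSibling()) {
                    if (n.getNodeType() == Node.ELEMENT_NODE && "para".equals(n.getNodeName())) {
                        Element block = out.createElementNS(FO, "fo:block");
                        block.setTextContent(n.getTextContent()); // like <xsl:value-of select="."/>
                        flow.appendChild(block);
                    }
                }
                seq.appendChild(flow);
                root.appendChild(seq);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not convert chapters document", e);
        }
    }

    public static void main(String[] args) {
        Document fo = toFo("<chapters><chapter number='1' title='A'><para>x</para></chapter></chapters>");
        System.out.println(fo.getElementsByTagNameNS(FO, "block").getLength()); // 1
    }
}
```

The XSLT version remains the better fit for this job: the templates say declaratively what each source element becomes, while the DOM code has to walk the tree by hand.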
The master set is hard-wired into our style sheet:
<fo:layout-master-set>
<!-- first page of chapter -->
<fo:simple-page-master master-name="first"
page-width="8.5in" page-height="11in"
margin-top="3in" margin-bottom="0.5in"
margin-left="0.5in" margin-right="0.5in">
<fo:region-body margin="1.0in"/>
<fo:region-before extent="3.0in"/>
<fo:region-after extent="0.5in"/>
<fo:region-start extent="1.0in"/>
<fo:region-end extent="1.0in"/>
</fo:simple-page-master>
<!-- subsequent chapter pages
-->
<fo:simple-page-master master-name="subsequent"
page-width="8.5in" page-height="11in"
margin-top="0.5in" margin-bottom="0.5in"
margin-left="0.5in" margin-right="0.5in">
<fo:region-body margin="1.0in"/>
<fo:region-before
extent="0.5in"/>
<fo:region-after extent="0.5in"/>
<fo:region-start extent="1.0in"/>
<fo:region-end extent="1.0in"/>
</fo:simple-page-master>
<!-- sequence master -->
<fo:page-sequence-master
master-name="contents">
<fo:repeatable-page-master-reference
master-reference="first"
maximum-repeats="1"/>
<fo:repeatable-page-master-reference
master-reference="subsequent"/>
</fo:page-sequence-master>
</fo:layout-master-set>
For each chapter in our source document we create a page sequence that formats the first page of the chapter using the "first" page master, and all subsequent pages using the "subsequent" page master:
<xsl:for-each select = "//chapter">
<fo:page-sequence
master-reference="contents">
<!-- put chapter #, title in
header -->
<fo:static-content
flow-name="xsl-region-before">
<fo:block text-align="center">
<fo:inline font-style="italic">
Chapter
<xsl:value-of select="@number"/>:
<xsl:value-of select="@title"/>
</fo:inline>
</fo:block>
</fo:static-content>
<!-- put chp title, page, chp #
in footer -->
<fo:static-content
flow-name="xsl-region-after">
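<!-- text-align applies to blocks, so each part of the footer gets its own cell -->
<fo:table table-layout="fixed" width="100%">
<fo:table-column column-width="proportional-column-width(1)"/>
<fo:table-column column-width="proportional-column-width(1)"/>
<fo:table-column column-width="proportional-column-width(1)"/>
<fo:table-body>
<fo:table-row>
<fo:table-cell>
<fo:block text-align="start" font-style="italic">
Chapter <xsl:value-of select="@number"/>
</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block text-align="center">
page-<fo:page-number/>
</fo:block>
</fo:table-cell>
<fo:table-cell>
<fo:block text-align="end">
<xsl:value-of select="@title"/>
</fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>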
</fo:static-content>
<fo:flow
flow-name="xsl-region-body">
<!-- output chapter title
& # -->
<fo:block
font-weight="bold" font-size="16pt"
font-family="Arial, Helvetica, sans-serif"
text-align="center">
Chapter
<xsl:value-of select="@number"/>:
<xsl:value-of select="@title"/>
</fo:block>
<!-- output chapter paragraphs
-->
<xsl:apply-templates
select="para"/>
</fo:flow>
</fo:page-sequence>
Each page will contain the chapter number and title in the header and footer, as well as the page number in the footer. In addition to the content from the source document, the first page of each chapter begins with a formatted chapter heading.
Either of these command lines generates book.fob; the second invokes Xalan's command-line class directly:
C:\pearce\web\xml\xslfo>xalan chapters.xml book.xsl book.fob
C:\pearce\web\xml\xslfo>java org.apache.xalan.xslt.Process -in chapters.xml -xsl book.xsl -out book.fob
Here's the command line that generates book.pdf:
C:\pearce\web\xml\xslfo>xep -fo book.fob
(document (validate [validation OK])
[system-id file:/C%3A/pearce/web/xml/xslfo/book1.fob]
(compile (masters (sequence-master [master-name first])(sequence-master [master-name subsequent])(sequence-master [master-name contents]))
(sequence [master-reference contents](static-content [flow-name xsl-region-before])(static-content [flow-name xsl-region-after])(flow [flow-name xsl-region-body]))
(sequence [master-reference contents](static-content [flow-name xsl-region-before])(static-content [flow-name xsl-region-after])(flow [flow-name xsl-region-body])))
(format
(sequence [master-reference contents](flow [page-number 1][page-number 2])
(static-content [page-number 1][region-name xsl-region-before][region-name xsl-region-after][page-number 2][region-name xsl-region-before][region-name xsl-region-after])
)
(sequence [master-reference contents](flow [page-number 3][page-number 4][page-number 5][page-number 6][page-number 7][page-number 8][page-number 9])
(static-content [page-number 3][region-name xsl-region-before][region-name xsl-region-after][page-number 4][region-name xsl-region-before][region-name xsl-region-after][page-number 5][region-name xsl-region-before][region-name xsl-region-after][page-number 6][region-name xsl-region-before][region-name xsl-region-after][page-number 7][region-name xsl-region-before][region-name xsl-region-after][page-number 8][region-name xsl-region-before][region-name xsl-region-after][page-number 9][region-name xsl-region-before][region-name xsl-region-after])
))
(generate [output-format pdf][page-number 1][page-number 2][page-number 3][page-number 4][page-number 5][page-number 6][page-number 7][page-number 8][page-number 9]))
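Our chapters.xml contained two chapters, each consisting of a long list in which every item was marked up as a separate paragraph. According to the log, the first chapter occupies pages 1 and 2 and the second occupies pages 3 through 9. Here's the PDF result: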