Originally published February 2004
XML to PDF conversion through FOP
Even though XML is used in a vast array of applications today, it is often criticized for its lack of presentation and aesthetic features, and with good reason: these were not among its designers' primary goals. Sometimes it's useful to convert XML information into a more user-friendly format like PDF. In this article we will describe how to do this using Formatting Objects Processor (FOP), an Apache Software Foundation project.
FOP is not exclusively a PDF conversion tool. It is a broader project that takes a W3C-standard XSL-FO tree and renders its content to another format, such as PCL, PostScript, SVG, and of course PDF, among others.
We will also use Ant, another Apache project, to simplify the conversion by expressing it in a short build script. Download the binary distributions of FOP and Ant, and let's get started.
We will begin with the XML document to be converted into PDF:
<?xml version="1.0"?>
<linuxdistros>
  <distro>
    <name>Debian</name>
    <codename>Woody</codename>
  </distro>
  <distro>
    <name>Redhat</name>
    <codename>Fedora</codename>
  </distro>
  <distro>
    <name>Suse</name>
    <codename>Suse</codename>
  </distro>
</linuxdistros>
First, we need to transform this XML document into an XSL-FO tree. The natural choice for this step is an XSL stylesheet, since it lets us define specific conversion instructions for each XML element. Here is the XSL stylesheet for this task:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:fo="http://www.w3.org/1999/XSL/Format"
                version="1.0">

  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <!-- Define Layout for Master Page -->
      <fo:layout-master-set>
        <fo:simple-page-master master-name="onlypage"
                               page-height="29.7cm" page-width="21cm"
                               margin-top="1cm" margin-bottom="2cm"
                               margin-left="2.5cm" margin-right="2.5cm">
          <fo:region-body margin-top="3cm"/>
          <fo:region-before extent="3cm"/>
          <fo:region-after extent="1.5cm"/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <!-- Start sequence for File -->
      <fo:page-sequence master-reference="onlypage">
        <fo:flow flow-name="xsl-region-body">
          <!-- Define a Top Level Header -->
          <fo:block font-size="18pt" font-family="sans-serif"
                    line-height="24pt" space-after.optimum="15pt"
                    background-color="black" color="white"
                    text-align="center" padding-top="3pt">
            Linux Distros
          </fo:block>
          <xsl:apply-templates/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <xsl:template match="linuxdistros">
    <fo:table border-width="0.5mm" border-style="solid">
      <fo:table-column column-width="3cm"/>
      <fo:table-column column-width="3cm"/>
      <fo:table-body>
        <fo:table-row>
          <fo:table-cell border-width="0.5pt" border-style="solid">
            <fo:block text-align="center">
              Name
            </fo:block>
          </fo:table-cell>
          <fo:table-cell border-width="0.5pt" border-style="solid">
            <fo:block text-align="center">
              Code Name
            </fo:block>
          </fo:table-cell>
        </fo:table-row>
        <xsl:apply-templates/>
      </fo:table-body>
    </fo:table>
  </xsl:template>

  <xsl:template match="distro">
    <fo:table-row border-width="0.5pt" border-style="solid">
      <xsl:apply-templates/>
    </fo:table-row>
  </xsl:template>

  <xsl:template match="name">
    <fo:table-cell>
      <fo:block text-align="center">
        <xsl:value-of select="."/>
      </fo:block>
    </fo:table-cell>
  </xsl:template>

  <xsl:template match="codename">
    <fo:table-cell>
      <fo:block text-align="center">
        <xsl:value-of select="."/>
      </fo:block>
    </fo:table-cell>
  </xsl:template>

</xsl:stylesheet>
For those of you who have never used an XSL stylesheet: each <xsl:template> defines the output to be generated for the XML element named in its match attribute. For example, when the <linuxdistros> element is encountered, it is replaced with the contents of its template. Each <xsl:apply-templates/> instruction then processes the current element's children in the same way, so the process repeats recursively down the XML tree, and the combined output of these templates forms a new document: our XSL-FO tree.
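To make the recursion concrete, here is a sketch of what the stylesheet should emit for the first <distro> element, worked out by hand from the distro, name and codename templates. It is an excerpt, not the complete output: whitespace has been reformatted for readability, and the fo prefix is bound by the namespace declaration on fo:root in the full document.

```xml
<!-- Hand-derived excerpt of the generated XSL-FO tree (whitespace reformatted).
     The "distro" template emits the row; "name" and "codename" emit one cell each. -->
<fo:table-row border-width="0.5pt" border-style="solid">
  <fo:table-cell>
    <fo:block text-align="center">Debian</fo:block>
  </fo:table-cell>
  <fo:table-cell>
    <fo:block text-align="center">Woody</fo:block>
  </fo:table-cell>
</fo:table-row>
```

The same pattern repeats once per <distro>, producing one table row per distribution below the header row.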
The XSL-FO elements used to construct our tree are the most basic ones. Although FOP supports a wide range of XSL-FO elements, it does not yet support everything defined in the W3C specification, which is one of the reasons FOP is still in its 0.20.x release series. Some of the XSL-FO elements used in our conversion process are:
- fo:root: Marks the start of the XSL-FO tree and declares the XSL-FO namespace.
- fo:layout-master-set: Holds the page layout definitions. Inside it, fo:simple-page-master defines the general characteristics of the PDF pages; its attributes are self-explanatory and include the margins and the page width and height, among others.
- fo:flow: Marks the beginning of the document body, the content that flows into the page's body region.
- fo:block: Defines a block of content. It accepts various attributes, such as font family, alignment, or colors. It is similar to a <p> element in HTML / XHTML.
- fo:table: Declares the start of a table.
- fo:table-column: Sets the width of a column in a given table.
- fo:table-row: Declares the start of a row in a given table, similar to the <tr> element in HTML / XHTML.
- fo:table-cell: Defines a cell in a given table, similar to the <td> element in HTML / XHTML.
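Because FOP's input is simply an XSL-FO document, the elements listed above can also be written by hand. The following is a minimal sketch using only those elements; the file name hello.fo and its content are hypothetical, chosen purely for illustration.

```xml
<?xml version="1.0"?>
<!-- hello.fo: hypothetical hand-written XSL-FO document using only basic elements -->
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <!-- One A4 page layout, referenced below by its master-name -->
  <fo:layout-master-set>
    <fo:simple-page-master master-name="onlypage"
                           page-height="29.7cm" page-width="21cm"
                           margin-top="2cm" margin-bottom="2cm"
                           margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="onlypage">
    <!-- Content flows into the body region of each page -->
    <fo:flow flow-name="xsl-region-body">
      <fo:block font-family="sans-serif" font-size="12pt">Hello, FOP</fo:block>
      <fo:table border-width="0.5mm" border-style="solid">
        <fo:table-column column-width="3cm"/>
        <fo:table-column column-width="3cm"/>
        <fo:table-body>
          <fo:table-row>
            <fo:table-cell><fo:block>Debian</fo:block></fo:table-cell>
            <fo:table-cell><fo:block>Woody</fo:block></fo:table-cell>
          </fo:table-row>
        </fo:table-body>
      </fo:table>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

Such a file can be handed straight to FOP, skipping the XSLT step, which is a convenient way to experiment with formatting attributes before moving them into a stylesheet.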
FOP does offer fancier formatting elements, but for simplicity's sake we do not address them here. Consult FOP's documentation to discover its additional formatting capabilities.
Next, we need to prepare the Ant script that performs the actual conversion. The finer details of Ant are beyond the scope of this article; if you have never used it before, you can read First contact with planet Ant to get up to speed. Our build script, build.xml, looks like this:
<?xml version="1.0"?>
<project default="pdf" basedir=".">

  <!-- ================================================ -->
  <!-- SET CLASSPATH                                    -->
  <!-- ================================================ -->
  <path id="classpath">
    <fileset dir="./lib">
      <include name="*.jar"/>
    </fileset>
  </path>

  <!-- ================================================ -->
  <!-- DEFINE PDF TARGET                                -->
  <!-- ================================================ -->
  <target name="pdf">
    <echo message="--- Transforming XML to PDF ---"/>
    <taskdef name="fop" classname="org.apache.fop.tools.anttasks.Fop">
      <classpath refid="classpath"/>
    </taskdef>
    <xslt in="linuxdistros.xml" style="linuxdistros2pdf.xsl"
          out="linuxdistros.fo" destdir="."/>
    <fop fofile="linuxdistros.fo" outfile="./linuxdistros.pdf"/>
  </target>

</project>
Wherever you place your build.xml file, you also need to create a directory named lib next to it to hold FOP's libraries: the fop.jar file and its ancillary JARs. These files are included in the FOP download, in its build and lib directories respectively.
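If the JARs are missing from lib, the build fails with a less obvious error when Ant tries to load the fop task. One way to fail early with a clear message is a small guard target, sketched below with Ant's standard available and fail tasks; the target name check-lib and the property fop.jar.present are hypothetical names chosen for this example.

```xml
<!-- Hypothetical guard target: stops the build with a clear message
     if lib/fop.jar has not been copied next to build.xml -->
<target name="check-lib">
  <!-- Sets fop.jar.present only if the file exists (path relative to basedir) -->
  <available file="lib/fop.jar" property="fop.jar.present"/>
  <fail unless="fop.jar.present"
        message="lib/fop.jar not found; copy FOP's JARs into ./lib first"/>
</target>
```

To use it, the pdf target would declare depends="check-lib", so the check runs before any conversion is attempted.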
Our Ant script first defines a path, with the id classpath, that includes all of FOP's libraries, that is, every JAR under the lib directory. We then define our main target, named pdf. Inside the target, after an informational echo, a taskdef declares a task named fop backed by the org.apache.fop.tools.anttasks.Fop class. This class contains the actual logic for converting the XSL-FO tree into PDF.
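The nested <classpath refid="classpath"/> element is one way to point taskdef at FOP's JARs. A more compact alternative, sketched below, uses taskdef's classpathref attribute and places the declaration at project level, so the fop task is available to every target rather than only inside pdf. This is a stylistic variant, not something the original script requires.

```xml
<!-- Alternative placement: a project-level taskdef (directly under <project>,
     after <path id="classpath">), referencing the path by id -->
<taskdef name="fop"
         classname="org.apache.fop.tools.anttasks.Fop"
         classpathref="classpath"/>
```

With this declaration in place, the nested taskdef inside the pdf target can be dropped.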
Immediately after, we declare an xslt task, which takes two inputs: in="linuxdistros.xml", our XML file, and style="linuxdistros2pdf.xsl", our XSL stylesheet. Its output, declared as out="linuxdistros.fo", tells Ant to write the result to a file named linuxdistros.fo, which holds our XSL-FO tree.
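Since the conversion is really two steps (XML to XSL-FO, then XSL-FO to PDF), it can help to give each step its own target, so the intermediate linuxdistros.fo can be generated and inspected on its own. The sketch below is one possible split; the target name fo is hypothetical. The destdir attribute is left out because Ant's xslt task does not require it when both in and out are given.

```xml
<!-- Hypothetical step 1: XML + stylesheet -> XSL-FO tree -->
<target name="fo">
  <xslt in="linuxdistros.xml" style="linuxdistros2pdf.xsl"
        out="linuxdistros.fo"/>
</target>

<!-- Step 2: XSL-FO tree -> PDF; runs the "fo" target first -->
<target name="pdf" depends="fo">
  <taskdef name="fop" classname="org.apache.fop.tools.anttasks.Fop">
    <classpath refid="classpath"/>
  </taskdef>
  <fop fofile="linuxdistros.fo" outfile="./linuxdistros.pdf"/>
</target>
```

Running ant fo then produces only the XSL-FO file, while ant pdf still performs the whole conversion.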
The last line defines the fop task, which takes the XSL-FO tree linuxdistros.fo and generates the PDF file named linuxdistros.pdf.
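Because FOP renders to formats other than PDF, the same XSL-FO tree can be sent to a different renderer by changing the task's output format. The sketch below assumes the fop Ant task's format attribute, which takes a MIME type and defaults to PDF; check your FOP version's Ant task documentation for the exact list of formats it accepts.

```xml
<!-- Same XSL-FO input rendered to PostScript instead of PDF
     (assumes the "format" attribute of the fop Ant task) -->
<fop format="application/postscript"
     fofile="linuxdistros.fo"
     outfile="./linuxdistros.ps"/>
```

Only the renderer changes; the stylesheet and the xslt step stay exactly the same.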
Finally, to generate the PDF document, run the ant pdf command from your shell prompt in the directory containing build.xml.
With the help of these open source tools, you can now streamline the process of creating PDF documents directly from XML.