The PTree File Format
Saxon-SA supports a file format called the PTree (persistent tree). This is a binary representation of an
XML document. The PTree file is generally about the same size as the original document (perhaps 10% smaller),
but it typically loads in about half the time. Storing a document as a PTree can therefore give a useful
performance improvement when the same source document is used repeatedly as the input to many queries
or transformations. Another benefit of the PTree is that it retains any type information that is present,
which means that the document does not need to be validated against its schema each time it is loaded. (The
schema, however, must be loaded whenever the document is loaded.)
Two commands are available for converting XML documents into PTree files and vice versa. To create
a PTree, use:
java com.saxonica.ptree.PTreeWriter source.xml result.ptree
The -strip option causes all whitespace-only text nodes to be stripped during conversion,
which often gives a useful saving in space and therefore in loading time.
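To make the effect of -strip concrete, here is a minimal sketch of what "stripping whitespace-only text nodes" means, written against the JDK's own DOM API rather than anything in Saxon-SA. The class WhitespaceStripper and its methods are hypothetical illustrations, not how PTreeWriter is implemented, and the sketch deliberately ignores xml:space.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical illustration (not Saxon code): remove whitespace-only text nodes from a DOM tree.
class WhitespaceStripper {

    // Parses an XML string into a DOM Document, rethrowing parser errors unchecked.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    // True if s consists only of XML whitespace characters (space, tab, CR, LF).
    static boolean isXmlWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false;
            }
        }
        return true;
    }

    // Recursively removes whitespace-only text nodes below n; returns how many were removed.
    // This sketch does not honour xml:space="preserve".
    static int strip(Node n) {
        int removed = 0;
        Node child = n.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE && isXmlWhitespace(child.getNodeValue())) {
                n.removeChild(child);
                removed++;
            } else {
                removed += strip(child);
            }
            child = next;
        }
        return removed;
    }

    public static void main(String[] args) {
        Document doc = parse("<a>\n  <b>x</b>\n  <c> </c>\n</a>");
        System.out.println("Removed " + strip(doc) + " whitespace-only text nodes");
    }
}
```

In the sample document, the three indentation nodes under a and the single space inside c are all removed, leaving only the text "x"; this is the kind of content that -strip saves space on.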
To convert a PTree back to an XML document, use:
java com.saxonica.ptree.PTreeReader source.ptree result.xml
It is possible to apply a query or transformation directly to a PTree by specifying the
option on the command line for
This option causes a different URIResolver, the
PTreeURIResolver, to be used in
place of the standard URIResolver. The
PTreeURIResolver recognizes any URI ending in
.ptree as identifying a file in PTree format. This also applies to files loaded using the
document() function: if the file extension is .ptree, the
file is assumed to be in PTree format.
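The dispatch rule described above (a URI ending in .ptree is treated as a PTree file, anything else is handled normally) can be sketched as a JAXP URIResolver. This is a minimal illustration under stated assumptions, not the PTreeURIResolver itself: the class name ExtensionDispatchingResolver and the ptreeLoader function are hypothetical, the loader stands in for whatever Saxon-SA does internally to read a PTree, and resolving href against base with java.net.URI is an assumption about how relative references are handled.

```java
import java.net.URI;
import java.util.function.Function;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of extension-based dispatch; not Saxon's PTreeURIResolver.
class ExtensionDispatchingResolver implements URIResolver {

    // Hypothetical hook standing in for Saxon-SA's internal PTree loading.
    private final Function<String, Source> ptreeLoader;

    ExtensionDispatchingResolver(Function<String, Source> ptreeLoader) {
        this.ptreeLoader = ptreeLoader;
    }

    // Resolves href against base (if any) to an absolute URI string.
    static String absolutize(String href, String base) {
        URI ref = URI.create(href);
        if (base == null || base.isEmpty()) {
            return ref.toString();
        }
        return URI.create(base).resolve(ref).toString();
    }

    // True if the URI's path ends with the .ptree extension.
    static boolean isPTree(String absoluteUri) {
        String path = URI.create(absoluteUri).getPath();
        return path != null && path.endsWith(".ptree");
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            String absolute = absolutize(href, base);
            if (isPTree(absolute)) {
                return ptreeLoader.apply(absolute); // PTree file: hand to the (hypothetical) loader
            }
            return new StreamSource(absolute);      // anything else: an ordinary XML source
        } catch (IllegalArgumentException e) {
            throw new TransformerException("Invalid URI: " + href, e);
        }
    }
}
```

Because the decision is made inside the resolver, it applies equally to the principal source document and to any document fetched through the resolver at run time, which is why document() calls pick up the same behaviour. In JAXP, a resolver of this kind would be registered with TransformerFactory.setURIResolver or Transformer.setURIResolver.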
The result of a query or transformation can be serialized as a PTree file by specifying
as the output method, where the namespace prefix
saxon represents the URI
The PTree format is designed to allow future Saxon releases to read files created using older releases. The
converse may not always be true: it might sometimes be impossible for release N to read a PTree file created
using release N+1.
The PTree format does not retain the base URI of the original file: when a PTree is loaded, its base URI is the URI
of the PTree file itself, not that of the original XML file. The PTree is a serialization of the XPath data model, so
information that is not present in the data model will not be present in the PTree: for example, it will have no DTD,
no entity references and no CDATA sections.
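As a rough analogy, using the JDK's DOM parser rather than Saxon, the sketch below shows why CDATA sections and entity references cannot survive a data-model serialization: once parsed with CDATA coalescing and entity expansion, a CDATA section, an escaped character and an internal entity reference all yield exactly the same text. The class DataModelText is a hypothetical illustration; DOM is not the XPath data model, and the two parser settings are chosen here only to approximate how the data model reports these constructs.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical illustration: three lexical forms that the data model does not distinguish.
class DataModelText {

    // Parses xml and returns the root element, with CDATA merged into text and entities expanded.
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setCoalescing(true);             // CDATA sections become ordinary text nodes
            factory.setExpandEntityReferences(true); // entity references are replaced by their text
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XML", e);
        }
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<a><![CDATA[x < y]]></a>",                       // CDATA section
            "<a>x &lt; y</a>",                                // escaped character
            "<!DOCTYPE a [<!ENTITY e 'x &lt; y'>]><a>&e;</a>" // internal entity reference
        };
        for (String xml : inputs) {
            System.out.println(root(xml).getTextContent());
        }
    }
}
```

Each input prints the same string, so a format that records only the resulting text has no way to reproduce which lexical form the original file used.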
References to unparsed entities are not currently retained in a PTree.