Binary XML: the PTree file
Saxon-SA 8.5 allows an XML document to be saved on disk in a format referred to as a PTree.
This is a binary format designed for speed of loading. A document in PTree format takes about the
same amount of disk space as the original source XML, but takes about half as long to load into
memory. The saving is greater when the document contains type information, because this is retained
in the PTree without the need to revalidate.
Two new commands are available, com.saxonica.ptree.PTreeReader to convert XML documents into
PTrees and vice versa.
A PTree can be supplied as the input to a transformation or query using the class which implements
the JAXP
A new command-line option is available on the commands com.saxonica.Query. The option -p causes a
URIResolver to be used that recognizes the file extension .ptree as representing a Saxon PTree. This
option implicitly switches on the -u option, meaning that the source file name is interpreted as a
URI. The PTreeURIResolver, as well as recognizing the .ptree file extension, also recognizes query
parameters at the end of a URI. In particular it recognizes the parameters validation=strip which
control how a source document is schema-validated. For example, doc('source.xml?validation=lax')
loads a source document with lax validation. Because the parameter is part of the URI, different
validation can be applied to different source documents loaded by a single query or transformation.
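The behaviour described above (recognizing the .ptree extension and reading a validation parameter from the end of a URI) can be pictured with a small JAXP URIResolver. The following is only a sketch under stated assumptions: the class QueryParamResolver and its methods are hypothetical names invented for this illustration, it uses only standard java.net and javax.xml.transform classes, and it is not the implementation of Saxon's PTreeURIResolver.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration only; this is NOT Saxon's PTreeURIResolver. */
class QueryParamResolver implements URIResolver {

    /** Value of the "validation" parameter seen by the last resolve() call, or null. */
    private String lastValidation;

    String getLastValidation() {
        return lastValidation;
    }

    /** Splits a query string such as "a=b&c=d" into a map; null or empty input gives an empty map. */
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** True if the path part of the URI (ignoring any query) ends in ".ptree". */
    static boolean isPTree(URI uri) {
        String path = uri.getPath();
        return path != null && path.endsWith(".ptree");
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            URI absolute = (base == null || base.isEmpty())
                    ? new URI(href)
                    : new URI(base).resolve(href);
            // Remember the requested validation mode, e.g. "lax" or "strip".
            lastValidation = parseQuery(absolute.getQuery()).get("validation");
            if (isPTree(absolute)) {
                // A real resolver would hand the file to a PTree reader here.
                // This sketch does not attempt that; returning null tells the
                // processor to fall back to its own resolution.
                return null;
            }
            // Drop the query part so that the remaining URI names the document itself.
            URI plain = new URI(absolute.getScheme(), absolute.getAuthority(),
                    absolute.getPath(), null, null);
            return new StreamSource(plain.toString());
        } catch (URISyntaxException e) {
            throw new TransformerException(e);
        }
    }
}
```

In this sketch the validation mode is merely recorded. How a processor then applies that mode to the document is outside what standard JAXP offers and is not shown.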
The result of a query or transformation can be serialized as a PTree by specifying as the
serialization method. From the command line, use the parameter
The PTree format has been designed so that one Saxon release should normally be able to
read PTree files created by an earlier release. It may not always be possible, however, to read
PTrees created using a later Saxon release. The PTree is not dependent on any particular NamePool,
and can be freely moved between different machines just as source XML can. It is a binary format,
so there is no dependency on any particular character encoding or machine architecture. PTree files
are not designed to be read or written directly by user applications, nor are they designed to
provide an interchange format between Saxon and other products: the internal format is therefore
not published.
When a PTree contains type information, the schema that defines those types must also be loaded.
This doesn't happen automatically. At present, there is no way of storing a compiled schema on disk, so
this will generally involve rebuilding the schema from its source representation. It is the user's responsibility
to ensure that the loaded schema is consistent with the schema that was used to validate the original
document.
For more information see PTree Files.