Saxon Documentation

The PTree File Format

Saxon-SA supports a file format called the PTree (persistent tree). This is a binary representation of an XML document. The PTree file is generally about the same size as the original document (perhaps 10% smaller), but it typically loads in about half the time. Storing a document as a PTree can therefore give a useful performance improvement when the same source document is used repeatedly as the input to many queries or transformations. Another benefit of the PTree is that it retains any type information that is present, which means that the document does not need to be validated against its schema each time it is loaded. (The schema, however, must be loaded whenever the document is loaded.)

Two commands are available for converting XML documents into PTree files and vice versa. To create a PTree, use:

java com.saxonica.ptree.PTreeWriter source.xml result.ptree

The -strip option of PTreeWriter removes all whitespace-only text nodes during the conversion, which often gives a useful saving in space and therefore in loading time.

To convert a PTree back to an XML document, use:

java com.saxonica.ptree.PTreeReader source.ptree result.xml

It is possible to apply a query or transformation directly to a PTree by specifying the -p option on the command line for com.saxonica.Transform or com.saxonica.Query. This option actually causes a different URIResolver, the PTreeURIResolver, to be used in place of the standard URIResolver. The PTreeURIResolver recognizes any URI ending in the extension .ptree as an identifier for a file in PTree format. This extends to files loaded using the doc() or document() functions: if the file extension is .ptree, the file will be assumed to be in PTree format.
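The extension-based dispatch described above can be illustrated with a minimal sketch built only on the standard JAXP URIResolver interface. This is not Saxon's PTreeURIResolver: the class name ExtensionDispatchResolver, the isPTree helper, and the two delegate resolvers are hypothetical names introduced for illustration, and the sketch does not decode the PTree format itself.

```java
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;

/**
 * Hypothetical sketch: routes any URI ending in ".ptree" to one resolver
 * and every other URI to another. Not Saxon's actual PTreeURIResolver.
 */
final class ExtensionDispatchResolver implements URIResolver {

    // Assumed delegate that knows how to load a binary PTree file.
    private final URIResolver ptreeLoader;
    // Assumed delegate that loads ordinary XML (the "standard" resolver).
    private final URIResolver xmlLoader;

    ExtensionDispatchResolver(URIResolver ptreeLoader, URIResolver xmlLoader) {
        this.ptreeLoader = ptreeLoader;
        this.xmlLoader = xmlLoader;
    }

    /** True if the URI ends in the .ptree extension. */
    static boolean isPTree(String uri) {
        return uri != null && uri.endsWith(".ptree");
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        String absolute;
        try {
            // Resolve the relative reference first, so the test is applied
            // to the final URI rather than to the raw href.
            absolute = (base == null || base.isEmpty())
                    ? href
                    : new URI(base).resolve(href).toString();
        } catch (URISyntaxException | IllegalArgumentException e) {
            throw new TransformerException(
                    "Cannot resolve " + href + " against " + base, e);
        }
        // Dispatch purely on the file extension.
        return isPTree(absolute)
                ? ptreeLoader.resolve(href, base)
                : xmlLoader.resolve(href, base);
    }
}
```

A resolver of this shape could be installed through the standard JAXP setURIResolver methods, so that calls to doc() or document() pass through the same extension check. With Saxon itself, the -p option installs the real PTreeURIResolver and no such code is needed.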

The result of a query or transformation can be serialized as a PTree file by specifying saxon:ptree as the output method, where the namespace prefix saxon represents the URI http://saxon.sf.net/.

The PTree format is designed to allow future Saxon releases to read files created using older releases. The converse may not always be true: it might sometimes be impossible for release N to read a PTree file created using release N+1.

The PTree format does not retain the base URI of the original file: when a PTree is loaded, the base URI is taken to be the URI of the PTree file, not that of the original XML file. The PTree is a serialization of the XPath data model, so information that is not present in the data model will not be present in the PTree: for example, it will have no DTD and no entity references or CDATA sections.

References to unparsed entities are not currently retained in a PTree.