Building XML Trees Programmatically

There are various ways in Saxon to build an XDM tree programmatically (that is, incrementally one node at a time).

The Sapling Tree API

A new API offered from Saxon 10 is the Sapling Tree API. This provides a collection of methods to create nodes; for example, to create a document containing a body element with two paragraphs, the expression

    doc().withChild(
        elem("body").withChild(
            elem("p").withText("Hello"),
            elem("p").withText("World")))

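might be used. The factory methods doc() and elem() are static methods of the class net.sf.saxon.sapling.Saplings; the methods that add content to a node are defined on the node classes in the same package, net.sf.saxon.sapling.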

The "Sapling" nodes created by these methods are transient nodes used only during tree construction. When the Sapling tree has been completely built, it can be converted to a regular XDM tree offering full query access using the methods SaplingDocument.toXdmNode() or SaplingDocument.toNodeInfo(). It is also possible to send the tree directly to a Destination such as a Serializer, a SchemaValidator, or an XsltTransformer.

Sapling nodes are immutable objects, so operations like adding children or adding attributes always create a new object, without modifying the input objects. This means that adding a child element to a new parent can be done without an expensive copy operation. Nodes do not have references to their parents in the tree, so a subtree can be shared by multiple trees without copying.
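The effect of immutability and the absence of parent pointers can be illustrated with a minimal sketch. The class ImmutableElem below is hypothetical and is not part of Saxon; it only mimics the design described above, in which adding a child returns a new object and the same subtree can appear under several parents without being copied.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical immutable element node for illustration; NOT a Saxon class. */
final class ImmutableElem {
    private final String name;
    private final List<ImmutableElem> children;

    private ImmutableElem(String name, List<ImmutableElem> children) {
        this.name = name;
        this.children = Collections.unmodifiableList(children);
    }

    /** Factory method for an element with no children. */
    static ImmutableElem elem(String name) {
        return new ImmutableElem(name, new ArrayList<>());
    }

    /** Returns a NEW element with one more child; the receiver is left unchanged. */
    ImmutableElem withChild(ImmutableElem child) {
        // Only the list of references is copied, never the child subtree itself.
        List<ImmutableElem> copy = new ArrayList<>(children);
        copy.add(child);
        return new ImmutableElem(name, copy);
    }

    String name() { return name; }
    int childCount() { return children.size(); }
    ImmutableElem child(int index) { return children.get(index); }

    public static void main(String[] args) {
        ImmutableElem para = elem("p");
        // No parent pointer exists, so the same node can sit under two parents.
        ImmutableElem body = elem("body").withChild(para);
        ImmutableElem section = elem("section").withChild(para);
        System.out.println(body.child(0) == section.child(0)); // prints "true"
    }
}
```

Because a node never records its parent, sharing is safe: nothing reachable from the shared subtree can observe which tree it is being viewed from.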

The Sapling Tree API is described in the JavaDoc for class SaplingNode.

Event APIs

Saxon 10 introduces a new event-based API (called simply "Push") designed explicitly for convenient use by user-written applications.

A Push instance is always created using the factory method Processor.newPush(destination); the destination argument indicates what happens to the constructed document. This will commonly be an XdmDestination to build an in-memory XdmNode, or a Serializer to create lexical XML, but it could also be, for example, an XsltTransformer or a SchemaValidator.

Conventional event-based APIs such as the SAX ContentHandler and StAX XMLStreamWriter and XMLEventWriter rely on the application to issue a properly-nested sequence of calls to methods such as startElement() and endElement(). This can make it very difficult to diagnose errors if the calls are not properly matched. The Saxon Push API differs in that a call to start a new element node returns an Element object representing that element, and methods to create attributes and children for the element, and to end the element, are defined as methods on that Element object. Furthermore, these methods return the element to which they are applied, allowing method chaining. So a typical sequence of calls might be:

    out.element("employee")
       .attribute("ssn", "123456")
       .attribute("location", "Berlin")
       .text("Helmut Schmidt")
       .close();

This example constructs a slightly more complex tree:

    Processor processor = new Processor(false);
    Serializer destination = processor.newSerializer(new File("out.xml"));
    destination.setOutputProperty(Serializer.Property.INDENT, "no");
    Push.Document doc = processor.newPush(destination).document(true);
    doc.setDefaultNamespace("http://www.example.org/ns");
    Push.Element top = doc.element("root");
    top.attribute("version", "1.5");
    for (Employee emp : getData()) {
        top.element("emp")
           .attribute("ssn", emp.ssn)
           .text(emp.name);
    }
    doc.close();

Note that there are no explicit endElement events here; an end tag is written automatically when the next sibling is written to the parent element, or when the parent element is closed. The close() method is available, however, to close an element explicitly, which can be useful to avoid errors when the writing of elements is distributed across many classes and methods.

Saxon also allows trees to be communicated using other event-based APIs. In Java there are three such APIs worth considering:

Saxon's Receiver API
The SAX ContentHandler API
The StAX XMLStreamWriter API

The Receiver is efficient, but it is proprietary to Saxon, is prone to minor changes from one release to another, and is designed primarily for internal use rather than for direct use from applications.

The SAX ContentHandler API was designed primarily for communication from an XML parser to an application; it can be clumsy to use when the originator of events is something other than an XML parser.
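To illustrate why ContentHandler can feel clumsy when the application itself originates the events, the following sketch builds the employee element used earlier through SAX calls. It uses only JDK classes: the TransformerHandler obtained from the SAXTransformerFactory is a stand-in for whichever ContentHandler would receive the events, and the class name SaxEmployeeWriter is hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical example class; emits one element through the SAX ContentHandler interface. */
class SaxEmployeeWriter {

    static String write() throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        // A TransformerHandler is a ContentHandler that serializes the events it receives.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        handler.setResult(new StreamResult(out));

        // All attributes must be assembled before the element is started.
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "ssn", "ssn", "CDATA", "123456");
        atts.addAttribute("", "location", "location", "CDATA", "Berlin");

        // Text is passed as a character array with offset and length.
        char[] text = "Helmut Schmidt".toCharArray();

        handler.startDocument();
        handler.startElement("", "employee", "employee", atts);
        handler.characters(text, 0, text.length);
        // The caller is responsible for repeating the matching names here.
        handler.endElement("", "employee", "employee");
        handler.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write());
    }
}
```

Each element name has to be supplied as a namespace URI, local name and qualified name, both when the element starts and again when it ends, which is the kind of bookkeeping a parser does naturally and an application does not.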

The StAX XMLStreamWriter is probably the best of the three interfaces for most applications. Saxon's DocumentBuilder class offers a method newBuildingStreamWriter() which returns an XMLStreamWriter; the calling application can then use methods such as XMLStreamWriter.writeStartElement() and XMLStreamWriter.writeEndElement() to build the tree.
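A typical sequence of XMLStreamWriter calls is sketched below. To keep the example self-contained it obtains the writer from the JDK's XMLOutputFactory and serializes to a string; this is an assumption made for illustration. The method writeEmployee() accepts any XMLStreamWriter, so the same calls could in principle be issued against the writer returned by newBuildingStreamWriter(). The class and method names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical example class; writes one element through the StAX XMLStreamWriter interface. */
class StaxEmployeeWriter {

    /** Issues the events for one employee element against any XMLStreamWriter. */
    static void writeEmployee(XMLStreamWriter w) throws Exception {
        w.writeStartElement("employee");
        // Attributes are written after the start element and before any content.
        w.writeAttribute("ssn", "123456");
        w.writeAttribute("location", "Berlin");
        w.writeCharacters("Helmut Schmidt");
        // Every writeStartElement() needs a matching writeEndElement().
        w.writeEndElement();
    }

    static String write() throws Exception {
        StringWriter out = new StringWriter();
        // Stand-in for the writer that Saxon's DocumentBuilder would supply.
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument();
        writeEmployee(w);
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write());
    }
}
```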

The trickiest part of this interface is probably the handling of namespaces. Saxon's implementation of the StAX interfaces takes into account not only the official Javadoc specifications (which in some respects are woefully inadequate), but also the unofficial interpretation of the specifications found in the article "Understanding StAX: How to Correctly Use XMLStreamWriter".
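One conservative way to write namespaced elements is to state the prefix and namespace URI explicitly on every start element and to write the namespace declaration explicitly as well. The sketch below does this against the JDK's default XMLStreamWriter in its default (non-repairing) mode; it is an assumption that this explicit style is also the safest choice with other implementations, and the prefix "ex" and the class name are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical example class; writes namespaced elements with explicit prefixes. */
class StaxNamespaceWriter {

    static final String NS = "http://www.example.org/ns";

    static String write() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);

        // Three-argument form: prefix, local name, namespace URI.
        w.writeStartElement("ex", "root", NS);
        // In non-repairing mode the declaration is not generated automatically,
        // so it is written explicitly on the element that introduces the prefix.
        w.writeNamespace("ex", NS);

        // The child reuses the prefix declared on its parent.
        w.writeStartElement("ex", "emp", NS);
        w.writeCharacters("Helmut Schmidt");
        w.writeEndElement(); // closes ex:emp

        w.writeEndElement(); // closes ex:root
        w.close();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(write());
    }
}
```

Spelling out both the prefix and the URI leaves nothing for the writer to infer, which sidesteps most of the ambiguities in the Javadoc specification.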