Third-party object models: Axiom, DOM, JDOM2, XOM, and DOM4J

In addition to implementing its own tree models, Saxon allows XPath access to a number of third-party XML tree representations.

Access to DOM trees is supported both in SaxonJ and SaxonCS.

The remainder of this section is relevant to SaxonJ only.

In the case of DOM, all Saxon editions support DOM access "out of the box", and no special configuration action is necessary. See also The Domino tree model.

Support for Axiom, JDOM2, XOM, and DOM4J is not available "out of the box" with Saxon-HE, but the source code is open source (in sub-packages of net.sf.saxon.option) and can be compiled for use with Saxon-HE if required.

The support code for Axiom, DOM4J, JDOM2, and XOM is integrated into the main JAR files for SaxonJ-PE and SaxonJ-EE, but (unlike the case of DOM) it is not activated unless the object model is registered with the Configuration. To activate support for one of these models, the implementation must either be included in the relevant section of the configuration file, or it must be nominated to the configuration using the method registerExternalObjectModel().

Each supported object model is represented in Saxon by a TreeModel object, which in the case of external object models will also be an instance of ExternalObjectModel. The TreeModel can be used to get a Builder, which can then be used to construct an instance of the model from SAX input. The Builder can also be inserted into a pipeline to capture the output of a transformation or query.

For DOM input, the source can be supplied by wrapping a DOMSource around the DOM Document node. For Axiom, JDOM2, XOM, and DOM4J the approach is similar, except that the wrapper classes are supplied by Saxon itself: they are net.sf.saxon.option.axiom.AxiomDocument, net.sf.saxon.option.jdom2.JDOM2DocumentWrapper, net.sf.saxon.option.xom.XOMDocumentWrapper, and net.sf.saxon.option.dom4j.DOM4JDocumentWrapper respectively. These wrapper classes implement the Saxon NodeInfo interface (which means that they also implement Source).
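As a rough illustration of the DOM case, the sketch below parses a small document into a DOM tree and supplies it to a transformer by wrapping a DOMSource around the Document node. It uses only the standard JAXP interfaces; which TransformerFactory implementation is found at run time depends on the classpath and JAXP lookup rules, and the class and method names (DomSourceSketch, roundTrip) are hypothetical. The Saxon-specific wrapper classes for Axiom, JDOM2, XOM, and DOM4J are not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class DomSourceSketch {

    /** Parses XML text into a DOM Document, then serializes it via a DOMSource. */
    static String roundTrip(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // XPath and XSLT processing expect a namespace-aware DOM
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Wrap the DOM Document node so it can be supplied as a JAXP Source.
            DOMSource source = new DOMSource(doc);

            // An identity transformation is enough to show the source being consumed.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM round trip failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<greeting>hello</greeting>"));
    }
}
```

The DOM is built namespace-aware because a DOM built without namespace support does not carry the namespace information that the XPath data model relies on.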

Saxon supports these models by wrapping each external node in a wrapper that implements the Saxon NodeInfo interface. When nodes are returned by the XQuery or XPath API, these wrappers are removed and the original node is returned. Similarly, the wrappers are generally removed when extension functions expecting a node are called.

Saxon does not support wrapping of an external tree that contains entity reference nodes. Most parsers provide an option to avoid constructing a tree that contains such nodes. For example, with the JDK Xerces DOM parser, use DOMParser dp = new DOMParser(); dp.setFeature("http://apache.org/xml/features/dom/create-entity-ref-nodes", false); so that entity references are expanded rather than represented as nodes in the tree. If there is a need to process a tree that does contain entity references, it should be copied to a Saxon tree. (Note that this only affects entities explicitly declared in a DTD. It does not affect character references or built-in entity references such as &lt;, which never appear as entity reference nodes in the tree.)
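The same effect can usually be obtained through the standard JAXP factory rather than the Xerces-specific class. The following is a minimal sketch, assuming the JDK's built-in parser: it requests entity expansion with DocumentBuilderFactory.setExpandEntityReferences(true) and then checks that no entity reference nodes remain beneath the document element. The class name, method names, and sample document are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class EntityRefSketch {

    /** Sample input: the entity "who" is declared in the internal DTD subset. */
    static final String SAMPLE =
            "<!DOCTYPE doc [<!ENTITY who \"World\">]><doc>Hello &who;</doc>";

    /** Parses the XML, asking the parser to expand entity references into their content. */
    static Document parseExpanded(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setExpandEntityReferences(true); // request a tree without EntityReference nodes
            return dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    /** Returns true if the node or any of its descendants is an entity reference node. */
    static boolean containsEntityReference(Node node) {
        if (node.getNodeType() == Node.ENTITY_REFERENCE_NODE) {
            return true;
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (containsEntityReference(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Document doc = parseExpanded(SAMPLE);
        System.out.println("entity reference nodes present: "
                + containsEntityReference(doc.getDocumentElement()));
        System.out.println("text content: " + doc.getDocumentElement().getTextContent());
    }
}
```

A check like containsEntityReference can be run before wrapping a tree of unknown origin, to decide whether it needs to be copied to a Saxon tree instead.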

In the case of DOM only, Saxon also supports wrapping in the opposite direction: an object implementing the DOM interface may be wrapped around a Saxon NodeInfo. This is done when Java methods expecting a DOM Node are called as extension functions, if the NodeInfo is not itself a wrapper for a DOM Node.

You can also send output to a DOM by using a DOMResult, or to a JDOM2 tree by using a JDOM2Result, or to a XOM document by using a XOMWriter. In such cases it is a good idea to set saxon:require-well-formed="yes" on xsl:output to ensure that the transformation or query result is a well-formed document (for example, that it does not contain several elements at the top level).
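For the DOM case, capturing output might look like the sketch below. It uses only standard JAXP classes and an identity transformation, so it shows the DOMResult mechanics rather than anything Saxon-specific; the class and method names are hypothetical. The helper topLevelElementCount is one simple way to confirm after the fact that the result has a single top-level element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
final class DomResultSketch {

    /** Runs an identity transformation and captures the output as a DOM Document. */
    static Document transformToDom(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            // Supply an empty Document for the transformer to populate.
            Document target = dbf.newDocumentBuilder().newDocument();

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.transform(new StreamSource(new StringReader(xml)), new DOMResult(target));
            return target;
        } catch (Exception e) {
            throw new IllegalStateException("Transformation to DOM failed", e);
        }
    }

    /** Counts element children of the document node; a well-formed document has exactly one. */
    static int topLevelElementCount(Document doc) {
        int count = 0;
        for (Node child = doc.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = transformToDom("<out><item>1</item><item>2</item></out>");
        System.out.println("document element: " + doc.getDocumentElement().getNodeName());
        System.out.println("top-level elements: " + topLevelElementCount(doc));
    }
}
```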

In some cases external object models do not fully support the XDM (XPath data model), for example:

  • They may allow nodes to be created with invalid names, or with namespace prefixes that are not declared.

  • Many of them have restrictions concerning the recognition of ID and IDREF attributes.

  • In most cases they do not allow "namespace undeclarations" as permitted in XML 1.1 (so a prefix that is in-scope for a parent element will always be in-scope for its child elements).

  • None of the external object models support typed (schema-validated) data.

  • None of the external object models support in-situ update using XQuery Update.

  • Many of the external object models allow a document to contain adjacent text nodes, especially when the tree is constructed programmatically rather than as the result of XML parsing.

    In the case of DOM and JDOM2, the Saxon wrapper code creates a view of the tree in which multiple adjacent external text nodes are presented as a single composite XDM text node. But this logic is not implemented for other external models: in such cases the XDM tree may violate the constraint that adjacent text nodes are not allowed.

    In the case of DOM4J, a tree with adjacent text nodes may be created even when the tree is built directly by parsing XML using an org.dom4j.io.SAXReader, unless the option SAXReader.setMergeAdjacentTextNodes(true) is set prior to parsing.
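The adjacent-text-node situation described in the last item is easy to reproduce with the standard DOM API. The sketch below builds a tree programmatically with two consecutive text nodes under one element, then merges them with Node.normalize(). It is an illustration of the condition only, using hypothetical class and method names; for DOM, the Saxon wrapper already presents such nodes as a single text node, so the explicit normalize() call is shown simply as a way of observing the difference.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class, for illustration only.
final class AdjacentTextSketch {

    /** Builds <p>Hello, world</p> where the text is split across two adjacent text nodes. */
    static Element buildWithAdjacentText() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            doc.appendChild(p);
            p.appendChild(doc.createTextNode("Hello, "));
            p.appendChild(doc.createTextNode("world")); // second text node, adjacent to the first
            return p;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create DOM document", e);
        }
    }

    /** Counts the text-node children of an element. */
    static int countTextChildren(Element element) {
        int count = 0;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Element p = buildWithAdjacentText();
        System.out.println("text nodes before normalize: " + countTextChildren(p));
        p.normalize(); // standard DOM call that merges adjacent text nodes
        System.out.println("text nodes after normalize: " + countTextChildren(p));
    }
}
```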