
JAXP Source Types

This section is relevant to the Java platform only.

When a user application invokes Saxon via the Java API, a source document is supplied as an instance of the JAXP Source interface (javax.xml.transform.Source). This is true whether invoking an XSLT transformation, an XQuery query, or a free-standing XPath expression. Source is essentially a marker interface, so the object that is supplied must be a kind of Source that Saxon recognizes.

Saxon recognizes the three kinds of Source defined in JAXP: a StreamSource, a SAXSource, and a DOMSource. If a DOMSource is to be used, then the Saxon DOM support classes in saxon9-dom.jar must be on the class path.
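As an illustration of the three standard kinds of Source, the following sketch builds each one from the same XML string and passes it through an identity transformation. It uses only JAXP classes from the JDK; the class name JaxpSourceKinds and its helper methods are invented for this example. With Saxon on the class path, TransformerFactory.newInstance() would normally be expected to pick up the Saxon implementation, but that depends on how the JAXP factory lookup is configured.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class: one factory method per standard JAXP Source kind.
public class JaxpSourceKinds {

    // StreamSource: lexical XML read from a character or byte stream.
    public static Source streamSource(String xml) {
        return new StreamSource(new StringReader(xml));
    }

    // SAXSource: a SAX InputSource (optionally paired with an XMLReader).
    public static Source saxSource(String xml) {
        return new SAXSource(new InputSource(new StringReader(xml)));
    }

    // DOMSource: a DOM tree that has already been built in memory.
    public static Source domSource(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return new DOMSource(doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Copies any Source to a string using an identity transformation.
    public static String serialize(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><item>1</item></doc>";
        System.out.println(serialize(streamSource(xml)));
        System.out.println(serialize(saxSource(xml)));
        System.out.println(serialize(domSource(xml)));
    }
}
```

The same serialize() method accepts all three objects because the transformation API is declared in terms of Source, not of any particular implementation.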

Note that the Xerces DOM implementation is not thread-safe, even for read-only access. Never use a DOMSource in several threads concurrently, unless you have checked that the DOM implementation you are using is thread-safe.

Saxon also accepts input from an XMLStreamReader (javax.xml.stream.XMLStreamReader), that is, a StAX pull parser as defined in JSR 173. To do this, create an instance of net.sf.saxon.pull.StaxBridge, supply the XMLStreamReader using the setXMLStreamReader() method, and wrap the StaxBridge object in an instance of net.sf.saxon.pull.PullSource, which implements the JAXP Source interface and can be used in any Saxon method that expects a Source. Saxon has been validated with two StAX parsers: the Zephyr parser from Sun (which is supplied as standard with JDK 1.6), and the open-source Woodstox parser from Tatu Saloranta. In my experience, Woodstox is the more reliable of the two. However, there is no immediate benefit in using a pull parser rather than a push parser to supply Saxon input; the main use case for an XMLStreamReader is when the data comes from some source other than the parsing of lexical XML.
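The sketch below shows how an XMLStreamReader might be obtained and what "pull" means in practice: the application asks the reader for each event in turn. It uses only the StAX API from the JDK, so it runs without Saxon; the class name StaxInputSketch and its methods are invented for this example. The Saxon wrapping steps described above appear only as comments, and the PullSource constructor shown there is an assumption that should be checked against the Javadoc for the Saxon release in use.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class demonstrating a StAX pull parser.
public class StaxInputSketch {

    // Creates an XMLStreamReader over a string using the default StAX provider.
    public static XMLStreamReader openReader(String xml) {
        try {
            return XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Pulls events one at a time and collects the local names of start tags.
    public static List<String> elementNames(XMLStreamReader reader) {
        try {
            List<String> names = new ArrayList<>();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        XMLStreamReader reader = openReader("<doc><item>1</item><item>2</item></doc>");

        // With Saxon on the class path, the reader would instead be handed to
        // Saxon roughly as follows (constructor form is an assumption):
        //   StaxBridge bridge = new StaxBridge();
        //   bridge.setXMLStreamReader(reader);
        //   Source source = new PullSource(bridge);

        System.out.println(elementNames(reader));
    }
}
```

A reader can be consumed only once, so in a real application it would be passed either to Saxon or to a loop like elementNames(), not both.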

Nodes in Saxon's implementation of the XPath data model are represented by the interface net.sf.saxon.om.NodeInfo. A NodeInfo is itself a Source, which means that any method in the API that requires a Source object will accept any implementation of NodeInfo. As discussed in the next section, implementations of NodeInfo are available to wrap DOM, DOM4J, JDOM, or XOM nodes, and in all cases these wrapper objects can be used wherever a Source is required.

Saxon also provides a class net.sf.saxon.AugmentedSource which implements the Source interface. This class encapsulates one of the standard Source objects, and allows additional processing options to be specified. These options include whitespace handling, schema and DTD validation, XInclude processing, error handling, choice of XML parser, and choice of Saxon tree model.

Saxon allows additional Source types to be supported by registering a SourceResolver with the Configuration object. The task of a SourceResolver is to convert a Source that Saxon does not recognize into a Source that it does recognize. For example, this may be done by building the document tree in memory and returning the NodeInfo object representing the root of the tree.
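The idea of converting an unrecognized Source into a recognized one can be sketched without any Saxon classes. In the example below, InMemorySource is a hypothetical application-defined Source, and resolve() is a stand-in for the conversion step; neither is part of Saxon, and the real SourceResolver interface and its registration with the Configuration are not reproduced here. Returning null for an unknown Source is a convention of this sketch only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration of the conversion a SourceResolver performs.
public class SourceResolverSketch {

    // A custom Source kind that a JAXP processor would not recognize by itself.
    public static class InMemorySource implements Source {
        private final String xml;
        private String systemId;

        public InMemorySource(String xml) {
            this.xml = xml;
        }

        public String getXml() {
            return xml;
        }

        @Override
        public void setSystemId(String systemId) {
            this.systemId = systemId;
        }

        @Override
        public String getSystemId() {
            return systemId;
        }
    }

    // Converts the custom Source into a standard StreamSource.
    // Returns null (this sketch's convention) if the Source is not one it knows.
    public static Source resolve(Source source) {
        if (source instanceof InMemorySource) {
            InMemorySource mem = (InMemorySource) source;
            StreamSource result = new StreamSource(new StringReader(mem.getXml()));
            result.setSystemId(mem.getSystemId());
            return result;
        }
        return null;
    }

    // Resolves the Source first, then copies it with an identity transformation.
    public static String serialize(Source source) {
        try {
            Source resolved = resolve(source);
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(resolved != null ? resolved : source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(new InMemorySource("<a>x</a>")));
    }
}
```

This version returns a StreamSource for simplicity; the alternative mentioned above, building the tree in memory and returning the root NodeInfo, would avoid a second parse when the same document is used repeatedly.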
