JAXP Source Types
This section is relevant to the Java platform only.
When a user application invokes Saxon via the Java API, a source document is supplied as
an instance of the JAXP Source interface (javax.xml.transform.Source). This is true whether invoking an XSLT
transformation, an XQuery query, or a free-standing XPath expression. Source
is essentially a marker interface, so the Source that is supplied must be a
kind of Source that Saxon recognizes.
Saxon recognizes all three kinds of Source defined in JAXP: a
StreamSource, a SAXSource, and a DOMSource.
- When using a StreamSource, note:
  - A StreamSource that wraps an InputStream or Reader can only be used once: it is consumed by use. However, a StreamSource that wraps a File or URI can be used multiple times.
  - Whoever creates an InputStream or Reader is responsible for closing it after use. This means that if Saxon creates an InputStream from a supplied File or URI, it will close that InputStream after use; but if the InputStream is created by the calling application, then the calling application is responsible for closing it. (On some operating systems it is important not to leave unclosed streams lying around.)
  - If the StreamSource wraps an InputStream or Reader, then the base URI of the document is taken from the SystemID property of the StreamSource. If this is not set, then the base URI is unknown, which may cause constructs that require a known base URI to fail.
- When using a SAXSource, note:
  - If no XMLReader is supplied, Saxon will allocate one, based on settings in the Configuration.
  - Processing of the contained InputSource is entirely the responsibility of the XML parser; Saxon is not involved in this.
  - Saxon will modify properties of the supplied XMLReader: it will set the ContentHandler and LexicalHandler so that it can receive the output of parsing, and it will set the ErrorHandler so it can handle parsing errors.
  - Saxon makes no attempt to ensure that processing of a SAXSource or its underlying XMLReader is thread-safe. The same XMLReader should not be used concurrently in multiple threads.
- When using a DOMSource, note:
  - The DOM is not thread-safe, even when used in read-only mode. Saxon therefore synchronizes all its access to DOM methods. However, that's no protection if application threads that aren't using Saxon are accessing the DOM at the same time.
  - The base URI of the document is taken from the SystemID property of the DOMSource. If this is not set, then the base URI is unknown, which may cause constructs that require a known base URI to fail.
  - From Saxon 9.8, Saxon-EE uses a new mechanism for processing DOM trees, called the Domino model. This involves creating an index of all the nodes in the DOM, providing for faster navigation. Saxon-PE and Saxon-HE continue to use the DOM NodeWrapper model, where DOM methods are used to navigate the tree. A transformation using the Domino model typically takes twice as long as the same transformation using Saxon's native TinyTree, while the NodeWrapper model can take 5 to 10 times as long. An alternative approach is to convert the DOM tree to a TinyTree before the transformation starts. Even better: don't use DOM in the first place.
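The notes on the three standard JAXP Source kinds can be illustrated with a minimal sketch. It uses only JDK classes and an identity transformation obtained from TransformerFactory.newInstance(), so it exercises whichever JAXP implementation is found on the classpath and is not specific to Saxon. The class name JaxpSourceDemo, its method names, and the BASE_URI value are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

class JaxpSourceDemo {
    // Hypothetical base URI, used only for illustration; it is never fetched.
    static final String BASE_URI = "http://example.com/input.xml";

    // Serializes any Source to a string using an identity transformation.
    static String copy(Source source) throws TransformerException {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    // StreamSource over a Reader: single use. The caller created the
    // Reader, so the caller closes it (here via try-with-resources).
    static String fromReader(String xml) throws Exception {
        try (StringReader reader = new StringReader(xml)) {
            StreamSource source = new StreamSource(reader);
            source.setSystemId(BASE_URI); // without this the base URI is unknown
            return copy(source);
        }
    }

    // StreamSource over a File: the same object can be used more than once.
    static boolean fileSourceIsReusable(String xml) throws Exception {
        Path file = Files.createTempFile("jaxp-source-demo", ".xml");
        try {
            Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
            StreamSource source = new StreamSource(file.toFile());
            return copy(source).equals(copy(source));
        } finally {
            Files.deleteIfExists(file);
        }
    }

    // SAXSource with an application-supplied XMLReader. A fresh reader is
    // created per call, so no reader is ever shared between threads.
    static String fromSax(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));
        source.setSystemId(BASE_URI);
        return copy(source);
    }

    // DOMSource with its system ID (and hence base URI) set explicitly.
    static String fromDom(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return copy(new DOMSource(doc, BASE_URI));
    }
}
```

Calling JaxpSourceDemo.fromReader("<doc><item>text</item></doc>") should return the document serialized without an XML declaration. Creating a new Source for each use, as fromReader and fromSax do, sidesteps both the single-use restriction on stream-backed sources and the sharing of an XMLReader between threads.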
Other kinds of Source that are recognized by most Saxon interfaces are:
- TreeInfo: Saxon's TreeInfo holds information about a document (or more generally any tree of nodes), and can be used directly as a Source of a transformation.
- NodeInfo: Saxon's NodeInfo represents a node in a tree, and can be used directly as a Source of a transformation.
- StaxSource: allows a pull parser to be used.
- PullSource: Saxon's internal pull interface.
- EventSource: Similar to an XMLReader, but with a much simpler interface, an EventSource has a send() method that sends a stream of events to a Saxon Receiver.
- SaplingDocument: a sapling tree constructed using the sapling construction interface can be used anywhere (within Saxon) that a Source is expected.
Saxon also accepts input from an XMLStreamReader
(javax.xml.stream.XMLStreamReader), that is, a StAX pull parser as defined in
JSR 173. This is achieved by creating an instance of net.sf.saxon.pull.StaxBridge, supplying the
XMLStreamReader using the setXMLStreamReader() method, and
wrapping the StaxBridge object in an instance of net.sf.saxon.pull.PullSource, which implements the
JAXP Source interface and can be used in any Saxon method that expects a
Source. Saxon has been validated with two StAX parsers: the Zephyr parser from
Sun (which is supplied as standard with JDK 1.6), and the open-source Woodstox parser from
Tatu Saloranta. In Saxonica's experience, Woodstox is the more reliable of the two. However, there is
no immediate benefit in using a pull parser to supply Saxon input rather than a push parser;
the main use case for using an XMLStreamReader is when the data is supplied from
some source other than parsing of lexical XML.
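As a rough illustration of supplying input from a pull parser, the sketch below wraps an XMLStreamReader in the standard JAXP javax.xml.transform.stax.StAXSource and runs an identity transformation over it. This is not the StaxBridge and PullSource route described above, which needs Saxon on the classpath; it uses only JDK classes, and it assumes that the JAXP implementation in use accepts a StAXSource. The class name StaxInputDemo and its method name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class StaxInputDemo {
    // Reads the XML through a StAX pull parser and serializes it again.
    static String copyFromPullParser(String xml) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // A newly created reader is positioned at START_DOCUMENT,
        // which is one of the states the StAXSource constructor requires.
        XMLStreamReader reader = inputFactory.createXMLStreamReader(new StringReader(xml));
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new StAXSource(reader), new StreamResult(out));
            return out.toString();
        } finally {
            // The application created the reader, so the application closes it.
            reader.close();
        }
    }
}
```

In practice the XMLStreamReader would more usefully be one that generates events from non-XML data, since that is the main use case identified above; parsing a string here just keeps the sketch self-contained.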
Nodes in Saxon's implementation of the XPath data model are represented by the interface NodeInfo. A NodeInfo is
itself a Source, which means that any method in the API that requires a source
object will accept any implementation of NodeInfo. As discussed in the next
section, implementations of NodeInfo are available to wrap Axiom, DOM, DOM4J,
JDOM2, or XOM nodes, and in all cases these wrapper objects can be used wherever a
Source is required.
Saxon also provides a class net.sf.saxon.lib.AugmentedSource which implements the Source interface.
This class encapsulates one of the standard Source objects, and allows additional
processing options to be specified. These options include whitespace handling, schema and DTD
validation, XInclude processing, error handling, choice of XML parser, and choice of Saxon
tree model.
Saxon allows additional Source types to be supported by registering a SourceResolver with the Configuration object. The task of a
SourceResolver is to convert a Source that Saxon does not
recognize into a Source that it does recognize. For example, this may be done by
building the document tree in memory and returning the NodeInfo object representing the root of the tree.