Building a source document from lexical XML
The conversion of lexical XML to a tree in memory is called parsing, and is performed by a software component called an XML parser. Saxon does not include its own XML parser; instead, it provides interfaces that invoke XML parsers supplied by third parties. Platforms such as Java and .NET typically include a built-in XML parser, which Saxon uses by default.
A source document can be built using the DocumentBuilder class, which is created using the factory method newDocumentBuilder() (or NewDocumentBuilder() in C#) on the Processor object. Various options for document building are available as methods on the DocumentBuilder, for example options to perform schema or DTD validation, to strip whitespace, to expand XInclude directives, and to choose the tree implementation model to be used.
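As an illustration of the options described above, here is a minimal Java sketch using the s9api interface. It assumes a Saxon jar (for example Saxon-HE) is on the classpath; the class name BuildDocument and the sample XML are hypothetical, and the option values shown are only one possible choice.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.om.TreeModel;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.WhitespaceStrippingPolicy;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class: builds an XDM tree from lexical XML held in a string.
public class BuildDocument {

    static XdmNode build(String xml) throws SaxonApiException {
        // false = no licensed (schema-aware) features are requested
        Processor processor = new Processor(false);

        // The DocumentBuilder is obtained from the Processor's factory method
        DocumentBuilder builder = processor.newDocumentBuilder();

        // Options are set as methods on the DocumentBuilder
        builder.setDTDValidation(false);                                      // no DTD validation
        builder.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);  // strip whitespace-only text nodes
        builder.setTreeModel(TreeModel.TINY_TREE);                            // choose the tree implementation

        // The XML parser is invoked here, behind the JAXP Source abstraction
        return builder.build(new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws SaxonApiException {
        XdmNode doc = build("<doc> <item>one</item> </doc>");
        System.out.println(doc.getStringValue());
    }
}
```

The same builder can be reused for several documents, so options only need to be set once per DocumentBuilder.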
On Java, the different ways of supplying input to the DocumentBuilder are represented using the JAXP Source object. This is a JAXP interface designed as an abstraction of various kinds of XML source, including:
- StreamSource, which represents lexical XML held in a file or input stream.
- SAXSource, which represents a source of SAX events.
- DOMSource, which represents an already-parsed XML document held in a DOM tree.
- StAXSource, which represents a class that responds to requests for StAX (pull-parser) events.
- ActiveSource, a Saxon extension of the JAXP Source interface, which allows you to define your own kind of input source: all you need to do is implement the method deliver(), which delivers the contents of the resource to a Saxon Receiver.
- NodeInfo, which represents a node in an XDM tree and implements Source.
In addition, the s9api XdmNode class has an asSource() method, so it is always possible to supply an existing Saxon tree as the source for any of these interfaces.
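The four standard JAXP Source kinds listed above can be sketched using only classes from the JDK. In this example the JDK's identity Transformer stands in as a generic consumer of a Source; with Saxon, the same Source objects would instead be passed to the build() method of the DocumentBuilder. The class name SourceKinds, the helper method names, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical example class: one XML string exposed through four kinds of JAXP Source.
public class SourceKinds {
    static final String XML = "<greeting>hello</greeting>";

    // Lexical XML held in a character stream
    static Source streamSource() {
        return new StreamSource(new StringReader(XML));
    }

    // A source of SAX events; the consumer supplies the XMLReader if none is set
    static Source saxSource() {
        return new SAXSource(new InputSource(new StringReader(XML)));
    }

    // An already-parsed document held in a DOM tree
    // (this uses the JAXP DOM builder, not Saxon's DocumentBuilder)
    static Source domSource() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
        return new DOMSource(dom);
    }

    // A pull parser that responds to requests for StAX events
    static Source staxSource() throws Exception {
        return new StAXSource(
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(XML)));
    }

    // Stand-in consumer: serializes whatever the Source delivers
    static String consume(Source source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Source[] sources = { streamSource(), saxSource(), domSource(), staxSource() };
        for (Source s : sources) {
            System.out.println(s.getClass().getSimpleName() + " -> " + consume(s));
        }
    }
}
```

Because every variant is handed over as a plain Source, the code that consumes the input does not need to know whether the XML came from a stream, a SAX parser, a DOM tree, or a pull parser.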
On .NET, the DocumentBuilder has an overloaded Build() method, allowing input to be supplied from sources of the following kinds:
- Stream, containing lexical XML as a stream of bytes.
- TextReader, containing lexical XML as a stream of characters.
- Uri, which can be dereferenced to fetch a stream of bytes.
- XmlNode, Microsoft's DOM implementation.
- XmlReader, an XML parser, primed with a source of input.
All the documents processed in a single transformation or query must be loaded using the same Configuration. However, it is possible to copy a document from one Configuration into another by supplying the TreeInfo at the root of the existing document as the Source supplied to the build() method of a DocumentBuilder belonging to the new Configuration.
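Copying a document between Configurations might be sketched in Java as follows. This is an illustration under stated assumptions rather than a definitive recipe: it assumes a Saxon jar is on the classpath, it relies on each s9api Processor owning its own Configuration, and it passes the existing tree through XdmNode.asSource() rather than through the TreeInfo directly. The class name CopyBetweenConfigurations and the method copyInto are hypothetical.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class: rebuilds an existing tree under a different Configuration.
public class CopyBetweenConfigurations {

    // Builds a new tree under the target Processor's Configuration,
    // using the existing tree as the Source.
    static XdmNode copyInto(Processor target, XdmNode existing) throws SaxonApiException {
        return target.newDocumentBuilder().build(existing.asSource());
    }

    public static void main(String[] args) throws SaxonApiException {
        // Each Processor is assumed to carry its own Configuration
        Processor first = new Processor(false);
        Processor second = new Processor(false);

        XdmNode original = first.newDocumentBuilder()
                .build(new StreamSource(new StringReader("<doc>text</doc>")));
        XdmNode copy = copyInto(second, original);

        // The copy should now belong to the second Configuration
        boolean moved = copy.getUnderlyingNode().getConfiguration()
                == second.getUnderlyingConfiguration();
        System.out.println("copy belongs to second Configuration: " + moved);
        System.out.println(copy.getStringValue());
    }
}
```

The original tree remains usable under the first Configuration, since the second one receives its own copy.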