Building a Source Document from lexical XML
The conversion of lexical XML to a tree in memory is called parsing, and is performed by a software component called an XML parser. Saxon does not include its own XML parser; instead, it provides interfaces that invoke XML parsers supplied by third parties. Platforms such as Java and .NET typically include a built-in XML parser, which Saxon uses by default.
With the Java s9api interface, a source document can be built using the DocumentBuilder class, which is created using the factory method newDocumentBuilder on the Processor object. Various options for document building are available as methods on the DocumentBuilder, for example options to perform schema or DTD validation, to strip whitespace, to expand XInclude directives, and also to choose the tree implementation model to be used.
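The following is a minimal sketch of this pattern, assuming a recent Saxon release is on the classpath. The class name BuildWithS9api, the helper method build, and the sample XML are hypothetical and exist only for illustration; the particular options chosen (no DTD validation, stripping of all whitespace-only text nodes, the TinyTree model) are arbitrary examples rather than recommended settings.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.om.TreeModel;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.WhitespaceStrippingPolicy;
import net.sf.saxon.s9api.XdmNode;

class BuildWithS9api {

    /** Illustrative helper (not part of Saxon): parses a string of lexical XML into an XdmNode. */
    static XdmNode build(String xml) {
        // false: do not request features that need a licensed edition
        Processor processor = new Processor(false);

        // The factory method on Processor creates the DocumentBuilder
        DocumentBuilder builder = processor.newDocumentBuilder();

        // Example options; adjust or omit as required
        builder.setDTDValidation(false);
        builder.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);
        builder.setTreeModel(TreeModel.TINY_TREE);

        try {
            return builder.build(new StreamSource(new StringReader(xml)));
        } catch (SaxonApiException e) {
            // Wrapped so that callers of this sketch need not declare a checked exception
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XdmNode doc = build("<doc>\n  <item>a</item>\n  <item>b</item>\n</doc>");
        System.out.println(doc.getNodeKind());
        System.out.println(doc.getStringValue());
    }
}
```

Options are set on the builder before calling build, so one configured DocumentBuilder can be reused for several documents that should be built the same way.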
The build method of the DocumentBuilder creates a document from a Source object. This is a JAXP interface designed as an abstraction of various kinds of XML source, including StreamSource, which represents lexical XML held in a file or input stream; SAXSource, which represents a source of SAX events; DOMSource, which represents an already-parsed XML document held in a DOM tree; and StAXSource, which represents a class that responds to requests for StAX (pull-parser) events. In addition, Saxon's NodeInfo and TreeInfo classes implement the JAXP Source interface, and the s9api XdmNode class has an asSource() method, so it is always possible to supply an existing Saxon tree as the source for any of these interfaces.
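To make the standard kinds of JAXP Source concrete, the sketch below constructs one of each from the same string of XML, using only classes from the JDK. The class name SourceKinds and its helper methods are hypothetical. The JDK identity Transformer is used here merely to show that each object is a usable Source; it stands in for whichever API consumes the Source and is not how Saxon itself builds a tree.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

class SourceKinds {
    static final String XML = "<greeting>hello</greeting>";

    /** Lexical XML read from a character stream. */
    static Source streamSource() {
        return new StreamSource(new StringReader(XML));
    }

    /** A source of SAX events; the consumer supplies an XMLReader if none is set. */
    static Source saxSource() {
        return new SAXSource(new InputSource(new StringReader(XML)));
    }

    /** An already-parsed document held in a DOM tree. */
    static Source domSource() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return new DOMSource(
                factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML))));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** A pull parser that delivers StAX events on request. */
    static Source staxSource() {
        try {
            return new StAXSource(
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Consumes any Source with an identity transformation and returns the serialized result. */
    static String serialize(Source source) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Source[] sources = {streamSource(), saxSource(), domSource(), staxSource()};
        for (Source s : sources) {
            System.out.println(s.getClass().getSimpleName() + " -> " + serialize(s));
        }
    }
}
```

Because every helper returns the common Source type, the consuming code does not need to know which kind it has been given.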
Similarly, in the .NET API, there is a DocumentBuilder object that can be created from the Processor. This allows options to be set controlling the way documents are built, and provides an overloaded Build method allowing a tree to be built from various kinds of source.
It is also possible to build a Saxon tree in memory by using the buildDocumentTree method of the Configuration object. (When using the JAXP Transformation API, the Configuration can be obtained from the TransformerFactory as the value of the attribute named Feature.CONFIGURATION.name.) The buildDocumentTree method takes a single argument, a JAXP Source. This can be any of the standard kinds of JAXP Source. See JAXP Sources for more information. The method returns a TreeInfo, which holds information about the constructed tree and notably provides the method getRootNode() to get the root node of the tree, which in most cases will be a document node.
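As a rough sketch of the lower-level route, the code below builds a tree directly from a Configuration and then reads its root node. It assumes a Saxon release in which buildDocumentTree returns a TreeInfo, as described above; the class name BuildWithConfiguration and the helper buildRoot are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.Configuration;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.om.TreeInfo;
import net.sf.saxon.trans.XPathException;
import net.sf.saxon.type.Type;

class BuildWithConfiguration {

    /** Illustrative helper (not part of Saxon): builds a tree and returns its root node. */
    static NodeInfo buildRoot(Configuration config, String xml) {
        try {
            // Any standard kind of JAXP Source could be supplied here
            TreeInfo tree = config.buildDocumentTree(new StreamSource(new StringReader(xml)));
            return tree.getRootNode();
        } catch (XPathException e) {
            // Wrapped so that callers of this sketch need not declare a checked exception
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Configuration config = new Configuration();
        NodeInfo root = buildRoot(config, "<doc><item>a</item></doc>");

        // Type.DOCUMENT is the node-kind code for a document node
        System.out.println("document node: " + (root.getNodeKind() == Type.DOCUMENT));
        System.out.println("string value: " + root.getStringValue());
    }
}
```

In this sketch the Configuration is created directly; in an application that already has a Processor or TransformerFactory, the existing Configuration would normally be reused so that all documents are loaded under the same one.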
All the documents processed in a single transformation or query must be loaded using the same Configuration. However, it is possible to copy a document from one Configuration into another by passing the TreeInfo at the root of the existing document as the Source argument to the buildDocumentTree method of the new