Building a Source Document from lexical XML

The conversion of lexical XML to a tree in memory is called parsing, and is performed by a software component called an XML parser. Saxon does not include its own XML parser; instead, it provides interfaces that invoke XML parsers supplied by third parties. Platforms such as Java and .NET typically include a built-in XML parser that Saxon uses by default.

With the Java s9api interface, a source document can be built using the DocumentBuilder class, which is created using the factory method newDocumentBuilder on the Processor object. Various options for document building are available as methods on the DocumentBuilder, for example options to perform schema or DTD validation, to strip whitespace, to expand XInclude directives, and also to choose the tree implementation model to be used.
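As a minimal sketch of the s9api route described above, the following builds a document from a string of lexical XML. It assumes a recent Saxon release (9.9 or later) on the classpath; the class name BuildWithS9api and the particular options chosen are illustrative only, not a recommended configuration.

```java
import net.sf.saxon.om.TreeModel;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.WhitespaceStrippingPolicy;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

// Hypothetical class name, used only for this illustration.
public class BuildWithS9api {

    static XdmNode build(String xml) throws SaxonApiException {
        // false = no licensed (PE/EE) features are requested
        Processor processor = new Processor(false);

        // The DocumentBuilder is created by a factory method on the Processor
        DocumentBuilder builder = processor.newDocumentBuilder();

        // Options are set as methods on the DocumentBuilder
        builder.setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy.ALL);
        builder.setTreeModel(TreeModel.TINY_TREE);

        // StreamSource wraps lexical XML held in a reader, stream, or file
        return builder.build(new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws SaxonApiException {
        XdmNode doc = build("<doc> <item>one</item> </doc>");
        System.out.println(doc.getNodeKind());
        System.out.println(doc.getStringValue());
    }
}
```

The same DocumentBuilder can be reused to build further documents with the same options.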

These methods create a document from a Source object. This is a JAXP interface designed as an abstraction of various kinds of XML source, including StreamSource, which represents lexical XML held in a file or input stream; SAXSource, which represents a source of SAX events; DOMSource, which represents an already-parsed XML document held in a DOM tree; and StAXSource, which represents a class that responds to requests for StAX (pull-parser) events. In addition, Saxon's NodeInfo and TreeInfo classes implement the JAXP Source interface, and the s9api XdmNode class has an asSource() method, so it is always possible to supply an existing Saxon tree as the source for any of these interfaces.
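To illustrate how different kinds of Source can be supplied, the sketch below builds one tree from a DOMSource and another from an existing Saxon tree obtained through asSource(). The class and method names (SourceKinds, fromDom, fromExistingTree) are hypothetical, and the DOM is produced with whatever JAXP parser the platform provides.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import java.io.StringReader;

// Hypothetical class name, used only for this illustration.
public class SourceKinds {

    // Parse with the platform's DOM parser, then hand the DOM to Saxon
    static XdmNode fromDom(DocumentBuilder builder, String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return builder.build(new DOMSource(dom));
    }

    // Supply an existing Saxon tree as the Source
    static XdmNode fromExistingTree(DocumentBuilder builder, XdmNode node)
            throws SaxonApiException {
        return builder.build(node.asSource());
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilder builder = new Processor(false).newDocumentBuilder();
        XdmNode first = fromDom(builder, "<doc><item>one</item></doc>");
        XdmNode second = fromExistingTree(builder, first);
        System.out.println(second.getStringValue());
    }
}
```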

Similarly, in the .NET API, there is a DocumentBuilder object that can be created from the Processor. This allows options to be set controlling the way documents are built, and provides an overloaded Build method allowing a tree to be built from various kinds of source.

It is also possible to build a Saxon tree in memory by using the buildDocumentTree() method of the Configuration object. (When using the JAXP Transformation API, the Configuration can be obtained from the TransformerFactory as the value of the attribute named Feature.CONFIGURATION.name.)

The buildDocumentTree() method takes a single argument, a JAXP Source, which can be any of the standard kinds of JAXP Source. See JAXP Sources for more information. The method returns a TreeInfo that describes the constructed tree; in particular, its getRootNode() method returns the root node of the tree, which in most cases will be a document node.
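The following sketch shows one way this might look in practice: the Configuration is obtained from a Saxon TransformerFactory through the attribute named Feature.CONFIGURATION.name, and a tree is then built with buildDocumentTree(). The class name BuildWithConfiguration and its helper methods are hypothetical, and the code assumes Saxon 9.9 or later, where the Feature class is available.

```java
import net.sf.saxon.Configuration;
import net.sf.saxon.TransformerFactoryImpl;
import net.sf.saxon.lib.Feature;
import net.sf.saxon.om.NodeInfo;
import net.sf.saxon.om.TreeInfo;
import net.sf.saxon.trans.XPathException;

import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

// Hypothetical class name, used only for this illustration.
public class BuildWithConfiguration {

    // Obtain the Configuration that underlies a Saxon TransformerFactory
    static Configuration configurationFromFactory() {
        TransformerFactory factory = new TransformerFactoryImpl();
        return (Configuration) factory.getAttribute(Feature.CONFIGURATION.name);
    }

    // Build a tree from lexical XML held in a string
    static TreeInfo build(Configuration config, String xml) throws XPathException {
        return config.buildDocumentTree(new StreamSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathException {
        Configuration config = configurationFromFactory();
        TreeInfo tree = build(config, "<doc><item>one</item></doc>");

        // The root node is normally a document node
        NodeInfo root = tree.getRootNode();
        System.out.println(root.getStringValue());
    }
}
```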

All the documents processed in a single transformation or query must be loaded using the same Configuration. However, it is possible to copy a document from one Configuration into another by passing the TreeInfo of the existing document as the Source argument to the buildDocumentTree() method of the new Configuration.
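Copying a document between Configurations can be sketched as follows. The class name CopyBetweenConfigurations and the helper copyInto are hypothetical; the sketch relies only on the fact that TreeInfo is itself a JAXP Source.

```java
import net.sf.saxon.Configuration;
import net.sf.saxon.om.TreeInfo;
import net.sf.saxon.trans.XPathException;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

// Hypothetical class name, used only for this illustration.
public class CopyBetweenConfigurations {

    // Copy an existing tree into the target Configuration
    static TreeInfo copyInto(Configuration target, TreeInfo existing)
            throws XPathException {
        // TreeInfo implements Source, so it can be passed directly
        return target.buildDocumentTree(existing);
    }

    public static void main(String[] args) throws XPathException {
        Configuration first = new Configuration();
        TreeInfo original = first.buildDocumentTree(
                new StreamSource(new StringReader("<doc><item>one</item></doc>")));

        Configuration second = new Configuration();
        TreeInfo copy = copyInto(second, original);

        // The copy holds the same content but belongs to the second Configuration
        System.out.println(copy.getRootNode().getStringValue());
        System.out.println(copy.getConfiguration() == second);
    }
}
```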