Class DocumentBuilder


  • public class DocumentBuilder
    extends java.lang.Object
    A document builder holds properties controlling how a Saxon document tree should be built, and provides methods to invoke the tree construction.

    This class has no public constructor. To construct a DocumentBuilder, use the factory method Processor.newDocumentBuilder().

    All documents used in a single Saxon query, transformation, or validation episode must be built with the same Configuration. However, there is no requirement that they should use the same DocumentBuilder.

    Sharing of a DocumentBuilder across multiple threads is not recommended. However, in the current implementation sharing a DocumentBuilder (once initialized) will only cause problems if a SchemaValidator is used.

    Since:
    9.0
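    As an illustration of the factory-method pattern described above, the following is a minimal sketch rather than a definitive recipe. The class name BuildExample and the helper textOf are hypothetical names introduced for this example; it assumes the open-source Saxon-HE jar is on the classpath.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class, not part of Saxon.
final class BuildExample {

    // Parses the supplied XML text and returns the string value of the document node.
    static String textOf(String xml) throws SaxonApiException {
        // false: do not request licensed (PE/EE) features
        Processor processor = new Processor(false);
        // DocumentBuilder has no public constructor, so ask the Processor for one.
        DocumentBuilder builder = processor.newDocumentBuilder();
        XdmNode doc = builder.build(new StreamSource(new StringReader(xml)));
        return doc.getStringValue();
    }

    public static void main(String[] args) throws SaxonApiException {
        System.out.println(textOf("<greeting>hello</greeting>"));
    }
}
```

    Because the builder is cheap to obtain from the Processor, creating one per thread is a simple way to follow the advice against sharing.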
    • Constructor Detail

      • DocumentBuilder

        protected DocumentBuilder(Configuration config)
        Create a DocumentBuilder. This is a protected constructor. Users should construct a DocumentBuilder by calling the factory method Processor.newDocumentBuilder().
        Parameters:
        config - the Saxon configuration
    • Method Detail

      • getTreeModel

        public TreeModel getTreeModel()
        Get the tree model to be used for documents constructed using this DocumentBuilder. By default, the TinyTree is used (irrespective of the TreeModel set in the underlying Configuration).
        Returns:
        the tree model in use: typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. However, in principle a user-defined tree model can be used.
        Since:
        9.2
      • setLineNumbering

        public void setLineNumbering(boolean option)
        Say whether line and column numbering is to be enabled for documents constructed using this DocumentBuilder. When enabled, the line and column number in the original source document are maintained in the constructed tree for each element node (and only for elements). The line and column number in question are generally the position at which the closing ">" of the element start tag appears.

        By default, line and column numbers are not maintained.

        Errors relating to document parsing and validation will generally contain line numbers whether or not this option is set, because such errors are detected during document construction.

        Line numbering is not available for all kinds of source: for example, it is not available when loading from an existing DOM Document.

        The resulting line and column numbers are accessible to applications using the XPath extension functions saxon:line-number() and saxon:column-number() applied to a node, or using the Java methods NodeInfo.getLineNumber() and NodeInfo.getColumnNumber().

        Line and column numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line and column number are generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).

        Parameters:
        option - true if line numbers are to be maintained, false otherwise.
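        The following sketch shows one way to read a line number back through NodeInfo.getLineNumber() after enabling this option. The class LineNumberExample and the helper lineOf are hypothetical names used only for illustration, and the XPath expression is assumed to select at least one element.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class, not part of Saxon.
final class LineNumberExample {

    // Returns the line number recorded for the first node selected by the XPath expression.
    // Assumes the expression selects an element; an empty result would cause a NullPointerException.
    static int lineOf(String xml, String xpath) throws SaxonApiException {
        Processor processor = new Processor(false);
        DocumentBuilder builder = processor.newDocumentBuilder();
        // Line numbering is off by default and must be enabled before build() is called.
        builder.setLineNumbering(true);
        XdmNode doc = builder.build(new StreamSource(new StringReader(xml)));
        XdmNode node = (XdmNode) processor.newXPathCompiler().evaluateSingle(xpath, doc);
        return node.getUnderlyingNode().getLineNumber();
    }

    public static void main(String[] args) throws SaxonApiException {
        // The b element starts (and its start tag ends) on the second line of the input.
        System.out.println(lineOf("<a>\n  <b/>\n</a>", "/a/b"));
    }
}
```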
      • isLineNumbering

        public boolean isLineNumbering()
        Ask whether line and column numbering is enabled for documents loaded using this DocumentBuilder.

        By default, line and column numbering is disabled.

        Line numbering is not available for all kinds of source: in particular, it is not available when loading from an existing DOM Document.

        The resulting line and column numbers are accessible to applications using the extension functions saxon:line-number() and saxon:column-number() applied to a node, or using the Java methods NodeInfo.getLineNumber() and NodeInfo.getColumnNumber().

        Line and column numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line number is generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).

        Returns:
        true if line numbering is enabled
      • getSchemaValidator

        public SchemaValidator getSchemaValidator()
        Get the SchemaValidator used to validate documents loaded using this DocumentBuilder.
        Returns:
        the SchemaValidator if one has been set; otherwise null.
      • setDTDValidation

        public void setDTDValidation(boolean option)
        Set whether DTD validation should be applied to documents loaded using this DocumentBuilder.

        By default, no DTD validation takes place.

        Parameters:
        option - true if DTD validation is to be applied to the document
      • isDTDValidation

        public boolean isDTDValidation()
        Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder.
        Returns:
        true if DTD validation is to be applied
      • setWhitespaceStrippingPolicy

        public void setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy)
        Set the whitespace stripping policy applied when loading a document using this DocumentBuilder.

        If DTD or schema validation is applied, the only permitted setting is WhitespaceStrippingPolicy.IGNORABLE. Any other value results in an exception from the build(File) method.

        Parameters:
        policy - the policy for stripping whitespace-only text nodes from source documents
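        To make the effect of the policy concrete, the sketch below counts the children of the outermost element under two different policies, with no DTD or schema validation involved. WhitespaceExample and childCount are hypothetical names introduced for this illustration.

```java
import java.io.StringReader;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.WhitespaceStrippingPolicy;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class, not part of Saxon.
final class WhitespaceExample {

    // Builds the document under the given policy and counts the child nodes of the outermost element.
    static long childCount(String xml, WhitespaceStrippingPolicy policy) throws SaxonApiException {
        Processor processor = new Processor(false);
        DocumentBuilder builder = processor.newDocumentBuilder();
        builder.setWhitespaceStrippingPolicy(policy);
        XdmNode doc = builder.build(new StreamSource(new StringReader(xml)));
        XdmItem count = processor.newXPathCompiler().evaluateSingle("count(/*/node())", doc);
        return Long.parseLong(count.getStringValue());
    }

    public static void main(String[] args) throws SaxonApiException {
        String xml = "<a> <b/> </a>";
        // ALL removes whitespace-only text nodes; NONE keeps them.
        System.out.println(childCount(xml, WhitespaceStrippingPolicy.ALL));
        System.out.println(childCount(xml, WhitespaceStrippingPolicy.NONE));
    }
}
```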
      • getWhitespaceStrippingPolicy

        public WhitespaceStrippingPolicy getWhitespaceStrippingPolicy()
        Get the whitespace stripping policy applied when loading a document using this DocumentBuilder.
        Returns:
        the policy for stripping whitespace-only text nodes
      • setBaseURI

        public void setBaseURI(java.net.URI uri)
        Set the base URI of a document loaded using this DocumentBuilder.

        This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.

        This information is required when the document is loaded from a source that does not provide an intrinsic URI, notably when loading from a Stream or a DOMSource. The value is ignored when loading from a source that does have an intrinsic base URI.

        Parameters:
        uri - the base URI of documents loaded using this DocumentBuilder. This must be an absolute URI.
        Throws:
        java.lang.IllegalArgumentException - if the baseURI supplied is not an absolute URI
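        As a rough illustration, the sketch below supplies a base URI for a document read from a StringReader, which has no intrinsic URI of its own. BaseUriExample and baseUriOf are hypothetical names, and the example.com URI is a placeholder.

```java
import java.io.StringReader;
import java.net.URI;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class, not part of Saxon.
final class BaseUriExample {

    // Builds a document from in-memory text and reports the base URI of the resulting document node.
    static URI baseUriOf(String xml, URI base) throws SaxonApiException {
        Processor processor = new Processor(false);
        DocumentBuilder builder = processor.newDocumentBuilder();
        // Documented to throw IllegalArgumentException if the URI is not absolute.
        builder.setBaseURI(base);
        // A StreamSource wrapping a Reader carries no system identifier, so the builder's base URI applies.
        XdmNode doc = builder.build(new StreamSource(new StringReader(xml)));
        return doc.getBaseURI();
    }

    public static void main(String[] args) throws SaxonApiException {
        URI base = URI.create("http://example.com/docs/input.xml"); // placeholder URI
        System.out.println(baseUriOf("<a/>", base));
    }
}
```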
      • getBaseURI

        public java.net.URI getBaseURI()
        Get the base URI of documents loaded using this DocumentBuilder when no other URI is available.
        Returns:
        the base URI to be used, or null if no value has been set.
      • setDocumentProjectionQuery

        public void setDocumentProjectionQuery(XQueryExecutable query)
        Set a compiled query to be used for implementing document projection. The effect of using this option is that the tree constructed by the DocumentBuilder contains only those parts of the source document that are needed to answer this query. Running this query against the projected document should give the same results as against the raw document, but the projected document typically occupies significantly less memory. It is permissible to run other queries against the projected document, but unless they are carefully chosen, they will give the wrong answer, because the document being used is different from the original.

        The query should be written to use the projected document as its initial context item. For example, if the query is //ITEM[COLOR='blue'], then only ITEM elements and their COLOR children will be retained in the projected document.

        This facility is only available in Saxon-EE; if the facility is not available, calling this method has no effect.

        Parameters:
        query - the compiled query used to control document projection
        Since:
        9.3
      • build

        public XdmNode build(javax.xml.transform.Source source)
                      throws SaxonApiException
        Load an XML document, to create a tree representation of the document in memory.
        Parameters:
        source - A JAXP Source object identifying the source of the document. This can always be a StreamSource or a SAXSource. Some kinds of Source are consumed by this method, and should only be used once.

        If a SAXSource is supplied, the XMLReader held within the SAXSource may be modified (by setting features and properties) to reflect the options selected on this DocumentBuilder.

        If the source is an instance of NodeInfo then the subtree rooted at this node will be copied (applying schema validation if requested) to create a new tree.

        Saxon also accepts an instance of StAXSource or PullSource, which can be used to supply a document that is to be parsed using a StAX parser.

        (9.8.0.5) This method now (once again) accepts an instance of AugmentedSource. If an AugmentedSource is supplied, the properties of the AugmentedSource take precedence over any properties set on this DocumentBuilder, which in turn take precedence over properties set at the Processor or Configuration level. The concept of "taking precedence" is explained more fully at ParseOptions.merge(ParseOptions).

        Returns:
        An XdmNode. This will be the document node at the root of the tree of the resulting in-memory document.
        Throws:
        java.lang.NullPointerException - if the source argument is null
        java.lang.IllegalArgumentException - if the kind of source is not recognized
        SaxonApiException - if any other failure occurs building the document, for example a parsing error
      • build

        public XdmNode build(java.io.File file)
                      throws SaxonApiException
        Build a document from a supplied XML file.
        Parameters:
        file - the supplied file
        Returns:
        the XdmNode representing the root of the document tree
        Throws:
        SaxonApiException - if any failure occurs retrieving or parsing the document
      • newBuildingContentHandler

        public BuildingContentHandler newBuildingContentHandler()
                                                         throws SaxonApiException
        Get a ContentHandler that may be used to build the document programmatically.
        Returns:
        a newly constructed BuildingContentHandler, which implements the ContentHandler interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the ContentHandler will be validated as it is written.

        Note that the returned ContentHandler expects namespace scopes to be indicated explicitly by calls to ContentHandler.startPrefixMapping(java.lang.String, java.lang.String) and ContentHandler.endPrefixMapping(java.lang.String).

        If the stream of events supplied to the ContentHandler does not constitute a well formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.

        Throws:
        SaxonApiException - if any failure occurs
        Since:
        9.3
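        The explicit namespace-scoping requirement can be sketched as follows: the prefix mapping is declared before the element that uses it and ended after that element closes. ContentHandlerExample and buildViaSax are hypothetical names, and urn:example is a placeholder namespace.

```java
import net.sf.saxon.s9api.BuildingContentHandler;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical example class, not part of Saxon.
final class ContentHandlerExample {

    // Builds <p:root xmlns:p="urn:example">hello</p:root> from SAX events and returns its document node.
    static XdmNode buildViaSax(Processor processor) throws SaxonApiException, SAXException {
        BuildingContentHandler handler = processor.newDocumentBuilder().newBuildingContentHandler();
        handler.startDocument();
        // Namespace scope is signalled explicitly, before the element that uses the prefix.
        handler.startPrefixMapping("p", "urn:example");
        handler.startElement("urn:example", "root", "p:root", new AttributesImpl());
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("urn:example", "root", "p:root");
        handler.endPrefixMapping("p");
        handler.endDocument();
        // The tree is available once the event stream is complete.
        return handler.getDocumentNode();
    }

    public static void main(String[] args) throws SaxonApiException, SAXException {
        XdmNode doc = buildViaSax(new Processor(false));
        System.out.println(doc.getStringValue());
    }
}
```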
      • newBuildingStreamWriter

        public BuildingStreamWriterImpl newBuildingStreamWriter()
                                                         throws SaxonApiException
        Get an XMLStreamWriter that may be used to build the document programmatically.
        Returns:
        a newly constructed BuildingStreamWriter, which implements the XMLStreamWriter interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the XMLStreamWriter will be validated as it is written.

        If the stream of events supplied to the XMLStreamWriter does not constitute a well formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.

        Throws:
        SaxonApiException - if any failure occurs
        Since:
        9.3
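        A comparable sketch using the StAX-style writer is shown below. StreamWriterExample and buildViaStax are hypothetical names; the variable is declared with the BuildingStreamWriter interface, which the returned object is described above as implementing.

```java
import javax.xml.stream.XMLStreamException;
import net.sf.saxon.s9api.BuildingStreamWriter;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XdmNode;

// Hypothetical example class, not part of Saxon.
final class StreamWriterExample {

    // Builds <order id="42"><item>widget</item></order> and returns its document node.
    static XdmNode buildViaStax(Processor processor) throws SaxonApiException, XMLStreamException {
        BuildingStreamWriter writer = processor.newDocumentBuilder().newBuildingStreamWriter();
        writer.writeStartDocument();
        writer.writeStartElement("order");
        writer.writeAttribute("id", "42");
        writer.writeStartElement("item");
        writer.writeCharacters("widget");
        writer.writeEndElement(); // closes item
        writer.writeEndElement(); // closes order
        writer.writeEndDocument();
        writer.close();
        // Retrieve the tree only after the document has been completed.
        return writer.getDocumentNode();
    }

    public static void main(String[] args) throws SaxonApiException, XMLStreamException {
        XdmNode doc = buildViaStax(new Processor(false));
        System.out.println(doc.getStringValue());
    }
}
```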
      • wrap

        public XdmNode wrap(java.lang.Object node)
                     throws java.lang.IllegalArgumentException
        Create a node by wrapping a recognized external node from a supported object model.

        If the supplied object implements the NodeInfo interface then it will be wrapped as an XdmNode without copying and without change. The NodeInfo must have been created using a Configuration compatible with the one used by this Processor (specifically, one that uses the same NamePool).

        To wrap nodes from other object models, such as DOM, the support module for the external object model must be on the class path and registered with the Saxon configuration. The support modules for DOM, JDOM, DOM4J and XOM are registered automatically if they can be found on the classpath.

        It is best to avoid calling this method repeatedly to wrap different nodes in the same document. Each such wrapper conceptually creates a new XDM tree instance with its own identity. Although the memory is shared, operations that rely on node identity might not have the expected result. It is best to create a single wrapper for the document node, and then to navigate to the other nodes in the tree using S9API interfaces.

        Parameters:
        node - the node in the external tree representation. Either an instance of NodeInfo, or an instance of a node in an external object model. Nodes in other object models (such as DOM, JDOM, etc) are recognized only if the support module for the external object model is known to the Configuration.
        Returns:
        the supplied node wrapped as an XdmNode
        Throws:
        java.lang.IllegalArgumentException - if the type of object supplied is not recognized. This may be because node was created using a different Saxon Processor, or because the required code for the external object model is not on the class path.
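        The sketch below wraps a DOM document produced by the JDK's own parser, following the advice to wrap once at the document node. WrapExample and wrapDom are hypothetical names; javax.xml.parsers.DocumentBuilderFactory is used fully imported to avoid confusion with the Saxon DocumentBuilder.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.XdmNode;
import org.w3c.dom.Document;

// Hypothetical example class, not part of Saxon.
final class WrapExample {

    // Parses the text into a DOM tree with the JDK parser, then wraps it without copying.
    static XdmNode wrapDom(Processor processor, String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XPath processing expects a namespace-aware DOM
        Document dom = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Wrap the document node once; navigate to other nodes through the s9api interfaces.
        return processor.newDocumentBuilder().wrap(dom);
    }

    public static void main(String[] args) throws Exception {
        XdmNode doc = wrapDom(new Processor(false), "<greeting>hello</greeting>");
        System.out.println(doc.getStringValue());
    }
}
```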
      • parse

        public void parse(javax.xml.transform.Source source,
                          Destination destination)
                   throws SaxonApiException
        Parse a source document, sending it to a supplied Destination.

        The process is streamed; no tree is constructed in memory.

        Parameters:
        source - The source document to be parsed
        destination - The destination to which the document is to be sent
        Throws:
        SaxonApiException - if parsing fails, or if the destination reports an error
      • parse

        public void parse(java.io.File file,
                          Destination destination)
                   throws SaxonApiException
        Parse a source document from a File, sending it to a supplied Destination.

        The process is streamed; no tree is constructed in memory.

        Parameters:
        file - The file containing the XML source document to be parsed
        destination - The destination to which the document is to be sent
        Throws:
        SaxonApiException - if parsing fails, or if the destination reports an error
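        One possible use of the streamed parse methods is to send a document straight to a Serializer, as sketched below. ParseExample and streamToString are hypothetical names, and the example assumes a Saxon release that includes the parse methods documented here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.stream.StreamSource;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;

// Hypothetical example class, not part of Saxon.
final class ParseExample {

    // Parses the text and serializes it directly, without building an in-memory tree.
    static String streamToString(String xml) throws SaxonApiException {
        Processor processor = new Processor(false);
        StringWriter out = new StringWriter();
        // A Serializer is one kind of Destination; here it writes to the StringWriter.
        Serializer serializer = processor.newSerializer(out);
        serializer.setOutputProperty(Serializer.Property.OMIT_XML_DECLARATION, "yes");
        processor.newDocumentBuilder().parse(new StreamSource(new StringReader(xml)), serializer);
        return out.toString();
    }

    public static void main(String[] args) throws SaxonApiException {
        System.out.println(streamToString("<a><b>text</b></a>"));
    }
}
```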