net.sf.saxon.s9api
Class DocumentBuilder

java.lang.Object
  extended by net.sf.saxon.s9api.DocumentBuilder

public class DocumentBuilder
extends Object

A document builder holds properties controlling how a Saxon document tree should be built, and provides methods to invoke the tree construction.

This class has no public constructor. Users should construct a DocumentBuilder by calling the factory method Processor.newDocumentBuilder().

All documents used in a single Saxon query, transformation, or validation episode must be built with the same Configuration. However, there is no requirement that they should use the same DocumentBuilder.

Sharing of a DocumentBuilder across multiple threads is not recommended. However, in the current implementation sharing a DocumentBuilder (once initialized) will only cause problems if a SchemaValidator is used.

Since:
9.0

Constructor Summary
protected DocumentBuilder(Configuration config)
          Create a DocumentBuilder.
 
Method Summary
 XdmNode build(File file)
          Build a document from a supplied XML file.
 XdmNode build(Source source)
          Load an XML document, to create a tree representation of the document in memory.
 URI getBaseURI()
          Get the base URI of documents loaded using this DocumentBuilder when no other URI is available.
 XQueryExecutable getDocumentProjectionQuery()
          Get the compiled query to be used for implementing document projection.
 SchemaValidator getSchemaValidator()
          Get the SchemaValidator used to validate documents loaded using this DocumentBuilder.
 TreeModel getTreeModel()
          Get the tree model to be used for documents constructed using this DocumentBuilder.
 WhitespaceStrippingPolicy getWhitespaceStrippingPolicy()
          Get the whitespace stripping policy applied when loading a document using this DocumentBuilder.
 boolean isDTDValidation()
          Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder.
 boolean isLineNumbering()
          Ask whether line numbering is enabled for documents loaded using this DocumentBuilder.
 boolean isRetainPSVI()
          Ask whether the constructed tree should contain information derived from schema validation, specifically whether it should contain type annotations and expanded defaults of missing element and attribute content.
 BuildingContentHandler newBuildingContentHandler()
          Get a ContentHandler that may be used to build the document programmatically.
 BuildingStreamWriter newBuildingStreamWriter()
          Get an XMLStreamWriter that may be used to build the document programmatically.
 void setBaseURI(URI uri)
          Set the base URI of a document loaded using this DocumentBuilder.
 void setDocumentProjectionQuery(XQueryExecutable query)
          Set a compiled query to be used for implementing document projection.
 void setDTDValidation(boolean option)
          Set whether DTD validation should be applied to documents loaded using this DocumentBuilder.
 void setLineNumbering(boolean option)
          Say whether line numbering is to be enabled for documents constructed using this DocumentBuilder.
 void setRetainPSVI(boolean retainPSVI)
          Set whether the constructed tree should contain information derived from schema validation, specifically whether it should contain type annotations and expanded defaults of missing element and attribute content.
 void setSchemaValidator(SchemaValidator validator)
          Set the schemaValidator to be used.
 void setTreeModel(TreeModel model)
          Set the tree model to be used for documents constructed using this DocumentBuilder.
 void setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy)
          Set the whitespace stripping policy applied when loading a document using this DocumentBuilder.
 XdmNode wrap(Object node)
          Create a node by wrapping a recognized external node from a supported object model.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentBuilder

protected DocumentBuilder(Configuration config)
Create a DocumentBuilder. This is a protected constructor. Users should construct a DocumentBuilder by calling the factory method Processor.newDocumentBuilder().

Parameters:
config - the Saxon configuration

Method Detail

setTreeModel

public void setTreeModel(TreeModel model)
Set the tree model to be used for documents constructed using this DocumentBuilder. By default, the TinyTree is used.

Parameters:
model - typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. It can also be an external object model such as XOMObjectModel
Since:
9.2

getTreeModel

public TreeModel getTreeModel()
Get the tree model to be used for documents constructed using this DocumentBuilder. By default, the TinyTree is used.

Returns:
the tree model in use: typically one of the constants TreeModel.TINY_TREE, TreeModel.TINY_TREE_CONDENSED, or TreeModel.LINKED_TREE. However, in principle a user-defined tree model can be used.
Since:
9.2

setLineNumbering

public void setLineNumbering(boolean option)
Say whether line numbering is to be enabled for documents constructed using this DocumentBuilder. This has the effect that the line number in the original source document is maintained in the constructed tree, for each element node (and only for elements). The line number in question is generally the line number on which the closing ">" of the element start tag appears.

By default, line numbers are not maintained.

Errors relating to document parsing and validation will generally contain line numbers whether or not this option is set, because such errors are detected during document construction.

Line numbering is not available for all kinds of source: for example, it is not available when loading from an existing DOM Document.

The resulting line numbers are accessible to applications using the XPath extension function saxon:line-number() applied to a node, or using the Java method NodeInfo.getLineNumber().

Line numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line number is generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).

Parameters:
option - true if line numbers are to be maintained, false otherwise.
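
The remark that the recorded line is the one a SAX parser notifies can be checked without Saxon. The following is a minimal sketch that uses the JDK's default SAX parser and a Locator to record the line reported at each startElement event. The class name StartTagLineDemo and the sample document are hypothetical, and the sketch does not call any Saxon API; it only illustrates where such line numbers come from.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Saxon: records the locator line at each start tag.
public class StartTagLineDemo {

    /** Parses the XML text and returns, per element name, the line reported at startElement. */
    public static Map<String, Integer> startTagLines(String xml) throws Exception {
        final Map<String, Integer> lines = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // The parser supplies the locator before any other event.
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // At this point the parser has read the whole start tag, including ">".
                lines.put(qName, locator.getLineNumber());
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // The start tag of "item" spans lines 2 to 4; its closing ">" is on line 4.
        String xml = "<root>\n  <item\n     id=\"1\"\n  >text</item>\n</root>";
        System.out.println(startTagLines(xml));
    }
}
```

With the parser bundled in the JDK, the item element in this sample should be reported on the line holding the closing ">" of its start tag rather than the line on which the tag opens.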

isLineNumbering

public boolean isLineNumbering()
Ask whether line numbering is enabled for documents loaded using this DocumentBuilder.

By default, line numbering is disabled.

Line numbering is not available for all kinds of source: in particular, it is not available when loading from an existing DOM Document.

The resulting line numbers are accessible to applications using the extension function saxon:line-number() applied to a node, or using the Java method NodeInfo.getLineNumber().

Line numbers are maintained only for element nodes; the line number returned for any other node will be that of the most recent element. For an element node, the line number is generally that of the closing angle bracket at the end of the start tag (this is what a SAX parser notifies).

Returns:
true if line numbering is enabled

setSchemaValidator

public void setSchemaValidator(SchemaValidator validator)
Set the schemaValidator to be used. This determines whether schema validation is applied to an input document and whether type annotations in a supplied document are retained. If no schemaValidator is supplied, then schema validation does not take place.

This option requires the schema-aware version of the Saxon product (Saxon-EE).

Since a SchemaValidator is serially reusable but not thread-safe, using this method is not appropriate when the DocumentBuilder is shared between threads.

Parameters:
validator - the SchemaValidator to be used

getSchemaValidator

public SchemaValidator getSchemaValidator()
Get the SchemaValidator used to validate documents loaded using this DocumentBuilder.

Returns:
the SchemaValidator if one has been set; otherwise null.

setRetainPSVI

public void setRetainPSVI(boolean retainPSVI)
Set whether the constructed tree should contain information derived from schema validation, specifically whether it should contain type annotations and expanded defaults of missing element and attribute content. If no schema validator is set then this option has no effect. The default value is true.

Not yet implemented.

Parameters:
retainPSVI - if true, the constructed tree will contain type annotations and expanded defaults of missing element and attribute content. If false, the tree that is returned will be the same as if schema validation did not take place (except that if the document is invalid, no tree will be constructed)

isRetainPSVI

public boolean isRetainPSVI()
Ask whether the constructed tree should contain information derived from schema validation, specifically whether it should contain type annotations and expanded defaults of missing element and attribute content. If no schema validator is set then this option has no effect.

Not yet implemented.

Returns:
true if the constructed tree will contain type annotations and expanded defaults of missing element and attribute content; false if the tree that is returned will be the same as if schema validation did not take place (except that if the document is invalid, no tree will be constructed)

setDTDValidation

public void setDTDValidation(boolean option)
Set whether DTD validation should be applied to documents loaded using this DocumentBuilder.

By default, no DTD validation takes place.

Parameters:
option - true if DTD validation is to be applied to the document
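
To illustrate what DTD validation changes at the parser level, here is a minimal sketch using the JDK's SAX parser directly. It is independent of Saxon: the class name DtdValidationDemo and the sample document are hypothetical, and how DocumentBuilder passes this option to its parser is not shown here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Saxon: collects validity errors from a SAX parse.
public class DtdValidationDemo {

    /** Parses the XML text and returns the validity errors reported by the parser. */
    public static List<String> parse(String xml, boolean validate) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(validate); // switches DTD validation on or off
        final List<String> errors = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void error(SAXParseException e) {
                // Validity errors are recoverable, so they arrive here rather than as exceptions.
                errors.add(e.getMessage());
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return errors;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, but <from> is not permitted by the internal DTD subset.
        String xml = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>"
                + "<note><from>x</from></note>";
        System.out.println("validating:     " + parse(xml, true));
        System.out.println("non-validating: " + parse(xml, false));
    }
}
```

The same well-formed document is accepted silently without validation and produces validity errors with it, which is the distinction the option controls.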

isDTDValidation

public boolean isDTDValidation()
Ask whether DTD validation is to be applied to documents loaded using this DocumentBuilder.

Returns:
true if DTD validation is to be applied

setWhitespaceStrippingPolicy

public void setWhitespaceStrippingPolicy(WhitespaceStrippingPolicy policy)
Set the whitespace stripping policy applied when loading a document using this DocumentBuilder.

By default, whitespace text nodes appearing in element-only content are stripped, and all other whitespace text nodes are retained.

Parameters:
policy - the policy for stripping whitespace-only text nodes from source documents

getWhitespaceStrippingPolicy

public WhitespaceStrippingPolicy getWhitespaceStrippingPolicy()
Get the whitespace stripping policy applied when loading a document using this DocumentBuilder.

Returns:
the policy for stripping whitespace-only text nodes

setBaseURI

public void setBaseURI(URI uri)
Set the base URI of a document loaded using this DocumentBuilder.

This is used for resolving any relative URIs appearing within the document, for example in references to DTDs and external entities.

This information is required when the document is loaded from a source that does not provide an intrinsic URI, notably when loading from a Stream or a DOMSource. The value is ignored when loading from a source that does have an intrinsic base URI.

Parameters:
uri - the base URI of documents loaded using this DocumentBuilder. This must be an absolute URI.
Throws:
IllegalArgumentException - if the baseURI supplied is not an absolute URI
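
Because a relative URI is rejected, a caller may want to normalize the value before passing it to setBaseURI. The following is a minimal sketch using only java.net.URI and java.io.File. The class BaseUriCheck and the choice to resolve against the current working directory are assumptions made for illustration, not behavior of Saxon.

```java
import java.io.File;
import java.net.URI;

// Hypothetical helper, not part of Saxon: produces a URI that is safe to pass as a base URI.
public class BaseUriCheck {

    /** Returns the URI unchanged if it is absolute; otherwise resolves it against the working directory. */
    public static URI toAbsolute(URI candidate) {
        if (candidate.isAbsolute()) {
            return candidate; // already has a scheme such as http: or file:
        }
        // File.toURI() appends a trailing slash for an existing directory,
        // so resolve() appends the relative path instead of replacing the last segment.
        URI workingDir = new File("").getAbsoluteFile().toURI();
        return workingDir.resolve(candidate);
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute(URI.create("http://example.com/docs/input.xml")));
        System.out.println(toAbsolute(URI.create("docs/input.xml")));
    }
}
```

The result of toAbsolute could then be supplied to setBaseURI without triggering the IllegalArgumentException described above.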

getBaseURI

public URI getBaseURI()
Get the base URI of documents loaded using this DocumentBuilder when no other URI is available.

Returns:
the base URI to be used, or null if no value has been set.

setDocumentProjectionQuery

public void setDocumentProjectionQuery(XQueryExecutable query)
Set a compiled query to be used for implementing document projection. The effect of using this option is that the tree constructed by the DocumentBuilder contains only those parts of the source document that are needed to answer this query. Running this query against the projected document should give the same results as against the raw document, but the projected document typically occupies significantly less memory. It is permissible to run other queries against the projected document, but unless they are carefully chosen, they will give the wrong answer, because the document being used is different from the original.

The query should be written to use the projected document as its initial context item. For example, if the query is //ITEM[COLOR='blue'], then only ITEM elements and their COLOR children will be retained in the projected document.

This facility is only available in Saxon-EE; if the facility is not available, calling this method has no effect.

Parameters:
query - the compiled query used to control document projection
Since:
9.3

getDocumentProjectionQuery

public XQueryExecutable getDocumentProjectionQuery()
Get the compiled query to be used for implementing document projection.

Returns:
the query set using setDocumentProjectionQuery(net.sf.saxon.s9api.XQueryExecutable) if this has been called, or null otherwise
Since:
9.3

build

public XdmNode build(Source source)
              throws SaxonApiException
Load an XML document, to create a tree representation of the document in memory.

Parameters:
source - A JAXP Source object identifying the source of the document. This can always be a StreamSource or a SAXSource. Some kinds of Source are consumed by this method, and should only be used once.

If a SAXSource is supplied, the XMLReader held within the SAXSource may be modified (by setting features and properties) to reflect the options selected on this DocumentBuilder.

An instance of DOMSource is accepted provided that the Saxon support code for DOM (in saxon9-dom.jar) is on the classpath.

If the source is an instance of NodeInfo then the subtree rooted at this node will be copied (applying schema validation if requested) to create a new tree.

Saxon also accepts an instance of StAXSource or PullSource, which can be used to supply a document that is to be parsed using a StAX parser.

(9.2) This method no longer accepts an instance of AugmentedSource, because of confusion over interactions between the properties of the AugmentedSource and the properties of this DocumentBuilder.

Returns:
An XdmNode. This will be the document node at the root of the tree of the resulting in-memory document.
Throws:
NullPointerException - if the source argument is null
IllegalArgumentException - if the kind of source is not recognized
SaxonApiException
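
Since some Source objects are consumed when read, and the XMLReader inside a SAXSource may be reconfigured, one cautious approach is to create a fresh Source for every call to build. The sketch below shows this with JDK classes only. The class SourceFactory, its method names, and the example system ID are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of Saxon: creates single-use JAXP Source objects.
public class SourceFactory {

    /** A StreamSource over in-memory text; the system ID gives the document a base URI. */
    public static Source streamSource(String xml, String systemId) {
        // The StringReader can be read only once, so the Source is single-use.
        return new StreamSource(new StringReader(xml), systemId);
    }

    /** A SAXSource with its own XMLReader, so feature changes do not leak to other callers. */
    public static Source saxSource(String xml, String systemId) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        InputSource input = new InputSource(new StringReader(xml));
        input.setSystemId(systemId);
        return new SAXSource(reader, input);
    }

    public static void main(String[] args) throws Exception {
        Source a = streamSource("<doc/>", "http://example.com/doc.xml");
        Source b = saxSource("<doc/>", "http://example.com/doc.xml");
        System.out.println(a.getSystemId());
        System.out.println(b.getSystemId());
    }
}
```

Either object could then be passed to build(Source); calling the factory method again yields a new, unread Source for the next document.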

build

public XdmNode build(File file)
              throws SaxonApiException
Build a document from a supplied XML file.

Parameters:
file - the supplied file
Returns:
the XdmNode representing the root of the document tree
Throws:
SaxonApiException - if any failure occurs retrieving or parsing the document

newBuildingContentHandler

public BuildingContentHandler newBuildingContentHandler()
                                                 throws SaxonApiException
Get a ContentHandler that may be used to build the document programmatically.

Returns:
a newly constructed BuildingContentHandler, which implements the ContentHandler interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the ContentHandler will be validated as it is written.

Note that the returned ContentHandler expects namespace scopes to be indicated explicitly by calls to ContentHandler.startPrefixMapping(java.lang.String, java.lang.String) and ContentHandler.endPrefixMapping(java.lang.String).

If the stream of events supplied to the ContentHandler does not constitute a well formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.

Throws:
SaxonApiException
Since:
9.3
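
The requirement for explicit namespace scoping can be illustrated with a short event sequence. In this minimal sketch the method writeDocument sends a well-formed sequence of SAX events, including the startPrefixMapping and endPrefixMapping calls, to any ContentHandler. So that the sketch runs without Saxon, a JDK TransformerHandler stands in as the receiver and serializes the events; the class name, namespace URI, and element names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper, not part of Saxon: emits a namespace-well-formed SAX event stream.
public class PrefixMappingEvents {

    static final String NS = "http://example.com/ns"; // hypothetical namespace URI

    /** Sends a one-element document to the handler, declaring the prefix explicitly. */
    public static void writeDocument(ContentHandler handler) throws SAXException {
        handler.startDocument();
        handler.startPrefixMapping("p", NS);            // scope opens before startElement
        handler.startElement(NS, "doc", "p:doc", new AttributesImpl());
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement(NS, "doc", "p:doc");
        handler.endPrefixMapping("p");                  // scope closes after endElement
        handler.endDocument();
    }

    /** Runs the events through a JDK identity TransformerHandler and returns the serialized text. */
    public static String serialize() throws Exception {
        SAXTransformerFactory factory = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler receiver = factory.newTransformerHandler();
        receiver.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        receiver.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        receiver.setResult(new StreamResult(out));
        writeDocument(receiver);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize());
    }
}
```

Because writeDocument depends only on the ContentHandler interface, the same call sequence could be directed at the handler returned by newBuildingContentHandler().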

newBuildingStreamWriter

public BuildingStreamWriter newBuildingStreamWriter()
                                             throws SaxonApiException
Get an XMLStreamWriter that may be used to build the document programmatically.

Returns:
a newly constructed BuildingStreamWriter, which implements the XMLStreamWriter interface. If schema validation has been requested for this DocumentBuilder, then the document constructed using the XMLStreamWriter will be validated as it is written.

If the stream of events supplied to the XMLStreamWriter does not constitute a well formed (and namespace-well-formed) document, the effect is undefined; Saxon may fail to detect the error, and construct an unusable tree.

Throws:
SaxonApiException
Since:
9.3
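
A well-formed event stream for an XMLStreamWriter means that every start element is matched by an end element and that attributes are written immediately after their start tag. The following minimal sketch writes such a stream to any XMLStreamWriter. To keep it runnable without Saxon, a writer from the JDK's XMLOutputFactory is used as the target; the class name and the element and attribute names are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of Saxon: emits a well-formed StAX event stream.
public class StreamWriterEvents {

    /** Writes a small document; each writeStartElement has a matching writeEndElement. */
    public static void writeOrder(XMLStreamWriter writer) throws XMLStreamException {
        writer.writeStartDocument();
        writer.writeStartElement("order");
        writer.writeAttribute("id", "42");   // attributes must directly follow the start tag
        writer.writeStartElement("item");
        writer.writeCharacters("widget");
        writer.writeEndElement();            // closes item
        writer.writeEndElement();            // closes order
        writer.writeEndDocument();
    }

    /** Runs the events through a JDK serializing writer and returns the text. */
    public static String serialize() throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        writeOrder(writer);
        writer.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(serialize());
    }
}
```

Since writeOrder is written against the XMLStreamWriter interface, the same calls could be made on the object returned by newBuildingStreamWriter().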

wrap

public XdmNode wrap(Object node)
             throws IllegalArgumentException
Create a node by wrapping a recognized external node from a supported object model.

If the supplied object implements the NodeInfo interface then it will be wrapped as an XdmNode without copying and without change. The NodeInfo must have been created using a Configuration compatible with the one used by this Processor (specifically, one that uses the same NamePool).

To wrap nodes from other object models, such as DOM, the support module for the external object model must be on the class path and registered with the Saxon configuration. The support modules for DOM, JDOM, DOM4J and XOM are registered automatically if they can be found on the classpath.

It is best to avoid calling this method repeatedly to wrap different nodes in the same document. Each such wrapper conceptually creates a new XDM tree instance with its own identity. Although the memory is shared, operations that rely on node identity might not have the expected result. It is best to create a single wrapper for the document node, and then to navigate to the other nodes in the tree using S9API interfaces.

Parameters:
node - the node in the external tree representation. Either an instance of NodeInfo, or an instance of a node in an external object model. Nodes in other object models (such as DOM, JDOM, etc.) are recognized only if the support module for the external object model is known to the Configuration.
Returns:
the supplied node wrapped as an XdmNode
Throws:
IllegalArgumentException - if the type of object supplied is not recognized. This may be because node was created using a different Saxon Processor, or because the required code for the external object model is not on the class path


Copyright (c) 2004-2010 Saxonica Limited. All rights reserved.