Document Projection

Document projection is a mechanism that analyzes a query to determine which parts of a document it can potentially access and then, while building the tree that represents the document, leaves out those parts that cannot make any difference to the result of the query.

Document projection is used only if requested. On the XQuery command line interface, enable it with the option -projection:on. This option affects both the primary source document supplied on the command line and any calls on the doc() function within the body of the query that use a literal string argument for the document URI.

To see how much document projection reduces the size of the source document in memory, use the -t option on the command line. For each document loaded, it shows how many nodes from the input document were retained and how many were discarded.

From the s9api API, document projection can be invoked as an option on the DocumentBuilder. The call setDocumentProjectionQuery() supplies as its argument a compiled query (an XQueryExecutable), and the document built by the document builder is then projected to retain only the parts of the document that are accessed by this query, when it operates on this document as the initial context item. For example, if the supplied query is count(//ITEM), then only the ITEM elements will be retained.
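The s9api usage described above can be sketched as follows. This is a minimal illustration rather than a definitive recipe: the class name ProjectionDemo, the method countItems, and the sample XML are hypothetical, and the sketch assumes a Saxon edition and license in which document projection is available (requested here with new Processor(true)).

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

public class ProjectionDemo {

    /** Builds a projected tree from the XML text and evaluates count(//ITEM) on it. */
    public static String countItems(String xml) throws SaxonApiException {
        // Assumption: 'true' requests a licensed configuration that supports projection.
        Processor processor = new Processor(true);

        // Compile the query whose access pattern drives the projection.
        XQueryCompiler compiler = processor.newXQueryCompiler();
        XQueryExecutable query = compiler.compile("count(//ITEM)");

        // Ask the builder to keep only what this query can access.
        DocumentBuilder builder = processor.newDocumentBuilder();
        builder.setDocumentProjectionQuery(query);
        XdmNode doc = builder.build(new StreamSource(new StringReader(xml)));

        // Run the same query with the projected document as the initial context item.
        XQueryEvaluator evaluator = query.load();
        evaluator.setContextItem(doc);
        return evaluator.evaluateSingle().getStringValue();
    }

    public static void main(String[] args) throws SaxonApiException {
        // Hypothetical sample input: the OTHER element is not needed by the query.
        String xml = "<ROOT><ITEM/><OTHER>not needed</OTHER><ITEM/></ROOT>";
        System.out.println(countItems(xml));
    }
}
```

The result of the query is the same whether or not projection takes place; the difference lies only in how much of the source document is held in memory.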

It is also possible to request that a query should perform document projection on documents that it reads using the doc() function, provided this has a string-literal argument. This can be requested using the option setAllowDocumentProjection(true) on the XQueryExpression object. This is not available directly in the s9api interface, but the XQueryExpression is reachable from the XQueryExecutable using the accessor method getUnderlyingCompiledQuery().
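A sketch of that route is shown here, under the same licensing assumption as any other use of projection. The class name DocProjectionDemo, the method countItemsViaDoc, and the temporary file are hypothetical. The document URI is spliced into the query text so that doc() receives a string literal, which is what the analysis requires; the sketch assumes the URI contains no apostrophes.

```java
import net.sf.saxon.query.XQueryExpression;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryExecutable;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DocProjectionDemo {

    /** Counts ITEM elements in the document at the given URI, read through doc(). */
    public static String countItemsViaDoc(String documentUri) throws SaxonApiException {
        // Assumption: 'true' requests a licensed configuration that supports projection.
        Processor processor = new Processor(true);

        // doc() must be called with a string literal for projection to apply.
        String queryText = "count(doc('" + documentUri + "')//ITEM)";
        XQueryExecutable executable = processor.newXQueryCompiler().compile(queryText);

        // Step down from s9api to the underlying expression and enable projection.
        XQueryExpression expression = executable.getUnderlyingCompiledQuery();
        expression.setAllowDocumentProjection(true);

        return executable.load().evaluateSingle().getStringValue();
    }

    public static void main(String[] args) throws IOException, SaxonApiException {
        // Hypothetical sample document written to a temporary file.
        Path file = Files.createTempFile("items", ".xml");
        Files.write(file, "<ROOT><ITEM/><OTHER/><ITEM/></ROOT>".getBytes(StandardCharsets.UTF_8));
        System.out.println(countItemsViaDoc(file.toUri().toString()));
    }
}
```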

The more complex the query, the less likely it is that Saxon will be able to analyze it to determine the subset of the document required. If precise analysis is not possible, document projection has no effect. Currently Saxon makes no attempt to analyze accesses made within user-defined functions. Also, of course, Saxon cannot analyze the expectations of external (Java) functions called from the query.

Currently document projection is supported only for XQuery, and it works only when a document is parsed and loaded for the purpose of executing a single query. It is possible, however, to use the mechanism to create a manual filter for source documents if the required subset of the document is known. To achieve this, create a query that selects the required parts of the document supplied as the context item, and compile it to an s9api XQueryExecutable. The query does not have to do anything useful: the only requirement is that the result of the query on the subset document must be the same as the result on the original document. Then supply this XQueryExecutable to the s9api DocumentBuilder used to build the document.

Of course, when document projection is used manually like this, it is entirely the user's responsibility to ensure that the selected part of the document contains all the nodes required.
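The manual filter technique might look like the following sketch. The class name ManualFilterDemo, the method titlesFromSubset, the catalog vocabulary, and both queries are hypothetical, and the sketch again assumes a configuration in which projection is available. The filter query simply returns the title elements, so the expectation is that they are kept together with their content; the second query touches nothing outside that subset.

```java
import net.sf.saxon.s9api.DocumentBuilder;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmNode;

import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;

public class ManualFilterDemo {

    /** Builds a subset tree using a filter query, then runs a different query on it. */
    public static String titlesFromSubset(String xml) throws SaxonApiException {
        // Assumption: 'true' requests a licensed configuration that supports projection.
        Processor processor = new Processor(true);
        XQueryCompiler compiler = processor.newXQueryCompiler();

        // The filter query exists only to describe the subset that must be kept.
        XQueryExecutable filter = compiler.compile("/catalog/book/title");

        DocumentBuilder builder = processor.newDocumentBuilder();
        builder.setDocumentProjectionQuery(filter);
        XdmNode subset = builder.build(new StreamSource(new StringReader(xml)));

        // The working query must stay within the subset selected by the filter.
        XQueryExecutable work = compiler.compile("string-join(/catalog/book/title, ', ')");
        XQueryEvaluator evaluator = work.load();
        evaluator.setContextItem(subset);
        return evaluator.evaluateSingle().getStringValue();
    }

    public static void main(String[] args) throws SaxonApiException {
        // Hypothetical sample input: the blurb elements are outside the required subset.
        String xml = "<catalog>"
                + "<book><title>A</title><blurb>long text</blurb></book>"
                + "<book><title>B</title><blurb>more text</blurb></book>"
                + "</catalog>";
        System.out.println(titlesFromSubset(xml));
    }
}
```

A working query that went on to read the blurb elements would be outside the contract: nothing checks that the subset is sufficient, so such a query could silently see less data than the original document contains.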