Optimizations and performance improvements

A new configuration option is available to control the optimization level. This appears as the -opt option on the Query and Transform command-line interfaces, and as OPTIMIZATION_LEVEL on APIs such as Configuration.setConfigurationProperty() and TransformerFactory.setAttribute(). The value is an integer in the range 0 (no optimization) to 10 (full optimization). Currently all values other than 0 result in full optimization, but this is likely to change in future. The default is full optimization. This option allows optimization to be suppressed in three situations: where reducing compile time is important, where optimization gets in the way of debugging, or where it causes extension functions with side-effects to behave unpredictably. (Note, however, that even with no optimization, lazy evaluation may still cause expressions to be evaluated in an unexpected order.)
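
The level-to-behaviour mapping described above can be modelled in a few lines. This is a minimal sketch, not Saxon code. The class `OptimizationLevel` and its methods are hypothetical, and the mapping encodes only the current behaviour (0 disables optimization, any other value in range means full optimization), which may change in later releases.

```java
// Hypothetical model of the OPTIMIZATION_LEVEL setting; not part of the Saxon API.
final class OptimizationLevel {
    static final int NONE = 0;
    static final int FULL = 10;

    private OptimizationLevel() {}

    // Rejects values outside the documented range 0..10.
    static int validate(int level) {
        if (level < NONE || level > FULL) {
            throw new IllegalArgumentException("optimization level must be between 0 and 10: " + level);
        }
        return level;
    }

    // Current behaviour: 0 means no optimization; every other valid value means full optimization.
    static boolean optimizationEnabled(int level) {
        return validate(level) != NONE;
    }
}
```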

A function call appearing within a loop, but with no dependency on the loop variables, can now be moved out of the loop, provided the function does not create new nodes. Previously, the worst case was assumed: that the function could create new nodes, and that it therefore needed to be called repeatedly even if the arguments were unchanged. The analysis of whether the function creates new nodes is now done in all cases except for recursive functions, for which the worst case is still assumed. Note that "creates new nodes" here means "creates new nodes and returns a result that depends on the node identity". A function that creates new nodes and immediately atomizes them is not considered to be creative, and can safely be moved out of a loop. By contrast, a function whose result depends on the XSLT generate-id() function is always considered creative.
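
The decision rule above, and its effect on the number of calls, can be sketched as follows. Everything here is hypothetical. `FunctionInfo` stands in for whatever properties the optimizer derives by static analysis of a function body, and `expensive()` is a placeholder for a non-creative function whose argument does not depend on the loop variable.

```java
// Hypothetical summary of a function's static properties; not Saxon's internal representation.
final class FunctionInfo {
    final boolean recursive;
    final boolean constructsNodes;
    final boolean resultDependsOnNodeIdentity; // true if new nodes are returned, not just their atomized values
    final boolean usesGenerateId;

    FunctionInfo(boolean recursive, boolean constructsNodes,
                 boolean resultDependsOnNodeIdentity, boolean usesGenerateId) {
        this.recursive = recursive;
        this.constructsNodes = constructsNodes;
        this.resultDependsOnNodeIdentity = resultDependsOnNodeIdentity;
        this.usesGenerateId = usesGenerateId;
    }

    boolean isCreative() {
        if (recursive) return true;       // worst case still assumed for recursive functions
        if (usesGenerateId) return true;  // a result depending on generate-id() is always creative
        return constructsNodes && resultDependsOnNodeIdentity;
    }
}

final class LoopHoisting {
    static int calls = 0;

    // Placeholder for a non-creative function; counts how often it is called.
    static int expensive(int n) {
        calls++;
        return n * n;
    }

    static boolean canHoist(FunctionInfo f, boolean argumentsDependOnLoopVariables) {
        return !argumentsDependOnLoopVariables && !f.isCreative();
    }

    // Computes the sum of (i + expensive(k)) for i in 0..iterations-1, calling expensive(k)
    // either once before the loop (hoisted) or once per iteration.
    static long run(int iterations, int k, boolean hoist) {
        calls = 0;
        long total = 0;
        int hoisted = hoist ? expensive(k) : 0;
        for (int i = 0; i < iterations; i++) {
            total += i + (hoist ? hoisted : expensive(k));
        }
        return total;
    }
}
```

Both evaluations produce the same total; the hoisted one makes a single call instead of one per iteration.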

The design of the LargeStringBuffer used to hold the content of text nodes in the tiny tree has changed to use fixed-length segments instead of variable-length segments. As a result, locating text is generally faster, at the cost of more data copying for unusually long text nodes. Overall, on the XMark benchmark this gives an improvement of 5% in query execution times, and occasionally 10%.
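
One way to see why fixed-length segments help: with a power-of-two segment size, the segment number and the offset within it are just a shift and a mask, so locating a character takes constant time. The following is a minimal sketch of that idea, not the actual LargeStringBuffer. The class name and the 4096-character segment size are assumptions. In this sketch, a range that straddles a segment boundary has to be copied in pieces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical segmented buffer with fixed-length segments; not Saxon's LargeStringBuffer.
final class SegmentedCharBuffer {
    private static final int SEGMENT_BITS = 12;                 // assumed: 4096 chars per segment
    private static final int SEGMENT_SIZE = 1 << SEGMENT_BITS;
    private static final int MASK = SEGMENT_SIZE - 1;

    private final List<char[]> segments = new ArrayList<>();
    private int length = 0;

    void append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            int seg = length >>> SEGMENT_BITS;
            if (seg == segments.size()) segments.add(new char[SEGMENT_SIZE]);
            segments.get(seg)[length & MASK] = s.charAt(i);
            length++;
        }
    }

    // Constant-time location: segment index from a shift, offset from a mask.
    char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException(String.valueOf(index));
        return segments.get(index >>> SEGMENT_BITS)[index & MASK];
    }

    // A range crossing segment boundaries is copied one segment-sized piece at a time.
    String substring(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        StringBuilder sb = new StringBuilder(end - start);
        int pos = start;
        while (pos < end) {
            char[] seg = segments.get(pos >>> SEGMENT_BITS);
            int offset = pos & MASK;
            int n = Math.min(end - pos, SEGMENT_SIZE - offset);
            sb.append(seg, offset, n);
            pos += n;
        }
        return sb.toString();
    }

    int length() {
        return length;
    }
}
```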

A new variant of the tiny tree data structure can be selected at user option. This is the "condensed tiny tree". While building a condensed tiny tree, the system checks before creating a text or attribute node whether there is already another node with the same string value; if so, the value is only stored once. The tree thus takes a little longer to build (perhaps 10 MB/sec rather than 15 MB/sec) but will typically occupy less memory. The saving in memory depends greatly on the nature of the data. This option is selected from the Transform or Query command line using the option -tree:tinyc, or from the API using the value "tinyTreeCondensed" for the configuration option TREE_MODEL_NAME, or the value Builder.TINY_TREE_CONDENSED in various setTreeModel() interfaces.
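
The check made before storing a text or attribute value amounts to interning it. The sketch below is hypothetical and does not reflect the tiny tree's real layout. Each node records an index into a table of distinct values, and the extra hash lookup on every insert is what costs the additional build time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical value store for a "condensed" tree: identical string values are stored once.
final class CondensedValueStore {
    private final List<String> values = new ArrayList<>();          // distinct values
    private final Map<String, Integer> index = new HashMap<>();     // value -> position in 'values'
    private final List<Integer> nodeValueRefs = new ArrayList<>();  // node number -> value position

    // Before storing a node's value, look for an identical value that is already stored.
    int addNode(String value) {
        Integer ref = index.get(value);
        if (ref == null) {
            ref = values.size();
            values.add(value);
            index.put(value, ref);
        }
        nodeValueRefs.add(ref);
        return nodeValueRefs.size() - 1; // the new node's number
    }

    String stringValue(int node) {
        return values.get(nodeValueRefs.get(node));
    }

    int distinctValues() {
        return values.size();
    }

    int nodeCount() {
        return nodeValueRefs.size();
    }
}
```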

By default the tiny tree now maintains a cache of the typed values of element and attribute nodes. The typed value is held in this cache if it is not an instance of string, untypedAtomic, or anyURI (which means that the cache is only populated for Saxon-EE). The typed value is placed in the cache the first time it is computed during the course of a query or transformation (not at the time of initial validation). If this uses excessive memory, or if it delivers no benefit for the query/transformation in question (which can happen if each element/attribute is only processed once, for example) then there is a configuration option USE_TYPED_VALUE_CACHE to disable it.
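
The caching policy described above has three elements: lazy population on first use, no caching of string-like values, and a switch to turn the cache off. It can be sketched as follows. The class, its fields, and the use of Java `String` as a stand-in for the uncached types xs:string, xs:untypedAtomic and xs:anyURI are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical per-tree cache of typed values keyed by node number; not Saxon's implementation.
final class TypedValueCache {
    private final boolean enabled;                        // plays the role of USE_TYPED_VALUE_CACHE
    private final IntFunction<Object> computeTypedValue;  // hypothetical: derives a typed value from a node
    private final Map<Integer, Object> cache = new HashMap<>();
    int computations = 0;

    TypedValueCache(boolean enabled, IntFunction<Object> computeTypedValue) {
        this.enabled = enabled;
        this.computeTypedValue = computeTypedValue;
    }

    // Populated lazily, the first time a node's typed value is requested.
    Object typedValue(int node) {
        if (enabled) {
            Object cached = cache.get(node);
            if (cached != null) return cached;
        }
        Object value = computeTypedValue.apply(node);
        computations++;
        // String stands in for the types that are never cached.
        if (enabled && !(value instanceof String)) cache.put(node, value);
        return value;
    }
}
```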

In XQuery FLWOR expressions, the rewriting of the "where" conditions into predicates applied to the individual "for" clauses is now done more aggressively. Previously it was done only for terms in the where condition that were potentially indexable, for example a value comparison; it is now done for expressions of any kind. Where the FLWOR expression is evaluated by nested looping, this can significantly reduce the number of iterations of the inner loop. The rewrite is also less likely to be prevented by references to the context item within the predicate (in most cases these can now be converted into references to a variable declared at an outer level and bound to the context item). Finally, a condition of the form where not(A or B or C) is now converted into where not(A) and not(B) and not(C) before this redistribution of predicate terms is attempted.
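
The effect on nested loops can be illustrated with a hypothetical query `for $x in X, $y in Y where $x mod 10 = 0 and $x + $y = 100 return ...`. The term `$x mod 10 = 0` depends only on `$x`, so it can become a predicate on the first "for" clause. The Java sketch below is not Saxon's evaluator. It evaluates both forms over the integers 1 to 100 and counts inner-loop iterations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of pushing a where-clause term into an outer "for" clause.
final class WherePushdown {
    static long innerIterations;

    // Naive: the whole where clause is tested inside the inner loop.
    static List<int[]> naive(List<Integer> xs, List<Integer> ys) {
        innerIterations = 0;
        List<int[]> out = new ArrayList<>();
        for (int x : xs) {
            for (int y : ys) {
                innerIterations++;
                if (x % 10 == 0 && x + y == 100) out.add(new int[] {x, y});
            }
        }
        return out;
    }

    // Rewritten: the term depending only on $x filters the outer loop, as in X[. mod 10 = 0].
    static List<int[]> pushedDown(List<Integer> xs, List<Integer> ys) {
        innerIterations = 0;
        List<int[]> out = new ArrayList<>();
        for (int x : xs) {
            if (x % 10 != 0) continue;
            for (int y : ys) {
                innerIterations++;
                if (x + y == 100) out.add(new int[] {x, y});
            }
        }
        return out;
    }
}
```

Both forms return the same nine pairs. The naive form runs the inner loop body 10,000 times, and the rewritten form only 1,000 times.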

A simple set of rewrites for boolean expressions has been introduced: (A and true()) is rewritten as (A), and (A or false()) is rewritten as (A). The importance of these rewrites is that they simplify the expression, making it a candidate for further, more powerful optimizations such as indexing.
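
A bottom-up rewrite of this kind can be sketched over a toy expression tree. The classes below are hypothetical, not Saxon's expression classes. The sketch also applies the mirror-image forms (true() and A) and (false() or A); that is an assumption beyond the two rewrites stated above.

```java
// Toy expression tree; hypothetical, not Saxon's expression model.
interface BoolExpr {}

final class VarRef implements BoolExpr {
    final String name;
    VarRef(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class BoolLit implements BoolExpr {
    final boolean value;
    BoolLit(boolean value) { this.value = value; }
    @Override public String toString() { return value ? "true()" : "false()"; }
}

final class AndExpr implements BoolExpr {
    final BoolExpr left, right;
    AndExpr(BoolExpr left, BoolExpr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " and " + right + ")"; }
}

final class OrExpr implements BoolExpr {
    final BoolExpr left, right;
    OrExpr(BoolExpr left, BoolExpr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " or " + right + ")"; }
}

final class BoolSimplifier {
    static boolean isLiteral(BoolExpr e, boolean v) {
        return e instanceof BoolLit && ((BoolLit) e).value == v;
    }

    // Operands are simplified first, so rewrites cascade up the tree.
    static BoolExpr simplify(BoolExpr e) {
        if (e instanceof AndExpr) {
            AndExpr a = (AndExpr) e;
            BoolExpr l = simplify(a.left), r = simplify(a.right);
            if (isLiteral(r, true)) return l;  // (A and true()) => A
            if (isLiteral(l, true)) return r;  // (true() and A) => A
            return new AndExpr(l, r);
        }
        if (e instanceof OrExpr) {
            OrExpr o = (OrExpr) e;
            BoolExpr l = simplify(o.left), r = simplify(o.right);
            if (isLiteral(r, false)) return l; // (A or false()) => A
            if (isLiteral(l, false)) return r; // (false() or A) => A
            return new OrExpr(l, r);
        }
        return e;
    }
}
```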

Global variables can now be indexed in Saxon-EE. Previously this was done only for local variables. A global variable V will be indexed if there is any filter expression of the form $V[@X = Y] where @X represents any expression whose value depends on the context node, and Y represents any expression whose value does not depend on the context node. (Variations are possible, of course: the operands can be in either order, and the operator can be "eq" rather than "=".)
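
In essence, the index replaces a scan of $V on every evaluation of $V[@X = Y] with a hash lookup on the value of Y. The map can be built once because the value of a global variable does not change. The generic class below is a hypothetical sketch. It models items as plain Java objects and ignores the type-promotion and multi-valued-operand rules of the XPath "=" operator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical index over the value of a global variable; keyOf plays the role of @X.
final class GlobalVariableIndex<T, K> {
    private final Map<K, List<T>> index = new HashMap<>();

    // Built once, when the variable's value is first available.
    GlobalVariableIndex(List<T> value, Function<T, K> keyOf) {
        for (T item : value) {
            index.computeIfAbsent(keyOf.apply(item), k -> new ArrayList<>()).add(item);
        }
    }

    // Equivalent to filtering $V[@X = y], but without scanning every item; preserves order.
    List<T> lookup(K y) {
        return index.getOrDefault(y, List.of());
    }
}
```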

When Saxon-EE extracts expressions from templates and functions into new global variables, it now ensures that if an expression appears more than once, only a single global variable is created. (This depends on the expressions being recognized as equal, which does not happen in all cases.) A particular benefit occurs with stylesheets that make heavy use of attribute sets (typically, XSL-FO stylesheets): any attribute set whose value has no context dependencies is now computed once as a global variable (its value being a sequence of attribute nodes, which are copied each time the attribute set is referenced).
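
Recognizing repeats amounts to keying the extracted expressions by some canonical form. In this hypothetical sketch, the canonical form is simply a string supplied by the caller, and the variable names (gv0, gv1, ...) are invented. Saxon's own equality test on expressions is more involved and, as noted above, does not catch every case.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical: maps each distinct (canonicalized) expression to a single global variable.
final class GlobalExtractor {
    private final Map<String, String> variableByExpression = new LinkedHashMap<>();

    // Returns the global variable for this expression, creating it only on first use.
    String extract(String canonicalExpression) {
        String name = variableByExpression.get(canonicalExpression);
        if (name == null) {
            name = "gv" + variableByExpression.size();
            variableByExpression.put(canonicalExpression, name);
        }
        return name;
    }

    int variableCount() {
        return variableByExpression.size();
    }
}
```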

The functions concat() and string-join() are now capable of operating in push mode. This means that with a query such as <a>{string-join(//a, '-')}</a>, the output of the string-join() function is streamed directly to the serializer, rather than being constructed as a string in memory. The same applies to the select expression of xsl:value-of. For example, the instruction <xsl:value-of select="1 to $n"/> now streams its output to the serializer without allocating memory for the potentially large string value.
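
The difference between pull and push here is whether the joined string is ever materialized. The hypothetical sketch below writes each item and separator straight to a `java.io.Writer`, which stands in for the serializer. The second method mimics <xsl:value-of select="1 to $n"/> with its default single-space separator, generating the numbers on demand so that memory use does not grow with $n.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

// Hypothetical push-mode string-join; not Saxon's implementation.
final class PushStringJoin {
    // Writes items and separators directly to the destination; no joined string is built.
    static void stringJoin(Iterator<? extends CharSequence> items, String separator, Writer out)
            throws IOException {
        boolean first = true;
        while (items.hasNext()) {
            if (!first) out.write(separator);
            out.append(items.next());
            first = false;
        }
    }

    // Analogue of <xsl:value-of select="1 to $n"/>: each number is written as it is produced.
    static void valueOfRange(int n, Writer out) throws IOException {
        for (int i = 1; i <= n; i++) {
            if (i > 1) out.write(' ');
            out.write(Integer.toString(i));
        }
    }
}
```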

An optimization for translate(), using a hash map rather than a serial search to map individual characters, was present in earlier releases but was activated only if the second and third arguments were string literals. The optimization is now activated for run-time lookups as well, provided the product of the lengths of the first and second arguments exceeds 1000 (a threshold chosen on the basis of some simple measurements).
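
The following is a hypothetical sketch of the two strategies, with the choice between them made by the threshold quoted above. It follows the XPath translate() rules: the first occurrence of a character in the second argument wins, and a character with no counterpart in the third argument is deleted. It works on code points. It is not Saxon's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical translate() with a size-based choice between a hash map and a serial search.
final class Translate {
    static final int THRESHOLD = 1000; // product of the lengths of the first and second arguments

    static String translate(String input, String from, String to) {
        int[] in = input.codePoints().toArray();
        int[] f = from.codePoints().toArray();
        int[] t = to.codePoints().toArray();
        StringBuilder sb = new StringBuilder(input.length());
        if ((long) in.length * f.length > THRESHOLD) {
            // Build the map once; first occurrence wins, and -1 means "delete this character".
            Map<Integer, Integer> map = new HashMap<>();
            for (int i = 0; i < f.length; i++) {
                map.putIfAbsent(f[i], i < t.length ? t[i] : -1);
            }
            for (int c : in) {
                Integer r = map.get(c);
                if (r == null) sb.appendCodePoint(c);
                else if (r >= 0) sb.appendCodePoint(r);
            }
        } else {
            // Serial search of the second argument for each input character.
            for (int c : in) {
                int idx = -1;
                for (int i = 0; i < f.length; i++) {
                    if (f[i] == c) { idx = i; break; }
                }
                if (idx < 0) sb.appendCodePoint(c);
                else if (idx < t.length) sb.appendCodePoint(t[idx]);
            }
        }
        return sb.toString();
    }
}
```

For example, translate("bar", "abc", "ABC") takes the serial path and returns "BAr". The same call on "bar" repeated 400 times takes the hash-map path (1200 × 3 > 1000) and gives the same per-character result.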

Some changes have been made to the NamePool (and the way it is used) to reduce contention. Whenever a nameCode is allocated, a corresponding namespaceCode is now allocated at the same time, which means that users of this nameCode can be confident that the namespaceCode is already in the NamePool, avoiding the need for another synchronized method call (which was often being done at run time).
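
The point of the change is that one synchronized call now does the work that previously took two. A hypothetical sketch of that pattern follows: `allocate()` registers the namespace and the name under a single lock and returns both codes together. The class name, key format and code numbering are invented and do not reflect the real NamePool's encoding.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of allocating a name code and its namespace code together.
final class SketchNamePool {
    private final Map<List<String>, Integer> namespaceCodes = new HashMap<>();
    private final Map<List<String>, Integer> nameCodes = new HashMap<>();

    // One synchronized call allocates (or finds) both codes, so a caller holding the
    // name code never needs a second synchronized call to register the namespace.
    // Use "" for an absent prefix or namespace (List.of rejects nulls).
    synchronized int[] allocate(String prefix, String uri, String localName) {
        List<String> nsKey = List.of(prefix, uri);
        Integer ns = namespaceCodes.get(nsKey);
        if (ns == null) {
            ns = namespaceCodes.size();
            namespaceCodes.put(nsKey, ns);
        }
        List<String> nameKey = List.of(prefix, uri, localName);
        Integer name = nameCodes.get(nameKey);
        if (name == null) {
            name = nameCodes.size();
            nameCodes.put(nameKey, name);
        }
        return new int[] {name, ns};
    }
}
```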