Optimizations and performance improvements

For XSLT, a new optimization has been introduced for variable declarations of the form <xsl:variable> (instructions) </xsl:variable>. If the content of the variable consists entirely of literal text or <xsl:value-of> instructions, and if data-flow analysis shows that all references to the variable use its value as a string, then the variable is rewritten to compute the string value without first constructing a temporary tree. This can give dramatic performance improvements for stylesheets that make heavy use of this inefficient construct. Note, however, that using this construct is still bad practice. The optimization does not work for global parameters, and the data-flow analysis does not extend beyond a single function or template, so if the value of the variable is passed as a parameter to another function or template, it will still be evaluated as a temporary tree. The optimized code can be achieved in all these cases by simply adding as="xs:string" to the variable declaration.

In Saxon-SA, a construct of the form //X[./Y = $Z] is generally optimized to use an index. This optimization has been dropped (because it was incorrect) in the case where (a) the operator is "=" rather than "eq", and (b) insufficient static type information is available to decide statically the target type that untypedAtomic values need to be converted to. To ensure that this optimization takes place, (a) use "eq" rather than "=" whenever it makes sense to do so, (b) declare the types of all variables, function parameters, and results, and (c) use schema-aware processing when possible.
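The reason an index needs a statically known comparison type can be illustrated with a small Java sketch. It is not Saxon's implementation: the class and method names are hypothetical, and the Y values are modelled simply as untyped strings. Under the "=" operator an untyped value is compared as a number when the other operand is numeric and as a string when it is a string, so an index keyed on the wrong type gives wrong answers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: an index over untyped Y values, keyed by a chosen target type.
final class ValueIndexSketch {

    // Build an index from converted key to the positions of the matching values.
    static <K> Map<K, List<Integer>> buildIndex(List<String> untypedValues,
                                                Function<String, K> convert) {
        Map<K, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < untypedValues.size(); i++) {
            K key = convert.apply(untypedValues.get(i));
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    // Lookup when $Z is known to be a string: keys are compared as strings.
    static List<Integer> lookupAsString(List<String> untypedValues, String z) {
        Map<String, List<Integer>> index = buildIndex(untypedValues, s -> s);
        return index.getOrDefault(z, Collections.<Integer>emptyList());
    }

    // Lookup when $Z is known to be numeric: keys are converted to double first.
    static List<Integer> lookupAsNumber(List<String> untypedValues, double z) {
        Map<Double, List<Integer>> index = buildIndex(untypedValues, Double::valueOf);
        return index.getOrDefault(z, Collections.<Integer>emptyList());
    }
}
```

With the values "1", "1.0" and "2", a string-keyed lookup for "1" finds only the first value, while a number-keyed lookup for 1 finds the first two. If the type of $Z is not known until run time, the processor cannot tell in advance which of these indexes would be the correct one.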

Calls to user-written functions are now evaluated in "push" mode if the calling expression is evaluated in push mode. This happens, for example, with the construct <e>{f:call()}</e> in XQuery, or <e><xsl:sequence select="f:call()"/></e> in XSLT. This is especially useful if the called function creates further elements, as these elements will then be written directly to the current tree (or perhaps directly to the serializer) rather than being constructed as a separate subtree which is then copied.
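The difference between the two evaluation modes can be sketched in Java as follows. This is an analogy only: a StringBuilder stands in for the current tree or serializer, and the class and method names are hypothetical rather than part of Saxon's API.

```java
// Hypothetical sketch contrasting pull-style and push-style evaluation of a function call.
final class PushModeSketch {

    // Pull style: the function builds its own separate result, which the caller must copy.
    static String itemsPull(int n) {
        StringBuilder own = new StringBuilder();
        for (int i = 1; i <= n; i++) {
            own.append("<item>").append(i).append("</item>");
        }
        return own.toString();
    }

    // Push style: the function writes directly to the caller's output destination.
    static void itemsPush(int n, StringBuilder out) {
        for (int i = 1; i <= n; i++) {
            out.append("<item>").append(i).append("</item>");
        }
    }

    // Caller corresponding to <e>{f:call()}</e>, evaluated by building and then copying.
    static String wrapPull(int n) {
        StringBuilder out = new StringBuilder();
        out.append("<e>");
        out.append(itemsPull(n)); // the intermediate result is copied here
        out.append("</e>");
        return out.toString();
    }

    // The same caller, passing its own output destination down to the function.
    static String wrapPush(int n) {
        StringBuilder out = new StringBuilder();
        out.append("<e>");
        itemsPush(n, out); // no intermediate result, no copy
        out.append("</e>");
        return out.toString();
    }
}
```

Both callers produce the same output; the push version simply avoids materializing and copying the intermediate result.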

There has been some tuning of the translate() and substring() functions, to optimize for common cases, for example where the strings contain no non-BMP characters, where the arguments to substring() are integers, or where the second and third arguments to translate() are known statically. The translate() function now creates a hash map where appropriate, to avoid O(n*m) performance where n and m are the lengths of the first two arguments.
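The hash map technique for translate() can be sketched in Java as below. This is a minimal illustration of the idea, not Saxon's code; the class name and the use of -1 as a deletion marker are assumptions made for the example. The map is built once from the second and third arguments, so each character of the first argument costs one lookup instead of a scan of the second argument.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of translate() using a hash map from code point to replacement.
final class TranslateSketch {

    // Marker meaning "this character is deleted from the output".
    private static final int DELETE = -1;

    static String translate(String input, String from, String to) {
        int[] fromCps = from.codePoints().toArray();
        int[] toCps = to.codePoints().toArray();

        // Build the map once. The first occurrence of a character in 'from' wins,
        // and characters with no counterpart in 'to' are marked for deletion.
        Map<Integer, Integer> map = new HashMap<>();
        for (int i = 0; i < fromCps.length; i++) {
            map.putIfAbsent(fromCps[i], i < toCps.length ? toCps[i] : DELETE);
        }

        // Process the input by code point, so non-BMP characters are handled correctly.
        StringBuilder sb = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            Integer replacement = map.get(cp);
            if (replacement == null) {
                sb.appendCodePoint(cp);           // not mentioned in 'from': keep as is
            } else if (replacement != DELETE) {
                sb.appendCodePoint(replacement);  // mapped to a replacement character
            }                                     // otherwise: deleted
        });
        return sb.toString();
    }
}
```

When the second and third arguments are known statically, a map like this could be built once at compile time and reused for every call.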

Many decisions about how to evaluate an expression (lazy or eager mode, single-item evaluation or sequence iteration, pull or push) that were previously made at run-time are now made at compile time.

More decisions about sorting are now made statically; and decisions that are made statically are now made during the type checking phase, so that more type information is available. For example, if it is known statically that the items to be sorted will all be strings, or all numbers, then an appropriate AtomicComparer is allocated at compile time, rather than using the general-purpose AtomicSortComparer which bases many decisions on the dynamic type of the items in the sequence being sorted.
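As a rough illustration of choosing a comparer statically, the Java sketch below selects a comparator once, from a static item type, before any items are compared. The enum, class, and method names are hypothetical and do not correspond to Saxon's AtomicComparer or AtomicSortComparer classes; the dynamic fallback is a deliberately simplified stand-in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: pick a specialized comparator once, based on static type information.
final class SortComparerSketch {

    enum StaticItemType { STRING, NUMBER, UNKNOWN }

    // Called once "at compile time" for a given sort expression.
    static Comparator<Object> chooseComparer(StaticItemType type) {
        switch (type) {
            case STRING:
                // All items are known to be strings: no per-item type tests needed.
                return (a, b) -> ((String) a).compareTo((String) b);
            case NUMBER:
                // All items are known to be numeric.
                return (a, b) -> Double.compare(((Number) a).doubleValue(),
                                                ((Number) b).doubleValue());
            default:
                // Nothing known statically: fall back to inspecting each item.
                return SortComparerSketch::compareDynamically;
        }
    }

    // General-purpose comparison that decides what to do from the run-time types.
    static int compareDynamically(Object a, Object b) {
        if (a instanceof Number && b instanceof Number) {
            return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
        }
        return a.toString().compareTo(b.toString());
    }

    static List<Object> sort(List<?> items, StaticItemType type) {
        List<Object> copy = new ArrayList<>(items);
        copy.sort(chooseComparer(type));
        return copy;
    }
}
```

The specialized comparators skip the type tests that the dynamic fallback has to repeat for every comparison.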

The implementation of tail-call optimization for XQuery functions and XSLT stylesheet functions has been redesigned. The previous implementation was somewhat fragile, and the new approach is more robust and more efficient: it effectively does a compile-time rewrite of the body of a tail-recursive function as a loop. The new design avoids allocating new stack frame objects and context objects on each call, so it saves memory on the heap as well as on the Java stack. The new approach, however, does mean that a small amount of Java stack space is consumed when a tail call to a different function occurs: this means that where two functions are mutually tail-recursive, it is possible to run out of stack space (there is some optimization of such calls, in that the objects used to hold context information are reused; but these objects are on the Java heap rather than on the stack). It also means that the stack trace produced in Saxon-SA on a dynamic error now lists only a single call when a run-time failure occurs deep in a recursive function. The new design does not affect tail-call optimizations of xsl:apply-templates and xsl:call-template in XSLT, which use a different mechanism.
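The effect of rewriting a tail-recursive function as a loop can be shown with a small Java analogy. The function here (summing the integers from n down to 1 with an accumulator) and the class name are made up for illustration; the sketch shows the general transformation, not the code Saxon generates.

```java
// Hypothetical sketch: a self-recursive tail call rewritten as a loop.
final class TailCallSketch {

    // Tail-recursive form: the recursive call is the last thing the function does.
    // Each call consumes a new stack frame, so very deep recursion can overflow the stack.
    static long sumRecursive(long n, long acc) {
        return n == 0 ? acc : sumRecursive(n - 1, acc + n);
    }

    // The same function after the rewrite: instead of calling itself, it rebinds
    // its parameters to the new argument values and jumps back to the start.
    static long sumLoop(long n, long acc) {
        while (true) {
            if (n == 0) {
                return acc;
            }
            // Compute all new argument values before rebinding any parameter.
            long nextN = n - 1;
            long nextAcc = acc + n;
            n = nextN;
            acc = nextAcc;
        }
    }
}
```

The loop version runs in constant stack space however deep the recursion would have been. It also shows why a stack trace taken inside the loop lists only a single call.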

There has been some tuning of the XML output method to reduce unnecessary building and checking of qualified element and attribute names that are used repeatedly.