Optimizations and performance improvements

Some optimizations that were previously done in Saxon-HE have been moved into Saxon-EE. The aim is to clarify and simplify the differentiation of the products, and the change compensates for the move of some functionality (such as higher order functions) into Saxon-HE.

Certain optimizations are now placed under a new optimization category n (constant folding), and can therefore be enabled or suppressed by setting this optimization option. The category includes compile-time evaluation of simple expressions such as arithmetic expressions (2+2) or function calls (boolean(1)), and compile-time evaluation of function calls and other expressions where it is known what the result will be if one of the operands is empty (2+()).

The amount of memory needed for the indexes used to support xsl:key has been reduced, particularly in the common cases (a) where the keys are strings compared using codepoint collation, and (b) where the key values are unique.

A new CompareToStringConstant expression has been introduced for value comparisons of the form EXPR eq "literal" (with any of the permitted operators) using Unicode codepoint collation.

The xsl:where-populated instruction now has a push-mode implementation. (The push-mode implementation already existed in principle, but it was incomplete, and was used only when streaming.)

A number of functions that are capable of returning large strings (concat, string-join, unparsed-text, xml-to-json) are now able in appropriate cases to stream their output incrementally to the serializer rather than materializing the result as a string in memory.

Match patterns of the form match="$nodeset" where $nodeset is a global variable are now optimized (Saxon-EE only): the variable is stored in an indexed form that allows a quick lookup of whether a particular node is present, without a serial search of the entire node-set.

A new implementation of XDM maps, the DictionaryMap, is used in cases where all the keys will be simple strings, and further changes to the map are considered unlikely. For example, this is used to represent a map constructed by fn:parse-json(). Optimizing for string-valued keys saves space, and makes lookup faster, while optimizing for immutability enables the standard Java HashMap structure to be used internally. If map:put() or map:remove() are used on such a map, it is first copied in its entirety to a new HashTrie using the existing data structure. The DictionaryMap is internally mutable during its construction phase, so adding entries incrementally during the course of an operation such as fn:parse-json() is much more efficient than if a pure immutable structure were used.

Node tests of the form *:NAME (and predicates of the form *[local-name() = 'NAME']), when executed against a TinyTree, are now more efficient: the first time such a test is used against a particular tree, an index is built mapping local names present in the tree to integer fingerprints, and the search then compares fingerprints rather than looking up each node name in the name pool and comparing strings.

Sorting is now done using the standard sort routines in the JDK, rather than the third-party library that was previously used. In some cases this has been found to give a substantial performance boost (for example, in the common case where a node sequence is sorted into document order, and the input is already sorted or nearly sorted).

At various stages in the 9.8 and 9.9 releases, attempts were made to introduce an optimization for copying of subtrees. One approach was "bulk copy", which attempted to copy part of a tinytree into another tinytree by combining the low-level data structures, rather than by node-by-node copying at the logical level. Another attempt was "grafting", whereby a subtree could by physically shared between two or more trees, while representing distinct nodes at the logical level. Both attempts have now been abandoned. Although they gave good performance results, there were so many restrictions that they weren't invoked very often; and the implementation had so many complexities (primarily related to namespace handling) that it proved near impossible to get all the edge cases correct.