Optimizations and performance improvements

Saxon-EE 9.8 introduces hot-spot bytecode generation. In the past, bytecode generation was done unconditionally, unless disabled entirely. Since the costs of bytecode generation can exceed the benefits unless the code is executed frequently enough to amortize the cost, this meant that the faster run time was sometimes insufficient to compensate for the increase in compile time. In Saxon-EE 9.8, by default, sections of XSLT or XQuery code are initially interpreted, with bytecode being generated only after the code has been executed a set number of times. By default the threshold is 300 executions; this can be changed by setting the configuration property GENERATE_BYTE_CODE to an integer value. (Boolean values for this property are still supported, with "true" setting a threshold of 0, and "false" setting a threshold of infinity.)
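
The threshold mechanism can be pictured with a small sketch. This is not Saxon's implementation: the class HotSpotExpression, its fields, and the stand-in compile() method are hypothetical names introduced purely for illustration, and the sketch is single-threaded. It shows only the idea of counting interpreted executions and switching to a compiled form once the threshold is reached.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical illustration of threshold-based compilation; not Saxon code. */
class HotSpotExpression {
    private final IntUnaryOperator interpreted; // slow path, always available
    private final int threshold;                // interpreted executions allowed before compiling
    private IntUnaryOperator compiled;          // fast path, null until generated
    private int executions = 0;
    int compileCount = 0;                       // exposed so the effect can be observed

    HotSpotExpression(IntUnaryOperator interpreted, int threshold) {
        this.interpreted = interpreted;
        this.threshold = threshold;
    }

    int evaluate(int input) {
        if (compiled != null) {
            return compiled.applyAsInt(input);
        }
        if (executions >= threshold) {
            // The code has proved "hot": pay the compilation cost once.
            compiled = compile();
            return compiled.applyAsInt(input);
        }
        executions++;
        return interpreted.applyAsInt(input);
    }

    private IntUnaryOperator compile() {
        compileCount++;
        // Stand-in: a real compiler would return a faster equivalent.
        return interpreted;
    }
}
```

In this sketch a threshold of 0 compiles on the first call, and a threshold of Integer.MAX_VALUE effectively never compiles, which mirrors the meaning given above to the boolean values "true" and "false".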

In addition, bytecode generation is now more selective. In the past, when a construct was used for which bytecode generation was not implemented, the entire function or template was interpreted. Now, only the construct in question (and its operands, recursively) is interpreted. This means we can focus effort on generating bytecode for those constructs where it genuinely adds value; there is also a memory saving because generating bytecode for everything can consume a lot of memory (which can particularly be a problem because memory used for loaded classes is not a high priority for garbage collection).

In XSLT, Saxon-EE 9.8 further reduces compile time by using just-in-time compilation of template rules. During stylesheet static analysis, the match patterns of all template rules in a mode are analyzed in the usual way and are used to build a data structure supporting fast determination of the best-matching template rule for any node. The actual body of the template rule, however, is not compiled until the first time a particular rule fires. This can substantially improve stylesheet compilation time for very large stylesheets (examples being the DocBook and DITA stylesheets) which often contain thousands of template rules that are never used in a particular transformation run, because source documents typically only use a small part of the DocBook or DITA vocabulary.
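
As a rough sketch of the idea (not Saxon's actual data structures), the following Java fragment registers rules eagerly but compiles each rule body only when the rule first fires. The names LazyTemplateRule and LazyMode are hypothetical, nodes are reduced to element names, and match patterns are reduced to a simple lookup by element name; real rule selection also takes priority and import precedence into account.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;

/** Hypothetical template rule whose body is compiled on first use. */
class LazyTemplateRule {
    private final Supplier<UnaryOperator<String>> compiler; // deferred compilation step
    private UnaryOperator<String> body;                     // null until the rule first fires

    LazyTemplateRule(Supplier<UnaryOperator<String>> compiler) {
        this.compiler = compiler;
    }

    boolean isCompiled() {
        return body != null;
    }

    String fire(String node) {
        if (body == null) {
            // Any error in the rule body would surface only at this point.
            body = compiler.get();
        }
        return body.apply(node);
    }
}

/** Hypothetical mode: the lookup structure is built eagerly, rule bodies are not. */
class LazyMode {
    private final Map<String, LazyTemplateRule> rulesByElementName = new HashMap<>();

    void addRule(String elementName, Supplier<UnaryOperator<String>> compiler) {
        rulesByElementName.put(elementName, new LazyTemplateRule(compiler));
    }

    String applyTemplates(String elementName) {
        LazyTemplateRule rule = rulesByElementName.get(elementName);
        return rule == null ? "" : rule.fire(elementName);
    }

    long compiledRuleCount() {
        return rulesByElementName.values().stream()
                .filter(LazyTemplateRule::isCompiled)
                .count();
    }
}
```

A rule that is registered but never matched is never compiled in this sketch, so its cost is limited to the entry in the lookup table.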

A consequence of this change is that static errors in the stylesheet may go undetected if the template rule in question is not exercised. During stylesheet development, to ensure that all static errors are detected, users are therefore advised to disable this option. This can be done on the command line using the option -opt:-j, or by setting the configuration property OPTIMIZATION_LEVEL to "-j".

Just-in-time compilation is automatically disabled if a stylesheet is compiled for export using the -export option.

A new optimization is introduced in Saxon-EE: elimination of common subexpressions. Where the same expression appears more than once within a function or template, and the evaluation context is the same, and the expression does not create new nodes, and where certain other conditions are met, a local variable is now introduced and bound to the value of the expression, and the multiple occurrences of the expression are replaced with references to the variable. (Note that stylesheet authors should not rely on this optimization occurring, since not all cases will be detected.)

Saxon-EE now generates bytecode for validation of input strings against user-defined simple types. This is most useful when there is a complex type hierarchy and many facets. Validation of simple types is often the main bottleneck during schema validation.

The optimizer trace produced with the -explain option on the command line, or with the configuration property TRACE_OPTIMIZER_DECISIONS, is now more compact, and also covers a wider range of optimizations.

The code for loop-lifting and for extraction of global variables has been rewritten, making it more efficient. There may be some differences in the detailed circumstances in which expressions are extracted from loops.

The experimental code in Saxon 9.7 for optimizing searching of large sets of XSLT template rules has been rewritten and productized. The new code is designed to handle sets of rules making extensive use of similar predicates, for example match="*[XXX = 'constant']", where XXX is the same expression in every rule and 'constant' is a different constant in every rule. The main benefit comes from evaluating the expression XXX only once for the whole set of rules, rather than once per rule; in addition, many matches can be quickly eliminated by comparing integer hash codes rather than the full strings.
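
The following Java sketch illustrates the general technique under simplifying assumptions; it is not Saxon's code. The class SharedPredicateRuleSet is a hypothetical name, rules are represented only by a name, and the shared expression is assumed to yield a single string (in XPath the = operator is existential over sequences, which the sketch ignores). A HashMap is used because its lookup compares hash codes before comparing full strings.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical rule set for rules of the form *[XXX = 'constant']. */
class SharedPredicateRuleSet<N> {
    private final Function<N, String> sharedExpression;                // the common expression XXX
    private final Map<String, String> ruleByConstant = new HashMap<>(); // constant -> rule name
    int evaluations = 0;                                                // times XXX has been evaluated

    SharedPredicateRuleSet(Function<N, String> sharedExpression) {
        this.sharedExpression = sharedExpression;
    }

    void addRule(String constant, String ruleName) {
        ruleByConstant.put(constant, ruleName);
    }

    /** Returns the name of the matching rule, or null if no rule matches. */
    String findRule(N node) {
        evaluations++;                               // one evaluation, however many rules exist
        String value = sharedExpression.apply(node);
        return ruleByConstant.get(value);            // hash lookup replaces a rule-by-rule scan
    }
}
```

However many rules are added, each call to findRule evaluates the shared expression once and performs one hash lookup.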

The optimization pays particular attention to the kind of pattern found throughout the DITA-OT stylesheets, which make extensive use of predicates of the form match="*[contains(@class, ' topic/abc ')]/*[contains(@class, ' topic/def ')]". These are processed using the following logic:

First, where several rules share the same parent pattern (here *[contains(@class, ' topic/abc ')]), this parent pattern is evaluated only once, and if it fails to match, all rules using this qualifier are automatically eliminated from the search.
Second, the particular call on contains() with a second argument comprising a token wrapped between space characters is recognized as one that can be efficiently evaluated by tokenizing the value of the @class attribute on whitespace boundaries. In effect, it is evaluated as tokenize(@class) = 'topic/def', with two refinements: the tokenization of the @class attribute is done only once for all rules in the set, and the resulting set of tokens is converted to a set of integer hash codes, which can then be compared with the integer hash code computed at compile time for the search target topic/def.
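
The token-matching step can be sketched in Java as follows. This is an illustration only, with hypothetical class names (CompiledToken, ClassTokens); it does not cover the sharing of the parent pattern. One deliberate choice in the sketch, which may differ from what Saxon does internally, is that a hash-code hit is confirmed by a string comparison so that hash collisions cannot produce a false match.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical compile-time form of a search target such as ' topic/def '. */
class CompiledToken {
    final String text;
    final int hash; // computed once, when the pattern is compiled

    CompiledToken(String paddedLiteral) {
        this.text = paddedLiteral.trim();
        this.hash = this.text.hashCode();
    }
}

/** Hypothetical per-node cache: the @class value is tokenized once for all rules. */
class ClassTokens {
    private final String[] tokens;
    private final Set<Integer> hashes = new HashSet<>();

    ClassTokens(String classAttribute) {
        String trimmed = classAttribute.trim();
        tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        for (String token : tokens) {
            hashes.add(token.hashCode());
        }
    }

    boolean contains(CompiledToken target) {
        if (!hashes.contains(target.hash)) {
            return false; // cheap rejection using integer hash codes only
        }
        for (String token : tokens) {
            if (token.equals(target.text)) {
                return true; // confirm, guarding against hash collisions
            }
        }
        return false;
    }
}
```

A ClassTokens object would be built once per node and then tested against the CompiledToken of every rule in the set.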

In a stylesheet export file, functions that are not explicitly referenced are no longer exported unless (a) they have non-private visibility, or (b) the stylesheet package contains an xsl:evaluate instruction or a call on fn:function-lookup(). This reduces the size of export files for stylesheets that export large function libraries (such as the FunctX library) when they only want to use one or two functions from that library.

Type-checking on maps has been redesigned. Static type inferencing has been improved, so there should be fewer cases where run-time type checking is needed. Operations that update maps with new entries keep track of the primitive type of the keys and the values, so that run-time type checks involving only primitive types should be very fast. Type checking is also fast if the keys and values are homogeneous, that is, if all keys have the same type, and all items in all values have the same type. More complex type checks can take time proportional to the number of entries, so such types should be used with care.

Saxon-EE now optimizes the xsl:number instruction with level="any" by rewriting the instruction (where possible) to use an internal accumulator. The assumption is that if one node in the document is being numbered, it is very likely that many other nodes in the document will be numbered, and that it is therefore worth calculating all the numbers in a single pass. The optimization is only possible when there is a count pattern (or where a count pattern can be inferred), and where certain other conditions are satisfied: neither the count pattern nor the from pattern may contain local variable references or references to the current() function, and neither may be capable of matching attribute or namespace nodes (because these are not visited by accumulators).
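
The single-pass idea can be illustrated with a short Java sketch. It is a simplification and not Saxon's accumulator machinery: the class SinglePassNumbering is a hypothetical name, the document is flattened to a list of element names in document order, and there is no from pattern. For level="any" the number of a node is the count of nodes matching the count pattern that are the node itself or precede it in document order, so one running total yields every node's number.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical single-pass numbering, in the spirit of an accumulator. */
class SinglePassNumbering {

    /**
     * For each position in document order, returns how many nodes matching
     * the count pattern occur at or before that position. A result of 0
     * means no matching node has been seen yet.
     */
    static int[] numberAll(List<String> nodesInDocumentOrder, Predicate<String> countPattern) {
        int[] numbers = new int[nodesInDocumentOrder.size()];
        int running = 0;
        for (int i = 0; i < numbers.length; i++) {
            if (countPattern.test(nodesInDocumentOrder.get(i))) {
                running++; // the accumulator value changes only at matching nodes
            }
            numbers[i] = running;
        }
        return numbers;
    }
}
```

Once the array has been built, numbering any individual node is a constant-time lookup instead of a fresh scan of everything that precedes it.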