Optimizations and performance improvements

The strategy for lazy evaluation of variables has changed. In the past, Saxon made a compile-time decision whether to evaluate variables eagerly or lazily. The problem with this is that the decision is hard to get right: lazy evaluation imposes a significant overhead (it has to save a copy of the evaluation context), which is not always justified. Saxon 12 therefore uses a dynamic learning approach: if, after the first few dozen attempts, lazy evaluation of a variable appears to be giving no benefit, future evaluations of the same variable are done eagerly. This can make a significant difference to the execution time of a query or stylesheet, and also to its allocation of heap memory and hence to garbage collection costs.
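The learning strategy can be pictured with a small Java sketch. It is not Saxon's implementation: the class names, the learning threshold of 32, and the definition of "no benefit" used here (every lazily bound value ended up being evaluated anyway, so deferring it saved nothing) are all assumptions made for illustration. The LazyValue thunk plays the role of the saved evaluation context whose cost makes laziness worth abandoning when it does not pay off.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of a variable that starts out lazy and switches to
 * eager evaluation once laziness appears to bring no benefit.
 * Not Saxon's actual implementation; names and threshold are assumptions.
 */
final class AdaptiveVariable<T> {
    // Assumed size of the learning window ("the first few dozen attempts").
    private static final int LEARNING_THRESHOLD = 32;

    private final Supplier<T> expression; // computes the variable's value
    private int lazyAttempts = 0;         // lazy bindings completed so far
    private int lazyValuesForced = 0;     // lazy bindings whose value was needed anyway
    private boolean eager = false;

    AdaptiveVariable(Supplier<T> expression) {
        this.expression = expression;
    }

    boolean isEager() {
        return eager;
    }

    /** Binds the variable, runs the body that references it, and learns from the outcome. */
    <R> R withBinding(Function<Supplier<T>, R> body) {
        if (eager) {
            T value = expression.get(); // evaluate up front, no deferred-context object
            return body.apply(() -> value);
        }
        // Lazy path: the thunk stands in for the saved copy of the evaluation context.
        LazyValue<T> lazy = new LazyValue<>(expression);
        R result = body.apply(lazy);
        lazyAttempts++;
        if (lazy.wasForced()) {
            lazyValuesForced++;
        }
        // If every lazy binding so far was evaluated anyway, deferring saved
        // nothing: switch to eager evaluation for all future bindings.
        if (lazyAttempts >= LEARNING_THRESHOLD && lazyValuesForced == lazyAttempts) {
            eager = true;
        }
        return result;
    }

    /** Memoizing thunk that records whether its value was ever requested. */
    private static final class LazyValue<T> implements Supplier<T> {
        private final Supplier<T> expression;
        private boolean forced = false;
        private T value;

        LazyValue(Supplier<T> expression) {
            this.expression = expression;
        }

        @Override
        public T get() {
            if (!forced) {
                value = expression.get();
                forced = true;
            }
            return value;
        }

        boolean wasForced() {
            return forced;
        }
    }
}
```

In this sketch, a variable whose value is consumed on every evaluation flips to eager mode once the learning window has passed, while one that is only needed on some paths stays lazy.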

The evaluation of filter expressions with more than one predicate has changed. In some cases, predicates can be reordered to allow more efficient evaluation, taking advantage of indexes. For example, given the expression //item[contains(@description, "cheese")][@useByDate="2022-12-01"], the evaluation (in Saxon-EE only) might be rearranged to use an index on the value of @useByDate. The problem is that such a rewrite can sometimes trigger dynamic errors that the code was written to prevent: consider //item[@code castable as xs:integer][xs:integer(@code)=4]. If the second predicate is evaluated first, xs:integer(@code) fails for any item whose @code is not a valid integer, even though the first predicate exists precisely to exclude such items. Although this rewrite is explicitly permitted in XPath 3.0, it is recognized as causing problems, so the rules have changed in the draft XPath 4.0 specification: predicates may no longer be rearranged if this might trigger a dynamic error. Saxon 12 implements the new rules. It will still change the order in which predicates are evaluated where appropriate, but if the second predicate throws an error, Saxon then evaluates the first predicate and masks the error if the first predicate is false.
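The error-masking rule for reordered predicates can be sketched in Java as follows. This is a hypothetical illustration, not Saxon-EE's code: the index lookup that motivates the reordering is not modelled, predicates are plain java.util.function.Predicate objects, and a Java RuntimeException stands in for an XPath dynamic error. The second predicate is evaluated first, and its error is rethrown only if the first predicate turns out to be true for the same item.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of filtering with two predicates, as in //item[P1][P2],
 * where P2 is evaluated first. Not Saxon's implementation.
 */
final class ReorderedFilter {
    static <T> List<T> filter(List<T> items, Predicate<T> first, Predicate<T> second) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            boolean secondHolds;
            try {
                secondHolds = second.test(item); // evaluated out of source order
            } catch (RuntimeException e) {
                // Surface the error only if it would also occur in source order,
                // i.e. only if the first predicate is true for this item.
                if (!first.test(item)) {
                    continue; // error masked: the first predicate rejects the item anyway
                }
                throw e;
            }
            // The first predicate is only evaluated when it can still affect the result.
            if (secondHolds && first.test(item)) {
                result.add(item);
            }
        }
        return result;
    }
}
```

For example, with first = s -> s.matches("-?\\d{1,9}") (a rough approximation of castable as xs:integer) and second = s -> Integer.parseInt(s) == 4, filtering the codes "4", "abc", "7", "4" keeps both "4" entries, and the NumberFormatException raised for "abc" is masked.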

The same logic applies to "and" and "or" expressions. The effect is that although the operands may be evaluated in either order, an error in evaluating one operand is never propagated if the other operand is false (in the case of "and") or true (in the case of "or").
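For the boolean operators, a similar hypothetical Java sketch assumes the optimizer has chosen to evaluate the right-hand operand first. The class name MaskingBooleanOps and the use of RuntimeException for XPath dynamic errors are illustrative assumptions. Whichever operand fails, its error escapes only when the other operand does not by itself decide the result.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of "and"/"or" evaluation with the right-hand operand
 * evaluated first. Not Saxon's implementation.
 */
final class MaskingBooleanOps {
    /** a and b: an error in either operand is masked whenever the other is false. */
    static boolean and(BooleanSupplier a, BooleanSupplier b) {
        boolean bValue;
        try {
            bValue = b.getAsBoolean();
        } catch (RuntimeException e) {
            if (!a.getAsBoolean()) {
                return false; // result is false regardless of b, so b's error is dropped
            }
            throw e;
        }
        // If b is false, a is never evaluated, so any error in a is masked too.
        return bValue && a.getAsBoolean();
    }

    /** a or b: an error in either operand is masked whenever the other is true. */
    static boolean or(BooleanSupplier a, BooleanSupplier b) {
        boolean bValue;
        try {
            bValue = b.getAsBoolean();
        } catch (RuntimeException e) {
            if (a.getAsBoolean()) {
                return true; // result is true regardless of b, so b's error is dropped
            }
            throw e;
        }
        // If b is true, a is never evaluated, so any error in a is masked too.
        return bValue || a.getAsBoolean();
    }
}
```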

Bytecode generation has been dropped from SaxonJ. As the JVM's JIT compiler has improved over time, the benefits of bytecode generation have steadily diminished, to the point where maintaining the code is no longer worthwhile. Internal changes in 12.0 that improve the interpreted code have reduced the advantage further, so that the majority of workloads now gain no benefit at all. In addition, bytecode generation does not apply to the newer platforms: SaxonCS is now generated from C# source code, while SaxonC uses the ahead-of-time code generation capabilities of GraalVM.

Internally, a number of code paths have been changed to avoid the use of Class.newInstance(), which has been deprecated since Java 9 and causes operational difficulties under GraalVM. For example, system functions were previously registered with a Class<? extends SystemFunction> object such as Replace.class and instantiated using newInstance(); they are now registered with a Supplier<? extends SystemFunction>, using a lambda expression of the form () -> new Replace(), and are instantiated by invoking this factory. (One consequence is that the same class can now implement several closely related functions, such as fn:true() and fn:false(), or fn:exists() and fn:empty().)
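The move from Class objects to Supplier factories can be illustrated with a simplified Java sketch. SystemFunction, Replace, ConstantBoolean and FunctionRegistry here are hypothetical stand-ins written for this example, not Saxon's real classes or signatures; in particular, this Replace only approximates fn:replace using Java regular expressions. The sketch shows instantiation without reflection, and a single class serving both fn:true() and fn:false() through a constructor argument.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in; not Saxon's real SystemFunction API.
abstract class SystemFunction {
    abstract Object call(Object... args);
}

// Crude approximation of fn:replace using Java regex syntax (illustration only).
final class Replace extends SystemFunction {
    @Override
    Object call(Object... args) {
        return ((String) args[0]).replaceAll((String) args[1], (String) args[2]);
    }
}

// One class implementing two closely related functions, chosen at construction.
final class ConstantBoolean extends SystemFunction {
    private final boolean value;

    ConstantBoolean(boolean value) {
        this.value = value;
    }

    @Override
    Object call(Object... args) {
        return value;
    }
}

final class FunctionRegistry {
    private final Map<String, Supplier<? extends SystemFunction>> factories = new HashMap<>();

    FunctionRegistry() {
        // Old style would have been register("replace", Replace.class) plus newInstance().
        register("replace", () -> new Replace());
        register("true", () -> new ConstantBoolean(true));
        register("false", () -> new ConstantBoolean(false));
    }

    void register(String name, Supplier<? extends SystemFunction> factory) {
        factories.put(name, factory);
    }

    SystemFunction make(String name) {
        Supplier<? extends SystemFunction> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown function: " + name);
        }
        return factory.get(); // plain lambda call: no reflection involved
    }
}
```

Because each factory is an ordinary lambda, creating a function instance never goes through the reflective newInstance() call that the text identifies as problematic under GraalVM.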

There are significant changes in the implementation of XDM arrays: