Optimizations and performance improvements

Arrays have been re-implemented to use a "persistent" immutable data structure (based on the PCollections Java library). This means that operations such as array:put() no longer make a copy of the entire array.

As an example of the performance savings, using array:put() to modify every item of an array takes the following times in microseconds:

No of members Saxon 9.8 (microsecs) Saxon 9.9 (microsecs)
100 10 24
1000 1149 475
10000 113531 6680

Type checking has been improved: the main impact is to give better diagnostics, but there will also be cases where performance is improved. The main changes are:

The lookup operator "?" has been re-implemented as a native expression (previously it was compiled into a complex expression involving loops and conditionals). As a result it should now be faster, but more particularly, it now benefits from improved type checking, and any run-time type errors are now much clearer because they relate to the expression as written, and not to some internally-generated code.

The map:merge() function has been re-implemented to avoid quadratic performance when a large number of maps containing duplicate keys are combined using the duplicates:combine option.

In Saxon-EE, regular expressions used in functions such as matches(), replace(), tokenize(), and analyze-string() are now cached so that when the same regular expression is used repeatedly, the compiled form of the regex is reused. (This is only relevant where the regular expression is not known statically: regular expressions that are known at query/stylesheet compile time are always compiled at compile time.) The cache is held at the level of a Configuration. The optimization can be suppressed using opt:-e.

Saxon by default checks axis steps at compile time to ensure they are capable of returning a non-empty result. "Void" steps such as comment()/* result in a compile-time warning. This check appears to be quite expensive in some use cases, so an optimization flag (opt:-d) has been introduced to suppress it.

A new optimization has been introduced to reduce the cost of executing simple path expressions like books/book/price. Specifically, if the right-hand operand of "/" is a simple axis step, Saxon now evaluates it without creating a new dynamic context object, and without tracking the context item and position, because they are never needed in this situation. This appears to give a speed-up of around 10% for such path expressions.

Changes to the TinyTree data structure

Two significant changes have been implemented for the TinyTree: