System Programming Interfaces

Sequence and SequenceIterator

The class Sequence, and its many subclasses such as GroundedValue and Item, were changed in Saxon 9.9 to use Java Generics; the same change was also made to SequenceIterator and its many subclasses. This change has been reverted in Saxon 10, because it added a lot of complexity and delivered no benefits: an analysis of the motivation can be found in a Saxon blog article.

NodeInfo

The way in which namespaces are represented in the NodeInfo interface has changed to follow the XDM data model more closely, which ultimately makes the API easier to use. Instead of getDeclaredNamespaces(), which returned the namespace information as a set of declarations and undeclarations relative to the parent element, the interface now has getAllNamespaces(), which returns all the in-scope namespaces as a NamespaceMap object. The potential inefficiency of holding the same namespaces redundantly on every element node is avoided by sharing NamespaceMap objects (so two elements with the same in-scope namespaces share the same NamespaceMap). The NamespaceMap object is immutable, which makes such sharing easy to manage.

The native tree models in Saxon (the TinyTree and LinkedTree) implement this by maintaining stored NamespaceMap objects within the internal data structure. The tree builders ensure that these objects are shared, so that a child element with no local namespace declarations will point to the same NamespaceMap object as its parent element. In many documents, namespaces are all declared on the root element, so there will only be one NamespaceMap object for the whole document.
The external tree models (DOM, XOM, JDOM2, etc.) generally cache the NamespaceMap for an element, so computing the in-scope namespaces for an element involves getting the cached NamespaceMap for the parent element, and then making any changes needed if there are local namespace declarations or undeclarations present on the child element.
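
The sharing scheme described above can be illustrated with a small standalone sketch. It does not use Saxon's classes: NsMap and Elem are hypothetical stand-ins, written only to show how an immutable map lets a child element with no local declarations point at the same object as its parent.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an immutable prefix-to-URI map; not Saxon's NamespaceMap. */
final class NsMap {
    private final Map<String, String> bindings;

    private NsMap(Map<String, String> bindings) {
        this.bindings = Collections.unmodifiableMap(bindings);
    }

    static final NsMap EMPTY = new NsMap(new HashMap<>());

    /** Returns a map with one extra binding; the receiver is never modified. */
    NsMap bind(String prefix, String uri) {
        if (uri.equals(bindings.get(prefix))) {
            return this; // nothing changes, so keep sharing the same object
        }
        Map<String, String> copy = new HashMap<>(bindings);
        copy.put(prefix, uri);
        return new NsMap(copy);
    }

    String uriFor(String prefix) {
        return bindings.get(prefix);
    }
}

/** Hypothetical element node that reuses its parent's map when it declares nothing. */
final class Elem {
    final String name;
    final NsMap namespaces;

    Elem(String name, Elem parent, Map<String, String> localDeclarations) {
        this.name = name;
        NsMap inScope = (parent == null) ? NsMap.EMPTY : parent.namespaces;
        for (Map.Entry<String, String> d : localDeclarations.entrySet()) {
            inScope = inScope.bind(d.getKey(), d.getValue());
        }
        this.namespaces = inScope;
    }
}
```

Because the map cannot be mutated after construction, handing the same instance to many elements is safe; only an element that adds a binding pays for a copy.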

To make it easier to iterate over the children of a node, the NodeInfo interface now has a method children() returning Iterable<NodeInfo>. The method has a default implementation. This makes it possible to use a Java "for each":

for (NodeInfo child : children()) { ... }

There is also a variant of children() that takes a Java Predicate as its argument. The NodeTest class now implements Predicate<NodeInfo>, so you can write for example:

for (NodeInfo childElement : children(NodeKindTest.ELEMENT)) { ... }

which processes the children of a node that are themselves element nodes.
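
To show how a default children() method and a Predicate-based variant fit together, here is a minimal standalone sketch. SimpleNode, KindTest and BasicNode are hypothetical types invented for illustration; they only mimic the shape of the design and are not the Saxon interfaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical, simplified node interface; not Saxon's NodeInfo. */
interface SimpleNode {
    String kind();                 // for example "element" or "text"
    List<SimpleNode> childList();  // the only thing an implementer must supply

    /** Default implementation, so every implementer supports "for each". */
    default Iterable<SimpleNode> children() {
        return childList();
    }

    /** Filtered variant: only the children that satisfy the predicate. */
    default Iterable<SimpleNode> children(Predicate<? super SimpleNode> filter) {
        List<SimpleNode> selected = new ArrayList<>();
        for (SimpleNode child : children()) {
            if (filter.test(child)) {
                selected.add(child);
            }
        }
        return selected;
    }
}

/** A reusable test object that is itself a Predicate, in the spirit of a node test. */
enum KindTest implements Predicate<SimpleNode> {
    ELEMENT("element"), TEXT("text");

    private final String kind;

    KindTest(String kind) { this.kind = kind; }

    @Override
    public boolean test(SimpleNode node) { return node.kind().equals(kind); }
}

/** Trivial implementation used to exercise the interface. */
final class BasicNode implements SimpleNode {
    private final String kind;
    private final List<SimpleNode> kids;

    BasicNode(String kind, SimpleNode... kids) {
        this.kind = kind;
        this.kids = Arrays.asList(kids);
    }

    @Override public String kind() { return kind; }
    @Override public List<SimpleNode> childList() { return kids; }
}
```

With these definitions, a loop such as `for (SimpleNode e : parent.children(KindTest.ELEMENT)) { ... }` visits only the element children, and any lambda can be passed in place of the predefined test.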

Similarly, there is a new method attributes() which returns an AttributeMap. This class replaces the old AttributeCollection throughout the product. The most notable differences are (a) an AttributeMap is immutable, and (b) its primary access is as an Iterable, rather than using integer subscripts to index into it. There are different implementations of AttributeMap depending on the number of attributes; for larger attribute sets, a hash trie structure is used to give faster access by name, and to enable addition of attributes without copying the whole structure.

The method NodeInfo.iterateAxis(axisNumber, condition) has been generalized to take a Java Predicate as its second argument.

Receiver

The Receiver interface has been forked into two interfaces: Receiver and Outputter.

The new Receiver interface receives information about the start of an element in a single call containing information about the element name and type, the in-scope namespaces of the element, and the attributes of the element. This interface is used in the validation pipeline and the serialization pipeline, where it simplifies many of the operations performed by these complex pipelines.
The Outputter interface splits the start-of-element information across a sequence of calls: startElement() for the element name and type, one call on namespace() for each namespace declaration, one on attribute() for each attribute, and finally a call on startContent() (which in some cases may be implicit). The Outputter interface is used primarily to capture the result of push-mode XSLT and XQuery instructions, notably instructions that generate elements and attributes. It's necessary in this case to be able to capture attributes and namespaces as independent events because instructions can generate them as separate events. Most commonly these instructions feed directly into a ComplexContentOutputter, which amalgamates start tag events into a single startElement() call which is then sent to a Receiver pipeline.
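
The amalgamation step can be sketched in a few lines. The types below are hypothetical and far simpler than the real interfaces (names and namespaces are plain strings, and type annotations are omitted); the sketch only shows the idea of buffering separate start-tag events and forwarding them as one call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sink that wants the whole start tag in one call (Receiver style). */
interface StartTagSink {
    void startElement(String name, Map<String, String> namespaces, Map<String, String> attributes);
}

/** Hypothetical buffer that accepts piecemeal events (Outputter style) and amalgamates them. */
final class StartTagBuffer {
    private final StartTagSink sink;
    private String pendingName;
    private final Map<String, String> namespaces = new LinkedHashMap<>();
    private final Map<String, String> attributes = new LinkedHashMap<>();

    StartTagBuffer(StartTagSink sink) {
        this.sink = sink;
    }

    void startElement(String name) {
        startContent(); // a new element implicitly ends any pending start tag
        pendingName = name;
    }

    void namespace(String prefix, String uri) {
        requirePending();
        namespaces.put(prefix, uri);
    }

    void attribute(String name, String value) {
        requirePending();
        attributes.put(name, value);
    }

    /** Flushes the buffered start tag, if any, as a single call on the sink. */
    void startContent() {
        if (pendingName == null) {
            return;
        }
        sink.startElement(pendingName, new LinkedHashMap<>(namespaces), new LinkedHashMap<>(attributes));
        pendingName = null;
        namespaces.clear();
        attributes.clear();
    }

    private void requirePending() {
        if (pendingName == null) {
            throw new IllegalStateException("no start tag is open");
        }
    }
}
```

Nothing reaches the sink until startContent() is called, explicitly or implicitly, which is what allows attributes and namespaces to arrive as independent events.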

DateTimeValue

DateTimeValue and related classes such as DateValue and TimeValue provide additional constructors and getter methods supporting conversion to or from: java.time.Instant, java.time.LocalDateTime, java.time.OffsetDateTime, java.time.ZonedDateTime, and java.time.LocalDate.

Conversion of DateTimeValue to/from classes in the java.time package now retains nanosecond precision (previously, it only retained microsecond precision).
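
The practical difference between the two precisions can be seen with java.time alone. The sketch below deliberately does not call DateTimeValue (its constructors and getters are not reproduced here); PrecisionDemo is a hypothetical helper that shows what a nanosecond-preserving round trip keeps and what a microsecond representation would discard.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

final class PrecisionDemo {

    /** Round trip through several java.time types, keeping every nanosecond. */
    static Instant roundTrip(Instant start) {
        OffsetDateTime offset = start.atOffset(ZoneOffset.UTC);
        ZonedDateTime zoned = offset.toZonedDateTime();
        LocalDateTime local = zoned.toLocalDateTime();
        return local.toInstant(ZoneOffset.UTC);
    }

    /** What a representation limited to microseconds would retain. */
    static Instant truncateToMicros(Instant value) {
        return value.truncatedTo(ChronoUnit.MICROS);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2020-03-16T12:00:00.123456789Z");
        System.out.println("nanosecond round trip: " + roundTrip(t));
        System.out.println("microsecond only:      " + truncateToMicros(t));
    }
}
```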

Miscellaneous

For push-mode instruction execution, the current receiver/outputter is no longer held in the XPathContext object. Instead it is passed to the relevant methods (such as Expression.process()) as an explicit parameter. This reduces the need to create new XPathContext objects.

The PullProvider interface has changed to use a type-safe enum class for event types rather than integer constants.

The SequenceIterator interface has changed: the getProperties() method now returns a type-safe EnumSet rather than an integer with bit-significant flags.
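
The move from bit-significant integers to an EnumSet can be sketched as follows. The constant names and values here are illustrative assumptions rather than a copy of the Saxon definitions, and IteratorProperties is a hypothetical holder class.

```java
import java.util.EnumSet;
import java.util.Set;

final class IteratorProperties {

    // Old style: bit-significant integer flags (values are illustrative only).
    static final int GROUNDED = 1;
    static final int LAST_POSITION_FINDER = 2;
    static final int LOOKAHEAD = 4;

    /** New style: a hypothetical enum with one constant per property. */
    enum Property { GROUNDED, LAST_POSITION_FINDER, LOOKAHEAD }

    /** Translates legacy integer flags into the type-safe representation. */
    static Set<Property> fromFlags(int flags) {
        EnumSet<Property> result = EnumSet.noneOf(Property.class);
        if ((flags & GROUNDED) != 0) {
            result.add(Property.GROUNDED);
        }
        if ((flags & LAST_POSITION_FINDER) != 0) {
            result.add(Property.LAST_POSITION_FINDER);
        }
        if ((flags & LOOKAHEAD) != 0) {
            result.add(Property.LOOKAHEAD);
        }
        return result;
    }
}
```

A caller then tests a property with `properties.contains(Property.LOOKAHEAD)` instead of masking an integer, and the compiler rejects values that are not members of the enum.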

Saxon has generally moved to the use of EQName notation (Q{uri}local) in preference to Clark names ({uri}local) for internal use. Clark names are still used where mandated by the JAXP specification, and in other stable APIs. In lower-level system programming interfaces, however, applications may notice the change. For example the change affects the representation of output properties such as method and cdata-section-elements, which are exposed to user-written serialization methods and customised serializers.
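
Code that still holds names in Clark notation may need to convert between the two forms. The helper below is a hypothetical sketch (QNameNotation is not a Saxon class) and assumes the input is a syntactically well-formed name; it does not validate the local part.

```java
final class QNameNotation {

    /** Converts Clark notation "{uri}local" (or a bare "local") to EQName notation "Q{uri}local". */
    static String clarkToEQName(String clark) {
        if (clark.startsWith("{")) {
            if (clark.indexOf('}') < 0) {
                throw new IllegalArgumentException("missing '}' in " + clark);
            }
            return "Q" + clark;
        }
        return "Q{}" + clark; // a name in no namespace
    }

    /** Converts EQName notation "Q{uri}local" back to Clark notation. */
    static String eqNameToClark(String eqName) {
        if (!eqName.startsWith("Q{") || eqName.indexOf('}') < 0) {
            throw new IllegalArgumentException("not an EQName: " + eqName);
        }
        String clark = eqName.substring(1);
        // In Clark notation a no-namespace name is written without braces.
        return clark.startsWith("{}") ? clark.substring(2) : clark;
    }
}
```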

Dropped interfaces

The method StaticQueryContext.setExternalNamespaceResolver() is dropped (it has been present in the product for many years but appears to be untested, and its intended behavior is unclear).

Some long-deprecated classes and methods have been dropped, including those listed below. The general principle is that methods deprecated since 9.7 or earlier have all been dropped; methods first deprecated in 9.8 have been dropped if they were producing no useful effect in 9.9 (for example, setter methods that did nothing).

StaticQueryContext.buildDocument(), deprecated since Saxon 9.2 (use a s9api DocumentBuilder)
Configuration.buildDocument(Source) and Configuration.buildDocument(Source, ParseOptions), deprecated since Saxon 9.7 (use a s9api DocumentBuilder, or Configuration.buildDocumentTree())
net.sf.saxon.om.DocumentInfo, since 9.7 used only by deprecated methods
AbstractStaticContext.declareCollation(), StaticQueryContext.declareCollation(), and StaticContext.getCollation(), deprecated since 9.6 (collations should be now registered at the level of the Configuration)
XsltTransformer.setInitialContextNode(), deprecated since 9.7 (use setGlobalContextItem())
Xslt30Transformer.setInitialContextItem(), deprecated since 9.8 (use setGlobalContextItem())
XQueryExpression.pull(), deprecated since 9.8 (use run())
ParseOptions.setStripSpace(), AugmentedSource.setStripSpace(), ParseOptions.getStripSpace() and AugmentedSource.getStripSpace(), deprecated since 9.8 (use setSpaceStrippingRule())
XsltCompiler.compilePackages() and addCompilePackages(), deprecated since 9.8 (use a PackageLibrary)
BuildingStreamWriter.setInventPrefixes() and isInventPrefixes(), deprecated since 9.7. These methods previously had no effect, so calls to them can be dropped. (Note: the methods were previously marked as deprecated in the implementation class, but not in the interface.)
CollectionURIResolver and associated methods: deprecated since 9.7. (Use a CollectionFinder. If necessary, wrap the old CollectionURIResolver in a CollectionURIResolverWrapper, whose source code can be found in Saxon 9.9.) This change also causes Feature.COLLECTION_URI_RESOLVER and Feature.COLLECTION_URI_RESOLVER_CLASS to be dropped, as well as the -cr option on the Transform and Query command line.
CompilerInfo.setExtensionFunctionLibrary() and getExtensionFunctionLibrary(): deprecated since 9.7 (use other mechanisms, such as packages)

The methods setRecoveryPolicy and getRecoveryPolicy have been dropped from the Configuration and CompilerInfo classes, along with the configuration feature Feature.RECOVERY_POLICY. In recent releases the only remaining effect of this switch was to provide a default for the xsl:mode attributes on-multiple-match="fail" and warning-on-multiple-match="true"; this must now be controlled via the xsl:mode declaration in the stylesheet.

Tracing

The mechanism supporting execution tracing (including the -T and -TP options on the command line) has been substantially revamped, involving some changes to interfaces:

If a CodeInjector is nominated, it now operates as a pass over the fully-constructed expression tree, rather than being invoked by the XSLT and XPath parsers as expressions and instructions are created. This means that it now visits every node in the expression tree, not only selected nodes as before. Of course a CodeInjector can always ignore nodes that it is not interested in. The new mechanism opens the way to much more variety in the kind of instrumentation tasks that the CodeInjector can undertake.
The object passed to the TraceListener at run-time is now a Traceable, replacing the old InstructionInfo. A Traceable is either an expression, or a component such as a UserFunction, GlobalVariable, or TemplateRule. The InstructionInfo class was something of a rag-bag, and duplicated a lot of information that in the last few releases has been maintained in the RetainedStaticContext and Location objects attached to the expression itself.
This has enabled the dropping of a number of auxiliary classes such as InstructionDetails and LocationKind.
The way in which expressions are identified by the StandardErrorListener is now better aligned with the way they are identified by the TraceListener.
In the CodeInjector and TraceListener interfaces, default implementations of methods are now provided; among other things, this eliminates the need for the TraceListener2 interface.
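
The last point relies on Java default methods. The sketch below uses a hypothetical listener interface (SimpleTraceListener, with invented callback names) to show why defaults remove the need for a second "extended" interface: a callback added later gets a do-nothing default, so existing implementations continue to compile.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener; every callback has a do-nothing default. */
interface SimpleTraceListener {
    default void open() {}
    default void enter(String what) {}
    default void leave(String what) {}
    /** Imagine this callback was added in a later release. */
    default void endRuleSearch(String rule) {}
    default void close() {}
}

/** An implementation that overrides only the one callback it cares about. */
final class EnterLogger implements SimpleTraceListener {
    final List<String> log = new ArrayList<>();

    @Override
    public void enter(String what) {
        log.add("enter " + what);
    }
}
```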

ErrorListener

The StandardErrorListener and StandardInvalidityHandler have been refactored internally (a) to make better reuse of common code, and (b) to make it easier to write customised subclasses to achieve tailored diagnostic output. To this end, some static methods have been replaced by non-static methods, and large methods have been split into smaller pieces, documented to allow them to be overridden. A consequence of these changes is that existing user-written code subclassing these two classes may need changing. If this is onerous, one way of getting around the problem is to copy the open-source implementations of these classes from the Saxon 9.9 distribution, and rename the copies as user-defined classes.

A new configuration option RETAIN_NODE_FOR_DIAGNOSTICS is provided. If set, this causes the location information associated with instructions and expressions in an XSLT stylesheet to retain a reference to the node in the stylesheet tree where the error occurred (rather than simply retaining the node name and line number). This information can be useful to IDEs that wish to highlight where in the source the error occurred. The information is retained both for static and dynamic error reporting. The option is off by default because retaining the links prevents the temporary data used during compilation being garbage-collected. Previous releases attempted to retain the information for static errors but not for dynamic errors; it is now retained for both, but only if this option is set.