System Programming Interfaces

Nodes and Fingerprints

The gradual move to reduce dependence on the NamePool has continued.

The methods NodeInfo.getFingerprint() and NodeInfo.getNameCode() have been dropped, except for nodes that implement the FingerprintedNode interface. This means that implementations of NodeInfo that wrap third-party XML tree models no longer need to implement these methods, and no longer need to be tied to a NamePool.

In earlier releases, document nodes were always represented by an object that implemented the DocumentInfo interface (which extended NodeInfo). The DocumentInfo object was used to hold information about the tree as a whole, for example keys and IDs. In Saxon 9.7, the class DocumentInfo is retained to provide a measure of compatibility for some commonly used interfaces, but it is no longer the case that every document node is represented by an instance of DocumentInfo; in fact DocumentInfo is now just a wrapper around a NodeInfo, designed to keep existing code working. Information about a tree as a whole is now contained in a new TreeInfo object; this exists for all trees, whether or not they are rooted at a document node. The TreeInfo object also provides a place to hold information about accumulators, which can likewise exist for any tree regardless of the kind of node at its root.

Collections

A number of changes have been made to the way collection URIs are handled, mainly: (a) to support the XPath 3.1 capability to return any kind of item in a collection, not only a node (for example, collections can now include maps derived from JSON files, unparsed text files, and binary objects); (b) to allow streamed processing of the documents in a collection; and (c) to conform with the rules in the specification as regards stability (that is, repeated calls returning the same results).

The CollectionURIResolver interface is superseded by a new, more flexible CollectionFinder interface. The old CollectionURIResolver is still supported, but provides less capability. The new mechanism is described in the Javadoc documentation; for an outline, see Collections.

To handle the Saxon collection URIs with options such as validation=strict, the Source object that is returned can be an AugmentedSource, which holds parser options as well as the source information itself.

In Saxon-EE, fn:collection() is multi-threaded, parsing multiple documents simultaneously in different threads. This previously happened within the default collection URI resolver; it now happens within the code of the fn:collection() function itself, so it works even if a user-defined collection URI resolver is in use. An additional change in this release is that the order in which documents are returned in the result of fn:collection() is now always the same as the order in which they are delivered by the collection URI resolver, making the order more predictable at a slight cost in latency.

Collections can now be stable, meaning that multiple calls with the same collection URI are guaranteed to return the same results. Collection stability can be expensive, because the contents of a collection have to be maintained in memory just in case it is used again; it is therefore not the default, even though it is required for conformance with the W3C specifications. Collection stability can be switched on in several ways: the collection URI can include the query parameter stable=yes; the collection finder can return a ResourceCollection object whose isStable() method returns true; or the Configuration property FeatureKeys.STABLE_COLLECTION_URI can be set to true. A collection is stable if any one of these three conditions holds.
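The "any one of three switches" rule can be pictured with a small self-contained sketch. This is not Saxon code: the class CollectionStabilitySketch and its methods are hypothetical, the two boolean arguments merely stand in for the result of ResourceCollection.isStable() and the value of the configuration property, and the assumption that query parameters are separated by ";" or "&" should be checked against the Collections documentation.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not part of the Saxon API. */
final class CollectionStabilitySketch {

    private CollectionStabilitySketch() {}

    /** True if the query part of the collection URI contains the parameter stable=yes. */
    static boolean uriRequestsStability(String collectionUri) {
        int q = collectionUri.indexOf('?');
        if (q < 0) {
            return false; // no query parameters at all
        }
        String query = collectionUri.substring(q + 1);
        // Assumption: parameters are separated by ';' or '&'
        return Arrays.asList(query.split("[;&]")).contains("stable=yes");
    }

    /**
     * Combines the three switches: the URI parameter, a flag standing in for
     * ResourceCollection.isStable(), and a flag standing in for the
     * configuration property. Any one of them is enough.
     */
    static boolean isStable(String collectionUri,
                            boolean resourceCollectionIsStable,
                            boolean configurationFlag) {
        return uriRequestsStability(collectionUri)
                || resourceCollectionIsStable
                || configurationFlag;
    }
}
```

In this model a URI without stable=yes still yields a stable collection if either of the other two flags is true, which matches the rule stated above.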

The option unparsed=true among the query parameters of the collection URI is no longer supported, as the functionality can now be achieved by calling fn:uri-collection() followed by fn:unparsed-text().

A new option for the collection URI query parameters is metadata=yes. When this is used, the items returned by the collection() function are maps; the entries in the map include properties of the resources within the collection, plus a function fetch() that can be called to fetch the actual content of the resource. For further details see Collections.

The standard URIResolver and the standard ModuleURIResolver have been enhanced to recognize the classpath URI scheme. For example, in XSLT it is now possible to write <xsl:include href="classpath:utility.xsl"/>, which locates utility.xsl on the Java classpath. (The classpath URI scheme was introduced as part of the Spring framework, but Saxon's implementation is free-standing.) On the command line, in options such as -s, names prefixed classpath: are now recognized (along with http and file) as being URIs rather than filenames, avoiding the need to specify the -u option.
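To show what resolving a classpath: URI involves, here is a minimal sketch of a JAXP URIResolver that looks the resource up through a class loader. It is not Saxon's implementation (the standard resolver already does this for you); the class name ClasspathUriResolverSketch and the helper resourcePath are hypothetical, and details such as stripping a leading slash are assumptions.

```java
import java.io.InputStream;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical illustration only; not Saxon's own resolver. */
final class ClasspathUriResolverSketch implements URIResolver {

    static final String SCHEME = "classpath:";

    /** Returns the classpath resource name for a classpath: URI, or null for any other URI. */
    static String resourcePath(String href) {
        if (href == null || !href.startsWith(SCHEME)) {
            return null;
        }
        String path = href.substring(SCHEME.length());
        // ClassLoader.getResourceAsStream expects a name without a leading slash
        while (path.startsWith("/")) {
            path = path.substring(1);
        }
        return path;
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        String path = resourcePath(href);
        if (path == null) {
            return null; // not a classpath: URI, so leave it to the default resolution
        }
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathUriResolverSketch.class.getClassLoader();
        }
        InputStream in = loader.getResourceAsStream(path);
        if (in == null) {
            throw new TransformerException("Resource not found on classpath: " + path);
        }
        // Keep the original URI as the system ID so that diagnostics can report it
        return new StreamSource(in, href);
    }
}
```

Returning null for other schemes follows the JAXP convention that the processor should then attempt to resolve the reference itself.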

Location information

The Receiver interface has changed, so that location information is now passed with all events (for example, startElement) as a Location object rather than as an integer locationId. This change was necessary because with independent compilation of packages, it becomes difficult to allocate globally unique location IDs at package compile time. The change also enables richer location information to be maintained, allowing more precise diagnostics, especially for dynamic errors.

The move away from integer location IDs to Location objects is fairly pervasive, and affects many interfaces that are important to products that integrate closely with Saxon, for example to provide debugging support. In particular, expressions in the expression tree now contain location information in the form of a Location object; they no longer implement the SourceLocator interface directly.
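The difference between an integer ID and a location object can be illustrated with a small sketch. The types LocationSketch and EventSinkSketch are hypothetical stand-ins, not the Saxon Location and Receiver interfaces, whose real signatures are given in the Javadoc. The point is that a self-describing object needs no global table of IDs, so separately compiled packages cannot clash.

```java
import javax.xml.transform.SourceLocator;

/** Hypothetical immutable location value; it carries its own module, line and column. */
final class LocationSketch implements SourceLocator {

    private final String systemId;
    private final int line;
    private final int column;

    LocationSketch(String systemId, int line, int column) {
        this.systemId = systemId;
        this.line = line;
        this.column = column;
    }

    @Override public String getPublicId() { return null; }
    @Override public String getSystemId() { return systemId; }
    @Override public int getLineNumber() { return line; }
    @Override public int getColumnNumber() { return column; }

    /** Formats the location for a diagnostic message. */
    String describe() {
        return systemId + "#" + line + ":" + column;
    }
}

/** Hypothetical event consumer: the location travels with the event itself. */
interface EventSinkSketch {
    void startElement(String name, LocationSketch location);
}
```

A consumer can then report location.describe() directly when an error occurs, without consulting any shared lookup structure.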

The Expression tree

There have been substantial changes to the internal structure of the Expression tree. These are only likely to affect applications that interface to Saxon at a very low level. Among the changes:

The Container object has gone.
Expressions now contain a reference to their parent expression in the tree.
An expression now contains a reference to a RetainedStaticContext object, which holds that part of the static context that might be needed at execution time. To save space, an expression whose static context is the same as its parent or sibling expressions will generally share the same RetainedStaticContext object.
Because expressions now hold more context information, the need to pass this information dynamically during the type-checking and optimization processes using the ExpressionVisitor object is diminished.
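The shape of the new tree can be pictured with a simplified model in which each expression knows its parent, and a child attached without a static context of its own shares its parent's context object. Every name here (ExpressionSketch, StaticContextSketch, addOperand) is hypothetical; this is only an illustration of the idea, not the structure of Saxon's Expression or RetainedStaticContext classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the part of the static context needed at run time. */
final class StaticContextSketch {
    final String baseUri;

    StaticContextSketch(String baseUri) {
        this.baseUri = baseUri;
    }
}

/** Hypothetical expression node with a parent pointer and a shared context reference. */
final class ExpressionSketch {

    private final String label;
    private ExpressionSketch parent;
    private StaticContextSketch staticContext;
    private final List<ExpressionSketch> operands = new ArrayList<>();

    /** Pass null as the context to share the parent's context when attached. */
    ExpressionSketch(String label, StaticContextSketch staticContext) {
        this.label = label;
        this.staticContext = staticContext;
    }

    /** Attaches a child, sets its parent, and shares this node's context if the child has none. */
    ExpressionSketch addOperand(ExpressionSketch child) {
        child.parent = this;
        if (child.staticContext == null) {
            child.staticContext = this.staticContext; // same object, not a copy
        }
        operands.add(child);
        return child;
    }

    String getLabel() { return label; }
    ExpressionSketch getParent() { return parent; }
    StaticContextSketch getStaticContext() { return staticContext; }
    List<ExpressionSketch> getOperands() { return Collections.unmodifiableList(operands); }
}
```

Because the context is reachable from every node, a type-checking or optimization pass in this model can read it from the expression itself instead of receiving it as an extra argument.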