Internal changes

There have been some changes to key internal interfaces which affect a great many classes throughout the product, and which also occasionally surface in APIs.

The SequenceIterator interface, which is widely used throughout the Saxon code, no longer has a hasNext() method. Instead, the caller invokes next() repeatedly, and next() returns null to indicate the end of the sequence. This change reduces the number of method calls. More importantly, it reduces the amount of state information that iterators have to hold, and it removes the effect whereby each iterator in a pipeline looks ahead by one item. That look-ahead wastes effort whenever the pipeline is abandoned early, which happens for example when finding the effective boolean value of a sequence.
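The calling convention can be illustrated with a minimal sketch. The types below (SketchIterator, ListSketchIterator, FilterSketchIterator, IterationSketch) are hypothetical stand-ins invented for this example and are not the Saxon classes; the sketch only assumes the contract described above, namely that next() returns null at the end of the sequence.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for the revised contract: there is no hasNext(),
// and next() returns null once the sequence is exhausted.
interface SketchIterator<T> {
    T next();
}

// Base iterator over a list; its only state is the current position.
final class ListSketchIterator<T> implements SketchIterator<T> {
    private final List<T> items;
    private int position = 0;

    ListSketchIterator(List<T> items) {
        this.items = items;
    }

    @Override
    public T next() {
        return position < items.size() ? items.get(position++) : null;
    }
}

// A filtering stage in a pipeline. Under a hasNext() contract this class
// would have to read ahead and buffer one item; here it holds no
// look-ahead state at all.
final class FilterSketchIterator<T> implements SketchIterator<T> {
    private final SketchIterator<T> base;
    private final Predicate<T> filter;

    FilterSketchIterator(SketchIterator<T> base, Predicate<T> filter) {
        this.base = base;
        this.filter = filter;
    }

    @Override
    public T next() {
        T item;
        while ((item = base.next()) != null) {
            if (filter.test(item)) {
                return item;
            }
        }
        return null; // end of sequence
    }
}

final class IterationSketch {
    // Drains the iterator using the null-terminated protocol.
    static <T> int count(SketchIterator<T> iter) {
        int n = 0;
        while (iter.next() != null) {
            n++;
        }
        return n;
    }

    // Early exit: requests only one item from the outermost stage,
    // so no stage has fetched an item that is then thrown away.
    static <T> boolean isNonEmpty(SketchIterator<T> iter) {
        return iter.next() != null;
    }
}
```

The isNonEmpty() method is a deliberately simplified stand-in for an early-exit consumer; the real rules for the effective boolean value are more involved.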

The internal representation of type information has changed, because of the need to accommodate user-defined types. A new interface, ItemType, has been introduced; an ItemType and an occurrence indicator together form the two parts of a SequenceType. The method getItemType on an expression now returns an object that implements this interface. For atomic values, this is an AtomicType object, which is also used in the hierarchy of schema types. In the case of user-defined atomic types, this object contains a reference to the SimpleType object held in the schema data model (which will be available only in the schema-aware version of the product). For nodes, the ItemType interface is implemented by NodeTest, a subclass of Pattern that is also used to represent the conditions in an AxisStep of a path expression. In the case of node types that specify the required content type, for example attribute(*,xs:date), a ContentTypeTest is used.
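The split of a sequence type into an item type plus an occurrence indicator can be sketched as follows. All names here (SketchItemType, SketchAtomicType, SketchOccurrence, SketchSequenceType) are hypothetical and chosen for illustration; they do not reproduce the signatures of the Saxon classes, and Java classes stand in for atomic types.

```java
import java.util.List;

// Hypothetical item-type abstraction: decides whether one item conforms.
interface SketchItemType {
    boolean matchesItem(Object item);
}

// Illustrative atomic type: an item conforms if it is an instance of a
// given Java class (an assumption made only to keep the sketch small).
final class SketchAtomicType implements SketchItemType {
    private final Class<?> javaClass;

    SketchAtomicType(Class<?> javaClass) {
        this.javaClass = javaClass;
    }

    @Override
    public boolean matchesItem(Object item) {
        return javaClass.isInstance(item);
    }
}

// The four occurrence indicators: none, "?", "*" and "+".
enum SketchOccurrence {
    EXACTLY_ONE(1, 1),
    ZERO_OR_ONE(0, 1),
    ZERO_OR_MORE(0, Integer.MAX_VALUE),
    ONE_OR_MORE(1, Integer.MAX_VALUE);

    private final int min;
    private final int max;

    SketchOccurrence(int min, int max) {
        this.min = min;
        this.max = max;
    }

    boolean allows(int count) {
        return count >= min && count <= max;
    }
}

// A sequence type is the pair (item type, occurrence indicator).
final class SketchSequenceType {
    private final SketchItemType itemType;
    private final SketchOccurrence occurrence;

    SketchSequenceType(SketchItemType itemType, SketchOccurrence occurrence) {
        this.itemType = itemType;
        this.occurrence = occurrence;
    }

    // A sequence conforms if its length is allowed by the occurrence
    // indicator and every item conforms to the item type.
    boolean matches(List<?> sequence) {
        if (!occurrence.allows(sequence.size())) {
            return false;
        }
        for (Object item : sequence) {
            if (!itemType.matchesItem(item)) {
                return false;
            }
        }
        return true;
    }
}
```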

A number of the implementations of the tree model create transient wrapper nodes whenever a path expression is used to select a set of nodes. A new optimization has been introduced so that in the case where the nodes are immediately atomized, the tree model is allowed to return the typed value of a node instead of returning the node. This avoids the cost of creating the wrapper node, and it also avoids the cost of creating another iterator to process the typed value in the case where the typed value is a singleton. The optimization is currently applied only in the common case where the typed value is actually untypedAtomic. Any user-defined implementation of the tree model that implements the interface AxisIterator will need to support the additional method setIsAtomizing(); however, an implementation that does nothing is acceptable.
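For a user-defined tree model, the do-nothing option might look like the sketch below. SketchAxisIterator is a hypothetical, cut-down interface, not the real AxisIterator; in particular the boolean parameter of setIsAtomizing() is an assumption, since the text above does not give the signature. Strings stand in for nodes.

```java
import java.util.List;

// Hypothetical, cut-down stand-in for an axis iterator. The boolean
// parameter is an assumption; check the real interface for its signature.
interface SketchAxisIterator {
    Object next();                           // null at end of sequence
    void setIsAtomizing(boolean atomizing);  // optimization hint only
}

// An implementation that ignores the hint and always returns its nodes.
// The caller then atomizes the returned nodes itself, as before.
final class PlainSketchAxisIterator implements SketchAxisIterator {
    private final List<String> nodes; // strings stand in for wrapper nodes
    private int position = 0;

    PlainSketchAxisIterator(List<String> nodes) {
        this.nodes = nodes;
    }

    @Override
    public Object next() {
        return position < nodes.size() ? nodes.get(position++) : null;
    }

    @Override
    public void setIsAtomizing(boolean atomizing) {
        // Intentionally empty: the hint permits, but does not require,
        // returning typed values in place of nodes.
    }
}
```

Ignoring the hint forgoes the optimization but leaves the results unchanged, which is why an empty method body is sufficient.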

The method getAttributeValue(uri, localName) has been removed from the NodeInfo interface, so there is one less thing that suppliers of this interface have to provide. It is replaced by a helper method in the Navigator object.
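The idea of moving the lookup out of the interface and into a shared helper can be sketched as follows. Everything here is hypothetical: SketchNode, SketchAttribute and SketchNavigator are invented for the example, and the helper's parameter list and its choice of returning null for a missing attribute are assumptions rather than the actual Navigator signature.

```java
import java.util.List;

// Hypothetical attribute record: namespace URI, local name and value.
final class SketchAttribute {
    final String uri;
    final String localName;
    final String value;

    SketchAttribute(String uri, String localName, String value) {
        this.uri = uri;
        this.localName = localName;
        this.value = value;
    }
}

// Hypothetical node interface: implementors only expose their attributes
// and no longer need to supply a lookup-by-name method themselves.
interface SketchNode {
    List<SketchAttribute> attributes();
}

final class SketchNavigator {
    private SketchNavigator() {
    }

    // Shared helper written once against the node interface. Returning
    // null when no attribute matches is an assumption of this sketch.
    static String getAttributeValue(SketchNode element, String uri, String localName) {
        for (SketchAttribute att : element.attributes()) {
            if (att.uri.equals(uri) && att.localName.equals(localName)) {
                return att.value;
            }
        }
        return null;
    }
}
```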

The typeCode passed down the Receiver pipeline is now the name pool fingerprint of the actual type name. This is also the value that is stored as a type annotation in the data model. Currently this is supported only in the TinyTree. In the non-schema-aware product, the typeCode will always be -1, indicating that the node is untyped.

The way that standard names are handled in known namespaces such as XSLT, Saxon, and XML Schema has changed. The fingerprints for these names are now compile-time constants. The NamePool code has been adapted so that these namespaces are specially recognized, and the standard constants are returned. This saves time and space when building the NamePool. It also makes it possible to have a standard schema defined as a static Java object for the built-in types.
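A rough sketch of the technique follows. The class names, the numeric values and the FIRST_DYNAMIC threshold are all hypothetical and do not correspond to the real NamePool; only the XSLT namespace URI is a real value. The sketch shows a pool that recognizes a known namespace, returns a fixed constant for its standard names, and allocates codes dynamically for everything else.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical table of compile-time constants for one known namespace.
final class SketchStandardNames {
    static final String XSLT_NS = "http://www.w3.org/1999/XSL/Transform";

    // Illustrative values only; the real fingerprints are not shown here.
    static final int XSL_TEMPLATE = 1;
    static final int XSL_VARIABLE = 2;
    static final int FIRST_DYNAMIC = 1024; // dynamic codes start above the constants

    private SketchStandardNames() {
    }

    // Returns the constant for a standard name, or -1 if the name is not standard.
    static int lookup(String uri, String localName) {
        if (XSLT_NS.equals(uri)) {
            switch (localName) {
                case "template": return XSL_TEMPLATE;
                case "variable": return XSL_VARIABLE;
                default: return -1;
            }
        }
        return -1;
    }
}

// Hypothetical pool: standard names never occupy space in the map.
final class SketchNamePool {
    private final Map<String, Integer> dynamic = new HashMap<>();
    private int nextCode = SketchStandardNames.FIRST_DYNAMIC;

    int allocate(String uri, String localName) {
        int standard = SketchStandardNames.lookup(uri, localName);
        if (standard != -1) {
            return standard; // same value in every pool, known at compile time
        }
        String key = "{" + uri + "}" + localName;
        Integer code = dynamic.get(key);
        if (code == null) {
            code = nextCode++;
            dynamic.put(key, code);
        }
        return code;
    }

    int dynamicSize() {
        return dynamic.size();
    }
}
```

Because the standard codes are constants, they can be used as case labels in a switch statement and referenced from statically initialized objects, which is the property that allows a built-in schema to be defined as a static Java object.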

In response to suggestions from Karsten Rucker, I have made some changes designed to conserve memory in both the standard tree and tiny tree implementations of the data model. In the standard tree, the document node no longer contains a reference to the factory used to build it: this reference was preventing the XML parser and its buffers from being garbage-collected. In the tiny tree, the condense() operation is now called after building trees from source documents (it was previously called only for temporary trees). The condense() operation also now condenses the buffer used for character data.