saxonica.com

Integrated extension functions

There are two ways of writing extension functions. The traditional way is to map the name of the function to a Java or .NET method: specifically, the namespace URI of the function name maps to the Java or .NET class name, and the local part of the function name maps to the Java or .NET method name. These are known as reflexive extension functions, and are described in later pages.

In Saxon 9.2, this technique is supplemented by a new mechanism, referred to as integrated extension functions. With this approach, each extension function is implemented as a pair of Java or .NET classes. The first class, the ExtensionFunctionDefinition, provides general static information about the extension function (including its name, arity, and the types of its arguments and result). The second class, an ExtensionFunctionCall, represents a specific call on the extension function, and includes the call() method that Saxon invokes to evaluate the function.
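The division of labour between the two classes can be sketched as follows. This is a simplified, self-contained model rather than Saxon code: the interfaces `FunctionDefinition` and `FunctionCall`, the `ShoutDefinition` example, and the namespace URI are all hypothetical stand-ins for illustration. The real ExtensionFunctionDefinition and ExtensionFunctionCall classes have their own signatures, which are given in the Javadoc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for the definition class: static facts about the function.
interface FunctionDefinition {
    String namespaceUri();
    String localName();
    int minArgs();
    int maxArgs();
    FunctionCall makeCall();   // creates the object that evaluates a call
}

// Hypothetical stand-in for the call class: evaluates one call of the function.
// Each argument is modelled as a list of strings (a sequence of items).
interface FunctionCall {
    List<String> call(List<List<String>> arguments);
}

// Example definition: a one-argument function that upper-cases each item.
final class ShoutDefinition implements FunctionDefinition {
    public String namespaceUri() { return "http://example.com/ext"; } // assumed URI
    public String localName() { return "shout"; }
    public int minArgs() { return 1; }
    public int maxArgs() { return 1; }

    public FunctionCall makeCall() {
        return arguments -> {
            List<String> result = new ArrayList<>();
            for (String item : arguments.get(0)) {
                result.add(item.toUpperCase(Locale.ROOT));
            }
            return result;
        };
    }
}
```

The definition object carries everything that can be known without evaluating anything (name and arity here), while the call object holds only the evaluation logic.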

There are several advantages in this approach:

When a stylesheet or query uses integrated extension functions and is run from the command line, the classes that implement these extension functions must be registered with the Configuration. On Saxon-PE and Saxon-EE this is most conveniently done by declaring them in a configuration file. For details see The Saxon Configuration File.

The arguments passed in a call to an integrated extension function are type-checked against the declared types in the same way as for any other XPath function call, including the standard conversions such as atomization and numeric promotion. The return value is checked against the declared return type but is not converted: it is the responsibility of the function implementation to return a value of the correct type.
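The asymmetry between arguments (converted, then checked) and results (checked only) can be illustrated with a small model. This sketch is an assumption-laden simplification: it uses Java classes such as `Integer` and `Double` as stand-ins for XPath types, models only one conversion (integer-to-double promotion), and the class `TypeCheckSketch` is hypothetical and not part of Saxon.

```java
// Hypothetical model of the checking rules; Java classes stand in for XPath types.
final class TypeCheckSketch {

    // Arguments: apply a standard conversion where one exists, then check the type.
    static Object convertArgument(Object supplied, Class<?> declared) {
        if (declared == Double.class && supplied instanceof Integer) {
            // Models numeric promotion of an integer to a double.
            return ((Integer) supplied).doubleValue();
        }
        if (!declared.isInstance(supplied)) {
            throw new IllegalArgumentException(
                "argument is not of type " + declared.getSimpleName());
        }
        return supplied;
    }

    // Result: checked against the declared type, but never converted.
    static Object checkResult(Object result, Class<?> declared) {
        if (!declared.isInstance(result)) {
            throw new IllegalStateException(
                "result is not of type " + declared.getSimpleName());
        }
        return result;
    }
}
```

In this model an integer supplied where a double is declared is accepted as an argument, but an integer returned where a double is declared is rejected; the function implementation has to produce the double itself.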

The methods that must be implemented (or that may be implemented) by an integrated extension function are listed in the tables below. Further details are in the Javadoc for the IntegratedFunction class.

First, the ExtensionFunctionDefinition class:

Method

Effect

getFunctionQName

Returns the name of the function, as a QName (represented by the Saxon class StructuredQName). Like all other functions, integrated extension functions must be in a namespace. The prefix part of the QName is immaterial.

getMinimumNumberOfArguments

Indicates the minimum number of arguments that must be supplied in a call to the function. A call with fewer arguments than this will be rejected as a static error.

getMaximumNumberOfArguments

Indicates the maximum number of arguments that may be supplied in a call to the function. A call with more arguments than this will be rejected as a static error.

getArgumentTypes

Returns the static type of each argument to the function, as an array with one member per argument. The type is returned as an instance of the Saxon class net.sf.saxon.type.SequenceType. Some of the more commonly-used types are represented by static constants in the SequenceType class. If there are fewer members in the array than there are arguments in the function call, Saxon assumes that all arguments have the same type as the last one that is explicitly declared; this allows for functions with a variable number of arguments, such as concat().

getResultType

Returns the static type of the result of the function. The actual result returned at runtime will be checked against this declared type, but no conversion takes place. Like the argument types, the result type is returned as an instance of net.sf.saxon.type.SequenceType.

When Saxon calls this method, it supplies an array containing the inferred static types of the actual arguments to the function call. The implementation can use this information to return a more precise result, for example in cases where the value returned by the function is of the same type as the value supplied in the first argument.

trustResultType

This method normally returns false. It can return true if the implementor of the extension function is confident that no run-time checking of the function result is needed; that is, if the method is guaranteed to return a value of the declared result type.

dependsOnFocus

This method must return true if the implementation of the function accesses the context item, context position, or context size from the dynamic evaluation context. The method does not need to be implemented otherwise, as its default value is false.

hasSideEffects

This method should be implemented, and return true, if the function has side-effects of any kind, including constructing new nodes if the identity of the nodes is significant. When this method returns true, Saxon will try to avoid moving the function call out of loops or otherwise rearranging the sequence of calls. However, functions with side-effects are still discouraged, because the optimizer cannot always detect their presence if they are deeply nested within other calls.

makeCallExpression

This method must be implemented; it is called at compile time when a call to this extension function is identified, to create an instance of the relevant ExtensionFunctionCall object to hold details of the function call expression.
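Two of the rules in the table above lend themselves to a short illustration: the arity check implied by the minimum and maximum number of arguments, and the rule that a short array from getArgumentTypes is extended by repeating its last member. The helper class `SignatureRules` below is hypothetical and is not part of Saxon; it represents types as plain strings purely for illustration, whereas Saxon uses SequenceType objects.

```java
// Hypothetical helper modelling two rules from the table; not Saxon code.
final class SignatureRules {

    // A call is acceptable only if its argument count lies between the
    // declared minimum and maximum, inclusive.
    static boolean arityAccepted(int min, int max, int supplied) {
        return supplied >= min && supplied <= max;
    }

    // Declared type for the argument at a zero-based position. Positions
    // beyond the end of the array reuse the last explicitly declared type.
    static String declaredType(String[] argumentTypes, int position) {
        if (argumentTypes.length == 0) {
            throw new IllegalArgumentException("no argument types declared");
        }
        int index = Math.min(position, argumentTypes.length - 1);
        return argumentTypes[index];
    }
}
```

For example, with declared types `{"xs:integer", "xs:string"}`, the fifth argument of a call (position 4) is treated as `xs:string`, which is how a variable-arity function in the style of concat() can be described with a short array.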

The methods defined on the second object, the ExtensionFunctionCall, are:

Method

Effect

supplyStaticContext

Saxon calls this method fairly early on during the compilation process to supply details of the static context in which the function call appears. The method may in some circumstances be called more than once; it will always be called at least once. As well as the static context information itself, the expressions supplied as arguments are also made available. If evaluation of the function depends on information in the static context, this information should be copied into private variables for use at run-time.

rewrite

Saxon calls this method at a fairly late stage during compilation to give the implementation the opportunity to optimize itself, for example by performing partial evaluation of intermediate results, or if all the arguments are compile-time constants (instances of net.sf.saxon.expr.Literal) even by early evaluation of the entire function call. The method can return any Expression (which includes the option of returning a Literal to represent the final result); the returned expression will then be evaluated at run-time in place of the original. It is entirely the responsibility of the implementation to ensure that the substitute expression is equivalent in every way, including the type of its result.

copyLocalData

Saxon occasionally needs to make a copy of an expression tree. When it copies an integrated function call it will invoke this method, which is responsible for ensuring that any local data maintained within the function call objects is correctly copied.

call

Saxon calls this method at run-time to evaluate the function.

The value of each argument is supplied in the form of a SequenceIterator, that is, an iterator over the items in the sequence that make up the value of the argument. This may use lazy evaluation, which means that a dynamic error can occur when reading the next item from the SequenceIterator; it also means that if the implementation does not require all the items from the value of one of the arguments, they will not necessarily be evaluated at all. It is good practice to call the close() method on the iterator if it is not read to completion.

The implementation delivers the result also in the form of a SequenceIterator, which in turn means that the result may be subject to delayed evaluation: the calling code will only access items in the result as they are required, and may not always read the result to completion. To return a singleton result, use the class net.sf.saxon.om.SingletonIterator; to return an empty sequence, return the unique instance of net.sf.saxon.om.EmptyIterator.
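The effect of lazy evaluation on call() can be sketched with ordinary Java iterators. This is a model only: `java.util.Iterator` stands in for SequenceIterator (and, unlike it, has no close() method), a singleton list iterator and `Collections.emptyIterator()` stand in for SingletonIterator and EmptyIterator, and the class `LazyCallSketch` with its methods is hypothetical.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of lazy argument evaluation; not Saxon code.
final class LazyCallSketch {

    // Counts how many items the lazy argument has actually computed.
    static final AtomicInteger computed = new AtomicInteger();

    // A lazily evaluated argument: each square is computed only on next().
    static Iterator<Integer> lazySquares(int count) {
        return new Iterator<Integer>() {
            private int n = 0;
            public boolean hasNext() { return n < count; }
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                n++;
                computed.incrementAndGet();
                return n * n;
            }
        };
    }

    // Models call(): returns the first item greater than the limit as a
    // singleton result, or an empty result if there is no such item.
    static Iterator<Integer> call(Iterator<Integer> argument, int limit) {
        while (argument.hasNext()) {
            int item = argument.next();
            if (item > limit) {
                // The rest of the argument is never read, so never computed.
                return Collections.singletonList(item).iterator();
            }
        }
        return Collections.emptyIterator();
    }
}
```

Because the function stops reading as soon as it has its answer, the remaining items of the argument are never computed. With a real SequenceIterator this is the point at which close() would be called on the abandoned iterator.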

Once written, an integrated extension function must be registered with Saxon so that calls on the function are recognized by the parser. This is done using the registerExtensionFunction method, which is available on the Configuration class and also on the s9api Processor class. Alternatively, the function can be registered via an entry in the configuration file. The function can be given any name, although names in the fn:, xs:, and saxon: namespaces are strongly discouraged and may not work.
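The way a registered function is matched by name, with the namespace required and the prefix ignored, can be modelled using the JDK class `javax.xml.namespace.QName`, whose equality is defined on namespace URI and local part only. The class `FunctionRegistrySketch` below is a hypothetical illustration and does not show Saxon's own registry; in Saxon, registration goes through registerExtensionFunction, and function names are represented by StructuredQName.

```java
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;

// Hypothetical registry modelling lookup by function name; not Saxon code.
final class FunctionRegistrySketch {

    // Maps a function name to a description of its implementation.
    private final Map<QName, String> functions = new HashMap<>();

    void register(QName name, String description) {
        // Extension functions must be in a namespace.
        if (name.getNamespaceURI().isEmpty()) {
            throw new IllegalArgumentException("function name must be in a namespace");
        }
        functions.put(name, description);
    }

    // Returns the registered description, or null if the name is unknown.
    String lookup(QName name) {
        return functions.get(name);
    }
}
```

A function registered as `my:shout` in namespace `http://example.com/ext` (an assumed URI) is found by a lookup for `other:shout` bound to the same namespace, but not by a lookup using the same prefix bound to a different namespace.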

It is also possible to register integrated extension functions under XQJ, using the SaxonXQStaticContext class which implements the XQStaticContext interface.

There are corresponding classes in the .NET API, which can be used to define an extension function written in a .NET language such as C#.
