Proceedings of Extreme Markup Languages®

Introduction

XSLT is a declarative language for transforming XML documents. In particular, it is a language for transforming the tree representation of XML documents: it takes one or more trees as input, and produces one or more trees as output. Typically the input trees will be created by parsing lexical XML documents, and frequently the output trees will be turned back into lexical XML by a process known as serialization. But those processes are not part of the XSLT transformation proper: officially, XSLT produces output trees from input trees.

As with programs in any other declarative language, the performance of XSLT stylesheets depends heavily on optimization. The techniques for optimizing any program, and declarative programs in particular, have been well researched. The classical techniques all involve rewriting expressions in the program (generally at compile time) into other equivalent expressions that produce the same results but execute more quickly, or use less memory, or both. The rewrites are generally implemented as manipulations of a tree representation of the program known as the abstract syntax tree.

So we see that XSLT is a language for transforming trees, and we see that optimization is a process that involves the transformation of expression trees. So it becomes natural to ask whether XSLT is a suitable language for writing an optimizer. The purpose of this paper is to explore the feasibility of such an approach.

In the paper I will be investigating this idea with relation to the Saxon XSLT and XQuery processor which is developed and marketed by my company, Saxonica Limited. Saxon combines XSLT and XQuery processing in a single engine; in effect XSLT and XQuery are treated as different surface syntaxes, but everything that happens after the parsing stage is common to both languages. Although the title of the paper refers to XSLT optimization, I'm therefore concerned just as much with optimization of XQuery, and of the XPath language which XSLT and XQuery share as a common sublanguage.

In asking whether it is feasible to write an optimizer in XSLT, we will also be asking whether this delivers any benefits. There are two kinds of benefits we might expect. Firstly, there is a long tradition of writing compilers in the language being compiled. The benefits of doing this are subtle: it means that the developers of the compiler are exposed every day to their own product, which means they will be well placed to discover problems with the software and well motivated to fix them. It also reduces the dependency of the compiler on other technologies. Secondly, optimization is intrinsically a rule-based process, and using a rule-based language ought to make the overall set of optimization rules more manageable and more amenable to analysis. In the case of Saxon (and this reflects reported experience with relational database optimizers) the current set of optimizations, written in Java, suffer from being ad-hoc and widely scattered around the code. This makes it difficult to predict with any certainty what optimizations will "kick in" for any given stylesheet or query. There has also been a history of an over-concentration of bugs in this area, caused by tree rewrites that leave the tree in an inconsistent state.

The obvious objection to the use of XSLT to write the optimizer is performance. A section of the paper is therefore devoted to this subject.

The optimization techniques used in Saxon are entirely "syntax-based", in the sense that they depend only on the way that the stylesheet or query is written, and have no access to additional information about the data to be manipulated. Saxon in its schema-aware version has access to information about the schemas for the source and result documents, but stylesheets and queries are compiled and optimized without any knowledge of the actual document instances. Saxon is not a database product, and therefore cannot exploit the cost-based techniques that are often used in database query optimization, where statistics on data volumes and value distributions provide an important input to the optimization strategy.

Some simple rewrites

Let's start with some simple examples of rewrites that Saxon performs.

Consider the expression count($x) = 0. The Saxon optimizer translates this to empty($x). The reason for this is that counting the number of items in sequence involves computing the entire sequence, whereas the the answer to the expression empty($x) can be determined by attempting to read the first item in the sequence. Like most functional languages, Saxon uses pipelined execution at run-time, so the value of $x is typically not computed until it is needed. When it is needed, it is computed one item at a time. If the containing expression only needs the first item, then it will only compute the first item — a technique called early exit. This means it is desirable for the compile-time optimizer to rewrite expressions into a form where early exit will deliver savings.

Saxon's raw unoptimized expression tree for the expression count($x) = 0 appears like this when rendered in XML:

    <generalComparison op="=">
      <functionCall name="fn:count">
        <variableReference name="x"/>
      </functionCall>
      <literal value="0" type="xs:integer"/>
    </generalComparison>

The target expression empty($x) has the following expression tree:

    <functionCall name="fn:empty">
      <variableReference name="x"/>
    </functionCall>

The transformation from the raw input tree to the optimized expression tree can be achieved by the following stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

<xsl:template match="*">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:template>

<xsl:template match="generalComparison[@op='=']
                     [*[1][self::functionCall][@name='fn:count']]
                     [*[2][self::literal][@value='0'][@type='xs:integer']]">
   <functionCall name="fn:empty">
      <xsl:apply-templates select="*[1]/*"/>
   </functionCall>
</xsl:template>

</xsl:stylesheet>

This stylesheet follows a very common design pattern with XSLT: it contains a fallback template rule (match="*") that copies an element unchanged if there is no specific rule that matches that element, and then it complements this with template rules that match the elements that need to be modified.

The match pattern for this rule looks rather complicated (and this is a simple rewrite, they are going to get worse). I have split it into three lines to make it clearer. The first line generalComparison[@op='='] says that we are matching a general comparison whose operator is "=". The second line [*[1][self::functionCall][@name='fn:count']] says that the first operand of the comparison must be a call on the count() function. The final line [*[2][self::literal][@value='0'][@type='xs:integer']] says the other operand must be a literal of type xs:integer whose value is 0.

Note how the rule works regardless of the actual operand of the count() function, and how the processing applies recursively to any expressions nested within this operand. This all falls out from the standard XSLT apply-templates mechanism.

In practice we would not want to make the conditions quite so strict. We would like the rule to handle the operator "eq" as well as "=", and we would like to allow the operands to appear in either order. This suggests a revised rule as follows (for ease of exposition, we won't worry about finer details like sensitivity to namespace prefixes):

<xsl:template match="*[@op=('=', 'eq')]
                     [*[self::functionCall][@name='fn:count']]
                     [*[self::literal][@value='0'][@type='xs:integer']]">
   <functionCall name="fn:empty">
      <xsl:apply-templates select="functionCall/*"/>
   </functionCall>
</xsl:template>

</xsl:stylesheet>

Let's look at another simple rewrite performed by the Saxon optimizer. The expression for $i in $sequence return $i can be rewritten as $sequence. (You might ask whether it is worth doing this optimization: does it ever arise in practice? The answer is that it does, for two reasons. Firstly, many XQuery novices imagine that every query has to be written as a FLWOR expression, so they start by typing "for" just as they would type "select" in SQL. Secondly, constructs of this kind often arise as the output of a rewrite rule in the optimizer. They then need to be further simplified.)

The expression tree for this construct looks like this:

<for variable="i" as="item()">
    <variableReference name="sequence" role="in"/>
    <variableReference name="i" role="return"/>
</for>

(I made the decision for reasons of simplicity and consistency that the immediate subexpressions of an expression should always be direct children of the parent expression. Rather than use an intermediate element to describe the role of each subexpression, I have used a role attribute on the element representing the subexpression. For first-order expressions like a general comparison or a function call, the role is omitted.)

The rewritten expression tree in this case is simply:

<variableReference name="sequence"/>

To add this rewrite to the existing stylesheet, we simply add the rule:

<xsl:template match="for[@variable = variableReference[@role='return']/@name]">
  <xsl:apply-templates select="*[@role='in']"/>
</xsl:template>

There are a few complications we should note about this rule. Firstly, it will do for optimizing XPath, but not for XQuery. In XQuery we need to extend it to check for the absence of constructs such as where or order by clauses. Secondly, the simple comparison of the @variable attribute of the for clause with the @name attribute of the variable reference needs to take into account that these names are QNames and need to be compared using namespace URIs rather than prefixes. Fortunately XSLT 2.0 takes care of this, provided we assume that we're running a schema-aware query in which the relevant attributes carry a type annotation to indicate that they are of type xs:QName. (The same comment applies to the first rule we introduced, where the test on the @type attribute should more correctly have been written [@type=xs:QName('xs:integer')].)

These two rewrite rules can coexist without any trouble. If we run the combined stylesheet containing both rules on an expression such as count(for $i in 1 to $N return $i) = 0, we will get the expression tree corresponding to empty(1 to $N), showing that both rules have been applied. (A further rule might turn this into the expression $N < 1)

These two rules are a little unusual, in that they are completely context-free. They can be run directly on the output of the parser, and require no knowledge about data types or other semantic properties of the expressions. More commonly, rules will require access to such properties, and we will look at such rules in due course. First, however, there are a few points to make about the representation of the expression tree as an XML data structure.

Representing the expression tree in XML

The XML representations of expressions that I used in the previous section use a slightly adapted form of the representation used by the Saxon "explain" option in releases from Saxon 9.0 onwards. The "explain" option is designed to help users understand how the expression has been optimized: it gives a representation, in effect, of the query execution plan. Releases up to Saxon 8.9 used a textual representation; this has been changed to an XML representation to make the data more amenable to analysis, for example it gives an IDE the opportunity to display a graphical representation of the plan.

This is not necessarily the best possible way of representing the expression tree when the objective is transformation rather than user enlightenment. In this paper I have already assumed a few minor adjustments to the format to improve the structure a little; when the time comes for a live implementation one can contemplate many more such improvements being made. For this paper I'm not concerned with the detail of the XML representation chosen, only with the broad principles of the design.

One possibility that comes to mind is to use XQueryX [XQX2007]. XQueryX is a fine-grained XML representation of the syntax of XQuery 1.0. As such it covers the whole of XPath 2.0. It doesn't cover XSLT, of course, but it could easily be extended with additional elements and attributes to achieve this. (Saxon builds a single unified expression tree for an XSLT stylesheet, in which there are no visible boundaries between the parts that derive from XSLT syntax and the parts built from XPath syntax. It isn't possible to tell from the expression tree, for example, whether a conditional was originally written using an <xsl:choose> instruction in XSLT or an if-then-else expression in XPath.)

However, XQueryX would need to be extended: not just to handle XSLT, but also to handle the many internal operators and functions that are generated by the Saxon optimizer that do not correspond to constructs in the surface syntax available to the user. Furthermore, as we shall see, there's a need to support many annotations to the expression tree representing properties of expressions derived from semantic analysis. The conclusions of this paper wouldn't be substantively different if we designed an XML tree that took XQueryX as its starting point rather than the "explain" tree currently produced by Saxon, but in practical terms there aren't any real benefits that would be obtained by starting with a standardized vocabulary rather than a product-specific one.

In designing an XML representation of the expression tree, the main difficulty is in finding a suitable representation of the more complex properties of nodes on the tree. To take an example, some kinds of expression depend on the namespace context: the XSLT instruction <xsl:element name="{$name}"/> and its equivalent in XQuery syntax element {$name} {} both allow the variable $name to hold a lexical QName, that is a string in the form "prefix:uri", where the prefix is known only at run-time, but must be translated to a namespace URI by reference to the prefixes that were declared at that point in the stylesheet or query. So the namespace context (a set of prefix-to-uri mappings) forms an implicit operand of this expression, which must be retained through all rewrites, and therefore needs to be held as an annotation on the abstract syntax tree. Clearly, there are a number of ways that a set of prefix-to-uri mappings can be represented using XML elements or attributes, and it's not obvious what the best or most efficient representation is. Because the optimizer doesn't actually have to manipulate this operand (it only has to preserve its value) the best solution might be to encode the whole set of mappings in the value of a single attribute using some kind of microsyntax. An alternative is to store the data in a Java object, and have the XML expression tree simply hold an abstract identifier that can be used to fetch the Java object from a hash table.

Many of the semantic properties that Saxon currently maintains on its Java-based expression tree are boolean properties, and these tend to be very compactly encoded as bits within a single integer. In moving to a more natural XML representation where each property is a separate boolean attribute, one clearly needs to watch out that this does not introduce gross inefficiencies.

However, we should try to keep separate the logical modeling of the expression tree as XML (or more strictly, its modeling as an instance of the XPath data model XDM), and the physical representation of that tree. Saxon represents the XDM data model internally using a Java interface called NodeInfo, and it works with any Java classes that implement this interface. This allows Saxon to access input trees supplied in any number of formats such as DOM, JDOM, DOM4J, or XOM, simply by mapping these onto the NodeInfo interface. For physical implementation, one design option is simply to implement a NodeInfo wrapper over the current expression tree structure. Apart from anything else, this would allow incremental introduction of XSLT-based rewrites rather than requiring a "big bang" cut-over. At present, however, my inclination is to use a native representation of the XDM tree, and then convert the final output of the optimizer into the current "executable" expression tree. One advantage of this is that using different representations of the tree at compile time and at run time would eliminate the many compromises that are currently made in a tree that is designed to meet two rather different kinds of requirement. It would also make it easier to make the run-time tree immutable after initial construction, thus eliminating an occasional source of hard-to-detect multi-threading bugs.

I have already mentioned namespace context as one of the data types on the expression tree that presents challenges when devising an XML representation. Other data types that require thought include:

References to other nodes on the tree (for example from a function call to the function declaration): one could use id/idref linkage, but it's a little clumsy.
Types: an important input to the optimizer is inferred information about the static types of expressions. Saxon computes this information in a separate processing phase that precedes the optimization proper. A direct representation of the source SequenceType syntax is not easy to manipulate, and with such a representation it might be difficult to ask questions such as "is the type of $x one which, after atomization, could be numeric?". The solution to this might be to represent all types as opaque IDs which can be interrogated via a library of extension functions (that is, calls to underlying Java methods.) Another possibility is to define an XML-based representation of the type information in the schema component model, and interrogate this information using path expressions.
Location information used for diagnostics (that is, information about where in the source text an expression appeared). The problem here is the sheer bulk of information. It has a high level of redundancy, so it should be possible to devise compact encodings, but storing information redundantly in each node of the expression tree makes it easier to preserve across rewrites.
Collations, comparers, calculators: the Java implementation of the Saxon expression tree makes extensive use of helper objects which actually perform required computations, either at run-time when the tree is interpreted, or earlier if constant subexpressions are encountered.
Dependencies: many of the rewrite rules used by the optimizer rely on knowledge of the dependencies of an expression. For example, given an expression such as //item[@code=$productCode], Saxon-SA will convert this to an index lookup (a call on the XSLT key() function, using a system-allocated key) if it can determine that one of the operands of the "=" operator depends on the context item and the other operand does not. To this end, Saxon maintains a number of boolean flags on the expression tree to represent the aspects of the context that the expression depends on: the context item, position, and size, local variables, and some of the extra XSLT context variables such as current group.

Many of the properties of nodes on the expression tree are computed lazily (that is, when first needed). This has the advantage that the value is not computed unless it is needed, and it is then computed at most once. The difficulty with this strategy is knowing when to invalidate the computation in the event that rewrites cause the value to change. A failure to recompute a property such as the dependencies of an expression is a common source of optimizer bugs.

A plausible strategy for such attributes, once the tree is represented in XML, is to implement them in the form of memo functions. Saxon allows a user-written function to be designated by the user as a memo function; in this situation the value is computed the first time the function is applied to a particular set of arguments, and is then remembered. If the optimizer is written in XSLT, then a rewrite of a subtree will always generate new nodes with new identity, which means that a memo function applied to the node will automatically be recomputed after a rewrite. This is an example of how the disciplines of declarative programming can give improvements in reliability.

The Saxon processor can be considered to operate in the following phases:

Parsing, which constructs an expression tree that is a fairly direct representation of the syntax as written by the user, with some very limited local simplification such as replacing the function calls true() and false() by constants.
Fixup of variable references and function references (and in XSLT, called templates, keys, and so on): this essentially adds a pointer in the tree from the node that references an object to the object itself (or to the node that defines the object, which might be the same thing).
Type checking: Saxon infers static types for expressions using rules similar to, but by no means identical to, those in the XQuery Formal Semantics. Saxon implements "optimistic static typing", which means in effect applying the following rules:
- If the static type of a supplied expression is subsumed by the type required by the context in which it is used, do nothing (execution is type-safe);
- If the type of the supplied expression is disjoint with the required type, report a static error (execution can never succeed);
- If the type of the supplied expression overlaps the required type but is not subsumed by it, generate run-time type checking code. This is done by adding nodes to the expression tree.
The type checking phase also generates code on the expression tree to perform the various implicit type conversions required by the language semantics, notably atomization of nodes to extract atomic values, and numeric promotion (for example from integer to double).
Optimization: rewriting of the expression tree for efficiency
Either interpretive evaluation of the expression tree against a source document, or generation of compiled Java code to implement the operations on the expression tree.

The design of the expression tree needs to take the needs of all these phases into account. It would be possible to use different tree representations for the different phases, but it is likely that this would greatly complicate the logic.

As already mentioned, the preferred strategy for Saxon is to design a new XML-based tree representation which will be output by the parser, manipulated by the various phases of compilation (semantic fixup, type-checking, and optimization), and then finally converted either into a data structure similar to the current tree for run-time interpretation, or into executable Java code. The aim will be to represent as much of the information as possible using ordinary XML elements and attributes, and as much of the logic as possible using XSLT code. Initial answers to the various design questions include the following:

References to helper classes such as collators and calculators will be held in the form of abstract identifiers which provide access to the underlying Java object via a hash table.
Some objects such as namespace context will be encoded as character strings using a suitable micro-syntax, and held directly on the tree.
Computed properties of expressions, such as its static type and its dependencies, will be provided by means of memo functions. Saxon allows an XSLT (or XQuery) user-defined function to be marked as a memo function, which means that the value once computed for a given combination of arguments is remembered, and if the function is called again with the same arguments then the remembered value is returned without re-computation. This is a better approach than using physical XML attributes holding the information on the tree, because XSLT does not allow a tree to be modified in-situ, which would mean the property either has to be computed when the node is first created, just in case it is needed, or it has to be calculated every time it is needed. Memo functions provide a good alternative to these extremes.

Each top-level function, global variable, or template will be represented by a separate tree: this is to reduce the amount of data to be copied when trees are transformed, since the vast majority of optimization activity is local to individual functions or templates (known collectively in Saxon as procedures). These trees will be accessible via Java data structures, so that references from one tree to another will always go via a dictionary maintained by the Java support code. Cross references from a variable reference to a local variable defined within the same procedure will use an id/idref approach.

Context-sensitive rewrites

In this section I will examine the feasibility of doing rewrites that are context-sensitive: that is, rewrites that depend on knowledge of the static type of an expression, or its dependencies, or other static properties such as its creativity (whether or not it can create new nodes). We'll look first at an example that is quite localized; in the following section we'll look at longer-range rewrites.

Consider a filter expression E1[E2]. If the static type of E2 is integer, and if E2 does not depend on the context node, position, or size, then this expression can be rewritten as saxon:item-at(E1, E2), where saxon:item-at is a function that selects the Nth item in a sequence. In this notation E1 and E2 represent arbitrary XPath expressions - or, indeed, arbitrary XSLT/XQuery expressions. This rewrite is useful for several reasons. It means that E2 only needs to be evaluated once, rather than once for each item in E1. It means that there is no need to evaluate more than N items of E1, where N is the run-time value of E2. And in the case where the value of E1 has already been materialized in memory, the function saxon:item-at() can use direct addressing, giving constant performance rather than performance linear in the size of E1.

I have written item-at here as a call of a function in the Saxon namespace. This looks like a clean approach for rewriting an expression to something that could not have been directly written by the user in the surface language. However, this is an abstraction of the way Saxon works internally: rather than custom functions, it effectively uses custom operators and custom node types on the expression tree. But life would probably be simpler if it used an internal function call in cases like this, so I will simplify the exposition by speaking as if it did.

To take an example, suppose the actual expression is $E1[last() idiv 2]. We can't directly apply the above rewrite because the expression depends on the context size (that is, last()). But let's assume that we have first rewritten the expression as $E1[count($E1) idiv 2] (it happens all the time in this game that one rewrite opens the door to others). The expression tree for this is as shown below:

<filterExpression>
  <variableReference name="E1" role="base"/>
  <arithmetic op="idiv" role="filter">
    <functionCall name="count">
      <variableReference name="E1"/>
    </functionCall>
    <literal value="2" type="xs:integer"/>
  </arithmetic>
</filterExpression>

and we want to rewrite it as:

<functionCall name="saxon:item-at">
  <variableReference name="E1"/>
  <arithmetic op="idiv">
    <functionCall name="count">
      <variableReference name="E1"/>
    </functionCall>
    <literal value="2" type="xs:integer"/>
  </arithmetic>
</functionCall>

This clearly is a very minor change to the tree, and the only difficult part is to establish the conditions that make it legitimate to perform the rewrite. I will abstract away from the complexities by using functions and named constants to determine the properties of the input expression:

<xsl:template match="filterExpression
                     *[2][opt:subsumes($XS_INTEGER, opt:item-type(.))][not(opt:depends-on-focus(.))]">
  <functionCall name="saxon:item-at">
    <xsl:apply-templates/>
  </functionCall>
</xsl:template>

All that remains is to define a function opt:item-type() that computes the static item type of an expression, a global variable $XS_INTEGER that represents the item type xs:integer, a function opt:subsumes() that determines whether one type subsumes another, and a function opt:depends-on-focus(.) that returns true if the expression depends on the context item, position, or size.

Some of these functions, such as opt:subsumes, are probably best written in Java rather than in XSLT, if only because the code already exists in Saxon and is decidedly non-trivial. Other functions, such as opt:depends-on-focus, can readily be written in XSLT. Such a function is intrinsically polymorphic, and we can exploit XSLT template rules to implement the polymorphism. Here's an implementation, which ignores one or two minor complications of the XPath language semantics, but is almost good enough for prime time:

<xsl:function name="opt:depends-on-focus" as="xs:boolean" saxon:memo-function="yes">
  <xsl:param name="exp" as="schema-element(expressionNode)"/>
  <xsl:variable name="intrinsic" as="xs:boolean">
    <xsl:apply-templates select="$exp" mode="intrinsically-depends-on-focus"/>
  </xsl:variable>
  <xsl:sequence select="$intrinsic or
                        $exp/child::*[not(@role=('filter', 'step'))][opt:depends-on-focus(.)]"/>
</xsl:function>

<xsl:template match="contextItemExpression"
              mode="intrinsically-depends-on-focus" as="xs:boolean">
  <xsl:sequence select="true()"/>
</xsl:template>

<xsl:template match="functionCall[@name='fn:position']"
              mode="intrinsically-depends-on-focus" as="xs:boolean">
  <xsl:sequence select="true()"/>
</xsl:template>

<xsl:template match="functionCall[@name='fn:last']"
              mode="intrinsically-depends-on-focus" as="xs:boolean">
  <xsl:sequence select="true()"/>
</xsl:template>

<xsl:template match="*"
              mode="intrinsically-depends-on-focus" as="xs:boolean">
  <xsl:sequence select="false()"/>
</xsl:template>

The way this works is that an expression depends on the focus either if it has an intrinsic dependency on the focus, or if one of its subexpressions depends on the focus, ignoring any subexpression that has the role "filter" or "step". This is because there are two cases in XPath, namely the expressions START/STEP and BASE[FILTER] where the first subexpression sets the context for evaluation of the second, and therefore where the fact that the second operand depends on the context doesn't mean that the expression as a whole does so. (This covers XPath. There are also cases in XSLT that need to be added, such as xsl:for-each-group. Also in XSLT we generally have to assume that the result of xsl:apply-templates will depend on the focus, because we can't analyze at compile time which template rules will be called.)

In the type signature for the function I have used the type schema-element(expressionNode). I haven't said anything about the schema for the expression tree until now, but I'm assuming here that all the nodes such as <arithmetic> and <filterExpression> will be members of a substitution group with <expressionNode> at its head. In fact, in a situation like this where elements will often be processed generically, it can be useful to define quite a rich type hierarchy in the schema to enable elements to be classified, so that template rules can be applied to a whole class of nodes on the tree without listing all the element names individually.

XSLT 1.0 users may be a little surprised to see these template rules returning boolean values rather than constructing new nodes; but it's a very natural thing to do in XSLT 2.0. The only slight awkwardness is that feeding the result of an XSLT template into an XPath expression requires use of a variable (in this case $intrinsic).

There's a lot more detail to be worked out to complete the code for this rewrite (for example, I haven't gone into the question of how types are best represented in the tree), but hopefully the sketch that has been presented is detailed enough to convince the reader that it is entirely feasible.

Non-local Rewrites

The rewrite rules that cause most complexity in the current Java implementation are those that work at a distance. An example of this is the rule that extracts subexpressions from a filter if they do not depend on either the context item or position. For example, given the filter expression //item[matches(@code, concat('^', $pattern, '$'))], Saxon will rewrite this as let $temp := concat('^', $pattern, '$') return //item[matches(@code, $temp)] (which is of course XQuery syntax, but that doesn't matter). This has the benefit that the expression is evaluated once only, outside the loop. (Actually the rule is a bit more subtle than this; the rewritten expression evaluates $temp on the first entry to the loop, so that if //item is empty, then $temp is never evaluated. This is important to provide correct behavior in error cases.)

In the Java implementation, when Saxon encounters a filter expression it constructs a "PromotionOffer" containing details of the filter expression, and then walks the subtree representing the filter expression. Any expression containing a subexpression that is not dependent on the context item, or on some intermediate variable declared within the filter expression, can then "accept" the offer, which it does by creating the "let" expression as a wrapper around the filter expression, making the independent subexpression a child of the "let" expression, and then replacing the original subexpression with a reference to the new variable. Such logic is intricate and error prone.

When we write this in XSLT, we need to use a similar approach of doing a recursive walk of the filter tree. We can't simply search it for expressions that don't depend on the context, because we need to stop the search at the point where context changes are introduced, for example given //item[some $c in colors satisfies $c = 'red'] it would be wrong to extract $c = 'red' as an expression with no dependency on the context.

(An alternative approach, used by the XQuery formal semantics, is to remove all dependencies on context from the tree by introducing explicit variables. Saxon doesn't currently do this; but it would certainly make some things easier.)

In the Java implementation, each kind of expression decides whether to pass on the promotion offer to its children. (Of course, a default implementation can be used for the vast majority of expressions.) So again, we have a highly polymorphic operation. We can use the same approach in XSLT, using xsl:apply-templates in a specific mode to achieve the dynamic dispatch. We can also use the same approach of passing a promotion offer down the tree as a parameter, though XSLT 2.0 gives us the ability to make this implicit by using the facility of tunnel parameters. However, rather than trying to identify the nodes that are eligible for promotion and performing the tricky rewrite in a single pass, a cleaner approach seems to be to do the operation in three stages. The first stage identifies nodes that are eligible for promotion, returning a list of these nodes. The second stage creates a modified tree in which these nodes are replaced by variable references; and the third stage wraps the tree in new nodes defining the binding of these variables.

Ignoring special cases, the logic for the first pass looks like this.

  <xsl:template match="filterExpression" as="schema-element(expressionNode)">
    <xsl:variable name="nodesForPromotion" as="schema-element(expressionNode)*">
	  <xsl:apply-templates select="*[@role='filter']" mode="focusIndependent"/>
	</xsl:variable>
	<!-- second pass goes here -->
  </xsl:template>

  <!-- if the expression depends on context, try its children -->

  <xsl:template match="*[opt:depends-on-context-item(.) or opt:depends-on-position(.)]"
                mode="focusIndependent" as="schema-element(expressionNode)*">
	<xsl:apply-templates select="child::*" mode="focusIndependent"/>
  </xsl:template>

  <!-- if the expression is a variable reference or a constant, leave it alone -->

  <xsl:template match="variableReference | literal"
                mode="focusIndependent" as="schema-element(expressionNode)*"/>

  <!-- otherwise return this node as a promotion candidate -->

  <xsl:template match="*"
                mode="focusIndependent" as="schema-element(expressionNode)*">
    <xsl:sequence select="."/>
  </xsl:template>

As already mentioned, we need a few extra rules to prevent inappropriate parts of for, some, and every expressions being promoted, but that's just a question of adding a few more templates.

The second pass creates a copy of the tree in which these nodes are replaced by variable references. I will use the XSLT generate-id() function to construct new variable names.

  <xsl:template name="opt:replaceWithVars">
    <xsl:param name="nodesForReplacement" as="element()*" tunnel="yes"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each select="*">
        <xsl:choose>
          <xsl:when test=". intersect $nodesForReplacement">
            <variableReference name="{generate-id(.)}"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="opt:replaceWithVars"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

The third pass processes the returned list of promotable nodes; for each one it creates a wrapping let expression. This process needs to be recursive, because each step is operating on the result of the previous step. It's a classic head-tail recursion:

  <xsl:function name="opt:promote" as="element(letExpression)">
     <xsl:param name="baseExpression" as="element()"/>
	 <xsl:param name="promotedNodes" as="element()*"/>
	 <xsl:choose>
	   <xsl:when test="$promotedNodes">
	     <xsl:variable name="varName" select="generate-id($promotedNodes[1])"/>
		 <xsl:variable name="let" as="element(letExpression)">
		   <letExpression var="{$varName}">
		     <xsl:copy-of select="$promotedNodes[1]"/>
		     <xsl:copy-of select="$baseExpression"/>
		   </letExpression>
		 </xsl:variable>
		 <xsl:sequence select="opt:promote($let, remove($promotedNodes, 1))"/>
	   </xsl:when>
	   <xsl:otherwise>
	     <xsl:sequence select="$baseExpression"/>
	   </xsl:otherwise>
	 </xsl:choose>
 </xsl:function>

I've over-simplified a little bit in that there's no code to create the role attributes of the children of the new letExpression. For this piece of code, using a wrapper element for the role would make life easier. I've also ignored any code needed to create links from a variable reference to the expression that binds the variable: at this stage, the only reference is the variable name. But the overall idea should be clear.

Putting the three phases together, the main template for the filter expression looks like this:

<xsl:template match="filterExpression" as="element()">
  <xsl:variable name="nodesForPromotion" as="element()*">
    <xsl:apply-templates select="*[2]" mode="focusIndependent"/>
  </xsl:variable>
  <xsl:variable name="afterReplace" as="element()">
     <xsl:call-template name="opt:replaceWithVars">
       <xsl:with-param name="nodesForReplacement" select="$nodesForPromotion" tunnel="yes"/>
     </xsl:call-template>
  </xsl:variable>
  <xsl:sequence select="opt:promote($afterReplace, $nodesForPromotion)"/>
</xsl:template>

Note in particular the use of xsl:apply-templates to return a list of references to existing nodes, rather than its conventional XSLT 1.0 role of constructing new nodes in the result tree. This is an important capability in XSLT 2.0 which is often overlooked.

Performance

Although I've left quite a few details of the design unresolved at this point, I've sketched out enough detail that it ought to be possible to make a preliminary assessment as to whether there is a realistic possibility of achieving acceptable performance when the tree rewriting code is written in XSLT rather than in Java.

As an experiment, I took an XQuery expression of the form (1 to 5000)[. = doc('d1.xml')//x and . = doc('d2.xml')//x and ...] with the document names ranging from d1 to dN. All these calls on the doc() function can be removed from the filter predicate, using the approach outlined in the previous section.

With 500 terms in the predicate, the query analysis using the existing Java optimizer takes around 300ms (Double the size, and it fails with a stack overflow). After patching the code to remove this optimization, the analysis time drops to around 40ms, so we can estimate the time spent doing this rewrite as 260ms. Reducing the number of terms to 250 brings the optimization time down to around 80ms; this suggests strongly that the optimization time is quadratic in the number of terms in the expression.

The XSLT transformation to reproduce this tree rewrite is currently taking around 1000ms for an expression with 250 terms. With 500 terms it fails with a stack overflow; with 125 terms, the time is down to about 280ms: again, this strongly suggests quadratic behaviour.

So it's clear (as we might have expected) that the XSLT solution is taking longer to transform the tree. The ratio of the timings for 250 terms is a factor of 12.5. The important questions are, (a) does it matter?, and (b) can it be improved? It's worth mentioning that we are comparing a Java implementation that has matured over a number of years with an XSLT implementation that was written quickly for the purpose of this measurement.

Whether or not the performance penalty matters depends on the extent to which compilation speed contributes to overall system performance. Clearly there are many scenarios where it makes very little difference how long it takes: such scenarios range from single-shot transformations of large documents, where the compilation time is dwarfed by execution time, through to high-throughput content delivery servers, where the cost of compiling stylesheets is amortized over a large number of transformations. However, there are other scenarios where the time penalty would certainly be noticed, not least by developers who want fast response times when compiling code after making changes.

Can it be improved? There's a wide variety of things to look at. For the development scenario, perhaps the most important opportunities are to optimize less often. Currently Saxon recompiles the whole stylesheet on every run: there's a lot of scope for only recompiling parts that have changed, or are affected by changes, or for not optimizing code that isn't executed, or for not optimizing it unless the user is compiling for production running.

But there are also opportunities for simply making the code go faster. The poor performance happens because every time an existing node is wrapped in a new letExpression element, it is necessary to make a complete copy of the subtree. It should be possible to find ways of avoiding this: Saxon already has a range of mechanisms such as "virtual copies" of nodes, but they aren't coming into play for this particular transformation. This is where the "eat your own dogfood" argument starts to come in: simply by using the optimizer to optimize the optimizer, there ought to be a steady improvement in which each improvement to the optimizer not only make user code go faster, but also makes the optimizer go faster. It's hard to place faith in that process without concrete evidence that it will work, but although it may be difficult to quantify the benefits, there can be no doubt that they exist.

Yet another approach lies in permitting in-situ updates to the tree. An update feature for XQuery is well-advanced in W3C; there's no reason why in a Saxon context it shouldn't be possible to combine such features (once implemented!) with XSLT processing, and take advantage of them to enable some tree rewrites to take place without bulk copying.

Conclusions

This paper is a feasibility study into the possibility of writing an XSLT optimizer in XSLT. There appear to be no insuperable problems in doing this, and there are many potential benefits. As a language, XSLT is ideally suited to the task, because of its rule-based template mechanism.

As one might expect, the biggest risk identified is in the performance of the optimizer itself. It's not clear from the initial studies whether or not it will be possible to achieve acceptable performance: this will require a prototype implementation that investigates a wider range of rewrites and that applies them to some real queries. It's worth noting that the design pattern for many of the optimizations is likely to involve recursive tree rewrites which are written to do repeated copying of the tree; there may be significant scope for optimizing this design pattern. There are many real user applications that have similar characteristics and would also gain from this, for example applications written using pipeline processing languages such as XProc.

Previous Work

I am not aware of any previous work suggesting the use of XSLT for rewriting expression trees as part of an optimizer, though David Carlisle's xq2xml [CAR2006] uses XSLT for translating XQuery trees expressed in XQueryX [XQX2007] into XSLT, which is not an entirely dissimilar undertaking.

The use of tree rewriting as a technique for optimizing programs, especially declarative queries, is well established: for recent overviews see for example [IOA1996] or [CHA1998]; or for a discussion relevant to XQuery optimization in particular, see [GRI]

There are a number of papers in the optimization literature that use high-level declarative languages to express the rewrite rules. An example is [BEC1992].

Despite the language's greater maturity, there has been less published research on XSLT optimization than on XQuery, perhaps because XSLT with its dynamic template rules is less amenable to static analysis, or perhaps simply because there is a large database research community that considers XQuery to be within its field of study. Most papers that exist discuss specific optimization techniques rather than general principles. Two examples are [KAY2004] and [KEN2005].