Streaming Templates

Streaming templates allow a document to be processed hierarchically in the classical XSLT style, applying template rules to each element (or other node) in a top-down manner, while scanning the source document in a pure streaming fashion, without building the source tree in memory. Saxon-EE allows streamed processing of a document using template rules, provided the templates conform to a set of strict guidelines.

Streaming in this way is a property of a mode; a mode can be declared to be streamable, and if it is so declared, then all template rules using that mode must obey the rules for streamability. A mode is declared to be streamable using the top-level stylesheet declaration:

<xsl:mode name="s" streamable="yes"/>

The name attribute is optional; if omitted, the declaration applies to the default (unnamed) mode.
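As a minimal sketch, the stylesheet below declares the default (unnamed) mode as streamable and supplies two template rules that stay within the streamability rules. The element name para and the output element names are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- No name attribute, so this makes the default (unnamed) mode
       streamable; every template rule in that mode must be streamable -->
  <xsl:mode streamable="yes"/>

  <xsl:template match="/">
    <out>
      <!-- The single downward selection: process the children in order -->
      <xsl:apply-templates/>
    </out>
  </xsl:template>

  <!-- Hypothetical element name "para" -->
  <xsl:template match="para">
    <!-- Again one downward selection: the string value of the para -->
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```

Replacing the mode declaration with <xsl:mode name="s" streamable="yes"/> and adding mode="s" to each template rule would apply the same rules to a named mode instead.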

Streamed processing of a source document can be applied either to the principal source document of the transformation, or to a secondary source document read using the xsl:source-document instruction.

To use streaming on the principal source document, the input to the transformation must be supplied in the form of a StreamSource or SAXSource, and the initial mode selected on entry to the transformation must be a streamable mode. In this case there must be no references to the context item in the initializer of any global variable.
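The sketch below illustrates the restriction on global variables when the principal source document is streamed (supplied, for example, as a StreamSource). The variable names and the order element are hypothetical; the commented-out variable shows the kind of initializer that the rule excludes.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The initial mode (here the unnamed mode) must be streamable -->
  <xsl:mode streamable="yes"/>

  <!-- Allowed: this initializer does not refer to the context item -->
  <xsl:variable name="heading" select="'Summary'"/>

  <!-- Not allowed when the principal input is streamed, because the
       initializer refers to the context item (the source document):
       <xsl:variable name="total" select="count(//order)"/> -->

  <xsl:template match="/">
    <summary title="{$heading}">
      <xsl:apply-templates/>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```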

Streamed processing of a secondary document is initiated using the instruction:

<xsl:source-document streamable="yes" href="abc.xml">
  <xsl:apply-templates mode="s"/>
</xsl:source-document>

Saxon will also recognize an instruction of the form:

<xsl:apply-templates select="doc('abc.xml')" mode="s"/>

Here the select attribute must contain a simple call on the doc() or document() function, and the mode (explicit or implicit) must be declared as streamable. The call on doc() or document() can be extended with a streamable selection path, for example select="doc('employee.xml')/*/employee".
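A sketch of a stylesheet that streams a secondary document, assuming a hypothetical file employee.xml whose outermost element contains employee elements, each with an id attribute and a name child. It starts from the XSLT 3.0 named template xsl:initial-template, so no principal source document needs to be supplied; the Saxon-specific doc() form is shown as a commented-out alternative.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- XSLT 3.0 entry point used when no source document is supplied -->
  <xsl:template name="xsl:initial-template">
    <staff>
      <!-- Streams employee.xml; inside, the context item is its document node -->
      <xsl:source-document streamable="yes" href="employee.xml">
        <xsl:apply-templates select="*/employee" mode="s"/>
      </xsl:source-document>

      <!-- Saxon also recognizes this form, which could replace the whole
           xsl:source-document instruction above:
           <xsl:apply-templates select="doc('employee.xml')/*/employee" mode="s"/> -->
    </staff>
  </xsl:template>

  <xsl:template match="employee" mode="s">
    <!-- @id is available at the start tag; "name" is the one downward selection -->
    <person id="{@id}">
      <xsl:value-of select="name"/>
    </person>
  </xsl:template>

</xsl:stylesheet>
```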

A mode that is declared as streamable must be used only for streamed input: it is not possible to apply templates in a streamable mode to ordinary, non-streamed nodes.

Every template rule within a streamable mode must follow strict rules to ensure it can be processed in a streaming manner. The essence of these rules is:

  1. The match pattern for the template rule must be a simple pattern that can be evaluated when positioned at the start tag of an element, without repositioning the stream (but information about the ancestors of the element and their attributes is available, together with some limited information about their position relative to their siblings). Examples of acceptable patterns are *, para, para[1], or para/*.

    If the match pattern includes a boolean predicate, then the predicate must be "motionless", which means that it can be evaluated while the input stream is positioned at the start tag. This means it can reference properties such as name() and base-uri(), and can reference attributes of the element, but cannot reference its children or content.

    If the match pattern includes a numeric predicate, then it must be possible to evaluate this by counting either the total number of preceding-sibling elements, or the number of preceding siblings with a given name. Examples of permitted patterns include *[1], p[3], and *:p[2][@class='bold']; disallowed patterns include (descendant::fig)[1], p[@class='bold'][2], and p[last()].

  2. The body of the template rule must contain at most one expression or instruction that reads the contents below the matched element (that is, children or descendants), and it must process the contents in document order. This expression or instruction will often be one of the following:

    • <xsl:apply-templates/>

    • <xsl:value-of select="."/>

    • <xsl:copy-of select="."/>

    • string(.)

    • data(.) (explicitly or implicitly)

    but this list is not exhaustive. It is possible to process the contents selectively by using a streamable path expression, for example:

    • <xsl:apply-templates select="foo"/>

    • <xsl:value-of select="a/b/c"/>

    • <xsl:copy-of select="x/y"/>

    but any content that the path does not select is then skipped entirely and plays no part in the transformation.

    The template can access attributes of the context item without restriction, as well as properties such as its name(), local-name(), and base-uri(). It can also access the ancestors of the context item, the attributes of the ancestors, and properties such as the name of an ancestor; but having navigated to an ancestor, it cannot then navigate downwards or sideways, since the siblings and the other descendants of the ancestor are not available while streaming.

    The restriction that only one downward access is allowed makes it an error to use an expression such as price - discount in a streamable template, because that expression reads two different children of the context node. This problem can often be circumvented by making a copy of the context item with the copy-of() function: for example <xsl:value-of select="copy-of(.)/(price - discount)"/>. Copying the context node requires memory for the node and all of its descendants, so this technique should be used only when the content of the node is small.

    Certain constructs using positional filters can be evaluated in streaming mode. For example, it is possible to use <xsl:apply-templates select="*[1]"/>. The filter must apply to a node test that uses the child axis and selects element nodes. The accepted forms are expressions that can be written as x[position() op N], where N is an expression that is independent of the focus and is statically known to evaluate to a number, x is a node test using the child axis, and op is one of the operators eq, le, lt, gt, or ge. Alternative forms of this construct, such as x[N], remove(x, 1), head(x), tail(x), and subsequence(x, 1, N), are also accepted.
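The following sketch collects match patterns of the kinds described in the first rule, each of which can be decided when the stream is positioned at the start tag. The element names (section, para, list, item, p) are hypothetical, and the commented-out patterns are the disallowed examples given in the text.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <!-- Motionless predicate: only an attribute of the element is tested -->
  <xsl:template match="section[@type='appendix']" mode="s">
    <appendix><xsl:apply-templates mode="s"/></appendix>
  </xsl:template>

  <!-- Numeric predicate: decided by counting preceding para siblings -->
  <xsl:template match="para[1]" mode="s">
    <lead><xsl:apply-templates mode="s"/></lead>
  </xsl:template>

  <!-- Parent step: information about the ancestors is available -->
  <xsl:template match="list/item" mode="s">
    <li><xsl:apply-templates mode="s"/></li>
  </xsl:template>

  <!-- Position among sibling elements with local name p (any namespace),
       followed by a motionless attribute test -->
  <xsl:template match="*:p[2][@class='bold']" mode="s">
    <b><xsl:apply-templates mode="s"/></b>
  </xsl:template>

  <!-- Not accepted as streamable match patterns:
       match="para[last()]"
       match="p[@class='bold'][2]"
       match="(descendant::fig)[1]" -->

</xsl:stylesheet>
```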
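A sketch of a template body that keeps to a single downward selection while reading attributes of the context item and of an ancestor, assuming hypothetical dept elements (with a name attribute) containing employee elements (with an id attribute and salary and bonus children). The commented-out instructions show selections that the rules exclude.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="dept/employee" mode="s">
    <!-- Attributes of the context item and of its ancestors are available;
         ../@name reads an attribute of the parent dept element -->
    <row dept="{../@name}" id="{@id}">
      <!-- The one downward selection: only the salary children are read,
           and the other children of employee are skipped -->
      <xsl:value-of select="salary"/>
    </row>

    <!-- Not streamable:
         <xsl:value-of select="salary + bonus"/>
           reads two different children of the context item
         <xsl:value-of select="../employee[1]/@id"/>
           navigates downwards again after moving up to an ancestor -->
  </xsl:template>

</xsl:stylesheet>
```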
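A sketch of the copy-of() workaround, assuming hypothetical item elements that are small enough to copy and that have a sku attribute and price and discount children. Binding the copy to a variable is just another way of writing the select="copy-of(.)/(price - discount)" expression given in the text.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode name="s" streamable="yes"/>

  <xsl:template match="item" mode="s">
    <!-- Not streamable, since price - discount reads two different children:
         <xsl:value-of select="price - discount"/> -->

    <!-- copy-of(.) is the single downward selection; it builds an in-memory
         copy of this item only, which can then be navigated freely -->
    <xsl:variable name="copy" select="copy-of(.)"/>
    <net sku="{@sku}">
      <xsl:value-of select="$copy/price - $copy/discount"/>
    </net>
  </xsl:template>

</xsl:stylesheet>
```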
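A sketch of positional selections that fit the accepted forms, assuming hypothetical chapter elements with para children. The global parameter n plays the role of N and is declared with as="xs:integer" so that it is statically known to be a number; a parameter without an as attribute could be supplied with a value of any type.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:mode name="s" streamable="yes"/>

  <!-- N: independent of the focus and statically typed as a number -->
  <xsl:param name="n" as="xs:integer" select="3"/>

  <xsl:template match="chapter" mode="s">
    <summary>
      <!-- Form x[position() op N]: only the first $n para children -->
      <xsl:apply-templates select="para[position() le $n]" mode="s"/>

      <!-- Other accepted forms (only one downward selection is allowed
           per template, so choose one):
             para[1]                    the first para child
             subsequence(para, 1, $n)   same as para[position() le $n]
             head(para)                 the first para child
             tail(para)                 all para children except the first
             remove(para, 1)            same as tail(para) -->
    </summary>
  </xsl:template>

  <xsl:template match="para" mode="s">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

</xsl:stylesheet>
```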