Burst-mode streaming

Burst-mode streaming takes a streamed document as input, and generates a sequence of small subtrees containing the parts of the document that need to be processed. This can be achieved using XSLT 3.0 syntax like this:

<xsl:stream href='employees.xml'>
  <xsl:apply-templates select="copy-of(*/employee)"/>
</xsl:stream>

The code that processes an individual employee element does not need to be streamable; it can use any XSLT constructs. The only constraint is that it cannot navigate outside the employee element: because the employee element is a copy of a subtree from the original document, it has no parent or siblings.
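As an illustration of this point, the following sketch pairs the instruction shown above with an ordinary template rule. It assumes a processor that accepts xsl:stream exactly as used above; the wrapper element staff and the attributes fields and has-parent are hypothetical names chosen for the example.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Entry point: stream the file and pass each employee copy
       to the template rule below. -->
  <xsl:template name="main">
    <staff>
      <xsl:stream href="employees.xml">
        <xsl:apply-templates select="copy-of(*/employee)"/>
      </xsl:stream>
    </staff>
  </xsl:template>

  <!-- Ordinary, non-streamable code: it visits the children of the
       employee more than once and sorts them. This is acceptable
       because each employee is a small in-memory tree. -->
  <xsl:template match="employee">
    <!-- exists(..) is expected to be false, since the copy is parentless. -->
    <employee fields="{count(*)}" has-parent="{exists(..)}">
      <xsl:for-each select="*">
        <xsl:sort select="name()"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </employee>
  </xsl:template>

</xsl:stylesheet>
```

Any attempt inside the template rule to reach other employees, for example through the parent or sibling axes, selects nothing.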

The same effect can be achieved in XQuery using the saxon:stream extension function, which enables burst-mode streaming by reading a source document and delivering a sequence of element nodes representing selected elements within that document. For example:

for $e in saxon:stream(doc('employees.xml')/*/employee)
return <sal>{$e/salary}</sal>

This example returns a sequence of sal elements. The result of the saxon:stream call is a sequence of parentless employee elements. This means it is not possible to navigate from one employee element to others in the file; in fact, only one of them actually exists in memory at any one time.

The function saxon:stream may be regarded as a pseudo-function. Conceptually, it takes the set of nodes supplied in its argument, and makes a deep copy of each one (the copy operation is needed to make the employee elements parentless). The resulting sequence of nodes will usually be processed by an expression such as an XQuery FLWOR expression, which handles the nodes one at a time. The actual implementation of saxon:stream, however, is rather different, in that it changes the way in which its argument is evaluated: instead of the doc() function building a tree in the normal way, the path expression doc('employees.xml')/*/employee is evaluated in streamed mode, which means that it must conform to the subset of XPath syntax that Saxon can evaluate in this mode. For details of this subset, see Streamable path expressions.
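The conceptual behaviour, as opposed to the streamed implementation, can be sketched in plain XSLT 2.0. This is only an illustration of the semantics described above: the function name f:copy-each and its namespace are hypothetical, and nothing here is streamed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="http://example.com/local-functions"
    exclude-result-prefixes="f">

  <!-- Hypothetical stand-in for the conceptual behaviour only:
       deep-copies each supplied node, so every result is parentless. -->
  <xsl:function name="f:copy-each" as="node()*">
    <xsl:param name="nodes" as="node()*"/>
    <xsl:copy-of select="$nodes"/>
  </xsl:function>

  <xsl:template name="main">
    <salaries>
      <!-- doc() builds the whole tree here, unlike the streamed case. -->
      <xsl:for-each select="f:copy-each(doc('employees.xml')/*/employee)">
        <sal><xsl:copy-of select="salary"/></sal>
      </xsl:for-each>
    </salaries>
  </xsl:template>

</xsl:stylesheet>
```

The results should match those of the streamed form, but the memory needed grows with the size of the source document.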

The facility should not be used if the source document is read more than once in the course of the query/transformation. There are two reasons for this: firstly, if it is read more than once then performance will be better if the document is read into memory; and secondly, when this optimization is used, there is no guarantee that the doc() function will be stable, that is, that it will return the same results when called repeatedly with the same URI.

If the path expression cannot be evaluated in streaming mode, execution does not fail; rather it is evaluated with an unoptimized copy-of instruction. This will give the same results provided enough memory is available for this mode of evaluation. To check whether streamed processing is actually being used, set the -t option from the command line or the FeatureKeys.TIMING option from the configuration API; the output will indicate whether a particular source document has been processed by building a tree, or by streaming.

In XSLT another way of invoking the facility (retained from earlier Saxon releases) is by using an <xsl:copy-of> instruction with the special attribute saxon:read-once="yes". Typically the xsl:copy-of instruction will form the body of a stylesheet function, which can then be called in the same way as saxon:stream to deliver the stream of records. This approach has the advantage that the code is portable to other XSLT processors: saxon:read-once="yes" is an extension attribute, a processing hint that other XSLT processors are required to ignore.
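A minimal sketch of such a stylesheet function follows. The function name f:employees, its namespace, and the salaries wrapper are hypothetical; only the xsl:copy-of instruction with saxon:read-once="yes" comes from the description above. Whether a given Saxon release actually streams the document through the function call is an assumption that can be checked with the -t option described earlier.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    xmlns:f="http://example.com/local-functions"
    exclude-result-prefixes="saxon f">

  <!-- Hypothetical helper: delivers the employee records as a
       sequence of parentless copies. Processors other than Saxon
       ignore saxon:read-once and build the tree in the normal way. -->
  <xsl:function name="f:employees" as="element(employee)*">
    <xsl:copy-of select="doc('employees.xml')/*/employee"
                 saxon:read-once="yes"/>
  </xsl:function>

  <!-- The caller handles the records one at a time. -->
  <xsl:template name="main">
    <salaries>
      <xsl:for-each select="f:employees()">
        <sal><xsl:copy-of select="salary"/></sal>
      </xsl:for-each>
    </salaries>
  </xsl:template>

</xsl:stylesheet>
```

Keeping the document URI as a literal inside the function mirrors the path expressions used elsewhere in this section.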

In XQuery the same effect can be achieved using a pragma (# saxon:read-once #). Again, processors other than Saxon are required to ignore this pragma.

Example: selective copying

A very simple way of using this technique is when making a selective copy of parts of a document. For example, the following code creates an output document containing all the footnote elements from the source document that have the attribute @type='endnote':

XSLT example

<xsl:template name="main">
  <footnotes>
    <xsl:stream href="thesis.xml">
      <xsl:copy-of select=".//footnote[@type='endnote']"/>
    </xsl:stream>
  </footnotes>
</xsl:template>

XQuery example

<footnotes>{ saxon:stream(doc('thesis.xml')//footnote[@type='endnote']) }</footnotes>

XSLT example using xsl:copy-of

To allow code to be written in a way that will still work with processors other than Saxon, the facility can also be invoked using extension attributes in XSLT. Using this syntax, the previous example can be written as:

XSLT example

<xsl:template name="main">
  <footnotes>
    <xsl:copy-of select="doc('thesis.xml')//footnote[@type='endnote']"
                 saxon:read-once="yes"
                 xmlns:saxon="http://saxon.sf.net/"/>
  </footnotes>
</xsl:template>

XQuery example using the saxon:stream pragma

In XQuery the pragma saxon:stream is available as an alternative to the function of the same name, allowing the code to be kept portable. The above example can be written:

<footnotes>{ (# saxon:stream #) { doc('thesis.xml')//footnote[@type='endnote'] } }</footnotes>

Note the restrictions below on the kind of predicate that may be used.