Compiling a Stylesheet

Generally, the cost of analyzing the XSLT source code in a stylesheet and preparing it for execution can be high in relation to the cost of actually running the code to transform an individual source document, especially where the stylesheet is large and the source document is small. Saxon provides several capabilities designed to ensure that when you use the same stylesheet repeatedly, you only need to incur this overhead once.

Caching compiled stylesheets in memory

The JAXP interface represents a compiled stylesheet as a Templates object. The object contains the entire stylesheet; all modules must be compiled as a single unit. JAXP was designed before packages were added to the XSLT 3.0 language. The Templates object is thread-safe, so once created it can be used by many transformations running separately in parallel. To use the Templates object to run a transformation of a particular source document, a Transformer object is created. The Transformer is not thread-safe; its transform() method must not be called while a transformation is active. The Transformer can be serially reused, but with Saxon there is no benefit in doing so; better garbage collection generally occurs if a new Transformer is created for each transformation.
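The pattern described above (compile once, share the Templates object, create a fresh Transformer for each transformation) can be sketched with plain JAXP. This is a minimal illustration and not code taken from Saxon: the class name TemplatesCache, its methods, and the string-keyed cache are assumptions made for the example. It uses whichever TransformerFactory JAXP locates, which is the JDK's built-in processor when run as-is; how Saxon is selected as the factory depends on your deployment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: caches compiled stylesheets (Templates) by a caller-chosen key. */
final class TemplatesCache {
    // Small demo stylesheet (an assumption for this example): outputs the value of /greeting/@to.
    static final String DEMO_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='/greeting/@to'/></xsl:template>"
        + "</xsl:stylesheet>";

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> cache = new ConcurrentHashMap<>();

    /** Returns the cached Templates for this key, compiling the stylesheet on first use. */
    Templates get(String key, String stylesheetText) {
        return cache.computeIfAbsent(key, k -> compile(k, stylesheetText));
    }

    private Templates compile(String key, String stylesheetText) {
        // JAXP does not promise that TransformerFactory is thread-safe, so compilation is serialized.
        synchronized (factory) {
            try {
                return factory.newTemplates(new StreamSource(new StringReader(stylesheetText)));
            } catch (TransformerConfigurationException e) {
                throw new IllegalArgumentException("Cannot compile stylesheet '" + key + "'", e);
            }
        }
    }

    /** Runs one transformation, using a new Transformer each time. */
    String transform(String key, String stylesheetText, String sourceXml) {
        try {
            // The Templates object is shared; the Transformer is private to this call.
            Transformer transformer = get(key, stylesheetText).newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(sourceXml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed for '" + key + "'", e);
        }
    }

    public static void main(String[] args) {
        TemplatesCache cache = new TemplatesCache();
        // Both calls reuse a single compiled stylesheet.
        System.out.println(cache.transform("hello", DEMO_XSL, "<greeting to='world'/>"));
        System.out.println(cache.transform("hello", DEMO_XSL, "<greeting to='Saxon'/>"));
    }
}
```

Because each call obtains its own Transformer, transform() can be invoked from several threads at once while the underlying stylesheet is compiled only once per key.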

The s9api interface (Java) and Saxon.Api (C#) have a similar design: a compiled stylesheet is represented by an XsltExecutable object, and the instantiation of a stylesheet performing a single transformation by an XsltTransformer or Xslt30Transformer object. These APIs also add a third class to the design, namely the XsltCompiler, which holds compile-time options such as the base URI of the stylesheet, the values of static parameters, whether to generate bytecode, how to resolve references to modules (xsl:include/xsl:import), what schema definitions to use, and where to report compile-time errors. The XsltCompiler is also thread-safe, though the options in force should not be changed while the compiler is in use, and you may need to think carefully about how to capture compilation errors if several compilations are active at the same time. Different XsltCompiler instances with different option settings can run concurrently with each other.

XSLT 3.0 packages have been supported since Saxon 9.6. A package may consist of a single module, or of a number of modules connected using xsl:include/xsl:import; a package is compiled as a unit, and may have references to other packages (via xsl:use-package) that are compiled independently. To allow independent compilation, there is much stronger control over the interfaces that a package exposes to the outside world, and over the ability of declarations in one package to override another. For example, if a function is declared to return an integer, then when compiling a call to that function, the compiler can be confident that any overriding declaration of the function will still return an integer result.

In the s9api interface (Java), a package is represented by an XsltPackage object. The XsltCompiler has a method compilePackage which returns an XsltPackage if successful. The package may be made available for use by other packages being compiled, in the same or in a different XsltCompiler, by the XsltCompiler's importPackage method. When an xsl:use-package declaration is found while compiling one package, the compiler searches for a matching package among those that have been imported by the XsltCompiler in this way. It is possible to import several different versions of the same package, and the package-version attribute of xsl:use-package determines which of them is loaded.

In the Saxon.Api interface (C#), a package is represented by an XsltPackage object. The XsltCompiler has a method CompilePackage which returns an XsltPackage if successful. The package may be made available for use by other packages being compiled, in the same or in a different XsltCompiler, by the XsltCompiler's ImportPackage method. When an xsl:use-package declaration is found while compiling one package, the compiler searches for a matching package among those that have been imported by the XsltCompiler in this way. It is possible to import several different versions of the same package, and the package-version attribute of xsl:use-package determines which of them is loaded.

The XsltPackage object, once created, is immutable and thread-safe. It is tied to a Saxon Processor, but it can be imported by multiple XsltCompiler instances. If a common library package is used by many different stylesheets, it makes sense to define it as a reusable package, since this avoids the cost of compiling the code repeatedly, and avoids the need to keep multiple copies in memory.

JIT Compilation of Template Rules

Sometimes a stylesheet may contain hundreds of template rules to define the processing of elements that are rarely used in source documents; many source documents might use a tiny fraction of the defined vocabulary. In this situation, it is wasteful to compile all these template rules every time the stylesheet is used. This isn't a problem when the stylesheet is compiled once, cached, and used to run a large number of transformations; but it is a problem in a batch workflow where the stylesheet is compiled every time it is used.

To improve the efficiency of this kind of workload, Saxon-EE by default uses just-in-time compilation of template rules. On first reading the stylesheet, all the match patterns are processed and a suitable decision table is constructed; but the body of a template rule is not compiled into executable form until the first time that template rule is matched.

A consequence of this is that static errors (for example, invalid path expressions) in such templates may go undetected if the code is not actually executed.
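The idea of deferring compilation until a rule is first matched, and its effect on error detection, can be illustrated with a toy model. This is only a conceptual sketch and does not reflect how Saxon is implemented: the class LazyRuleTable, the element-name lookup that stands in for the decision table, and the two "body" keywords are all invented for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Toy model of just-in-time rule compilation (hypothetical; not Saxon code). */
final class LazyRuleTable {
    private static final class Rule {
        final String bodySource;            // uncompiled body, kept as written
        UnaryOperator<String> compiledBody; // stays null until the rule is first matched
        Rule(String bodySource) { this.bodySource = bodySource; }
    }

    // Stand-in for the decision table: built eagerly when rules are registered.
    private final Map<String, Rule> rulesByElementName = new HashMap<>();
    private int compilations = 0;

    /** Registers a rule; only the match key is processed here, not the body. */
    void addRule(String elementName, String bodySource) {
        rulesByElementName.put(elementName, new Rule(bodySource));
    }

    /** Applies the matching rule, compiling its body on first use. Not thread-safe. */
    String apply(String elementName, String text) {
        Rule rule = rulesByElementName.get(elementName);
        if (rule == null) {
            return text; // no rule for this element: pass the text through unchanged
        }
        if (rule.compiledBody == null) {
            rule.compiledBody = compileBody(rule.bodySource); // errors surface only here
            compilations++;
        }
        return rule.compiledBody.apply(text);
    }

    int compilations() {
        return compilations;
    }

    /** Toy compiler: understands two keywords and rejects everything else. */
    private static UnaryOperator<String> compileBody(String bodySource) {
        switch (bodySource) {
            case "upper-case":
                return s -> s.toUpperCase(Locale.ROOT);
            case "reverse":
                return s -> new StringBuilder(s).reverse().toString();
            default:
                throw new IllegalArgumentException("Static error in rule body: " + bodySource);
        }
    }

    public static void main(String[] args) {
        LazyRuleTable table = new LazyRuleTable();
        table.addRule("title", "upper-case");
        table.addRule("rare", "uper-case"); // misspelt body: accepted silently at this point
        System.out.println(table.apply("title", "saxon")); // prints SAXON
        System.out.println(table.compilations());          // prints 1
    }
}
```

In this model the misspelt body of the "rare" rule is reported only if some input actually matches that rule, which mirrors the consequence described above.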

JIT compilation is enabled by default. It can be suppressed from the command line by setting -jit:off. Setting the export, explain, or nogo options also has the side-effect of suppressing JIT compilation. There is also an option available on the XsltCompiler object.

It probably makes sense to suppress JIT compilation in any workload where the compiled stylesheet is cached and used repeatedly.

Exporting Packages

A package, once compiled into an XsltPackage object, can be saved as a stylesheet export file (SEF) using the save() (Java) or Save() (C#) method of the XsltPackage. The generated file is intended to be used for one purpose only, namely for reconstituting the XsltPackage at a different time and place. The format is XML, but its interpretation is not published and should not be considered stable. The file contains a checksum and cannot be loaded in the event of a checksum failure, so modifications to the content are not permitted. The content of the file is sufficiently far removed from the original source that distributing code in this form achieves a useful level of IP protection, though like Java bytecode, it is not intended to resist determined attempts at reverse engineering. Indeed, in the interests of run-time diagnostics, it preserves information such as variable names and line numbers that are not strictly needed at execution time.

The simplest way to generate an export file is from the command line, for example with Saxon-EE 11.0:

java -jar dir/saxon-ee-11.0.jar -xsl:stylesheet.xsl -export:stylesheet.sef -nogo
dotnet SaxonCS transform -xsl:stylesheet.xsl -export:stylesheet.sef -nogo

Here, the option -nogo suppresses any attempt to execute the stylesheet.

Additionally, the -relocate:on option can be used to produce an export package which can be deployed to a different location, with a different base URI.

The -target option can be used to specify the edition of Saxon which will be used to run the stylesheet export file. The accepted values are EE|PE|HE|JS, and the default is EE. For instance, specify -target:HE to produce an export file which can be executed by Saxon-HE (this will suppress the generation of optimized constructs that SaxonJ-HE cannot execute). The option -target:JS is used when generating stylesheets to be executed by Saxon-JS (in the browser, or on Node.js); in this case the SEF file is in JSON format rather than XML.

A stylesheet export file for a complete stylesheet (as distinct from a library package) is accepted by any Saxon interface that accepts a source stylesheet. For example, from the command line:

java -jar dir/saxon-ee-10.0.jar -xsl:stylesheet.sef -s:source.xml
dotnet SaxonCS transform -xsl:stylesheet.sef -s:source.xml

When exporting a package, all components (templates, functions, etc) from the packages it uses are also exported. It is possible therefore either to export an individual library package (typically having no dependencies on other packages), or a complete stylesheet (a package together with its tree of dependencies).

Packages that are used by a stylesheet can be identified in a number of ways. They can be listed on the command line in the -pack option, or imported into the XsltCompiler API using the methods loadLibraryPackage and loadExecutablePackage (Java), or LoadLibraryPackage and LoadExecutablePackage (C#). Another option, probably the best option if many library packages are used, is to list them in a configuration file.

In the case of schema-aware stylesheets, the schema components needed by a stylesheet are not exported along with the stylesheet code. The user of the stylesheet needs to import the required schemas before the stylesheets can be loaded. The schema loaded at execution time must match the schema used when the stylesheet was compiled. Saxon is not draconian about checking this, and many minor changes will cause no trouble (for example, changing the regular expression used in a pattern facet). Structural changes that invalidate the assumptions made during XSLT compilation, however, are likely to cause execution to fail, not necessarily in predictable ways.

The computer on which the stylesheet is executed needs to have a Saxon license of sufficient capability to meet the requirements of the stylesheet. There are two ways this can be achieved. Either the run-time system can have a conventional Saxon license installed in the normal way, or it can take advantage of a license embedded within the exported stylesheet itself. Saxonica offers developers the option of purchasing a "developer master key" which, if installed, will cause all exported stylesheets to contain an embedded license key sufficient to execute the stylesheet in question. An embedded license key applies only to that stylesheet and cannot be used for any other code developed elsewhere; stylesheets that are exported with an embedded license can only be executed "as is", and cannot be incorporated as libraries into larger applications.

Exporting stylesheet packages requires Saxon-EE, optionally with the Developer Master Key if stylesheets with embedded license information are to be exported. From Saxon 9.9, importing stylesheet packages is possible using any Saxon edition, provided that the run-time software and the run-time license key (where needed) support the features used by the stylesheet in question.

There are a small number of cases where a valid stylesheet cannot be exported; but they are rarely encountered in practice. Many of the restrictions relate to static global variables (or parameters). A stylesheet cannot be exported if it contains static global variables that are bound to:

Where a stylesheet being exported contains static variables bound to nodes, the nodes will be reconstructed on import. The reconstructed nodes will be parentless, and will lose their inter-node relationships. For example if the value of such a variable contains a sequence of ten elements that are siblings of each other, the reconstructed element nodes will not be siblings of each other (they will be parentless).

Bytecode generation

When a stylesheet package is compiled into its in-memory representation, SaxonJ-EE by default generates Java bytecode for faster execution of selected parts of the code. The generated bytecode is mixed with interpreted code, each calling the other where appropriate.

From Saxon 9.8, bytecode generation is by default applied only to hotspots, that is, parts of the executable code that are found to be frequently executed. These will often be predicates in filter expressions. The threshold for generating bytecode is configurable. Bytecode generation can be monitored using the -TB option on the command line.

The performance boost achieved by bytecode generation is variable; 25% is typical. The functions and templates that benefit the most are those where the expression tree contains many constructs that are relatively cheap in themselves, such as type conversion, comparisons, and arithmetic. This is because the saving from bytecode generation is mainly not in the cost of performing primitive operations, but in the cost of deciding which operations to perform: so the saving is greater where the number of operations is high relative to their average cost.

There are configuration options to suppress bytecode generation (GENERATE_BYTE_CODE), to insert debugging logic into the generated bytecode (DEBUG_BYTE_CODE), and to display the generated bytecode (DISPLAY_BYTE_CODE).

Currently, exported packages do not include bytecode.