Saxonica: XSLT and XQuery

Introduction to XSLT and XQuery

Saxonica specializes in XSLT and XQuery technology. If you're new to these languages, this page briefly explains what they are and why you might want to use them.

XSLT: styling and transformation

XSLT stands for Extensible Stylesheet Language Transformations. An XSLT processor transforms an XML document according to instructions specified in an XSLT stylesheet, so that a single source document, or a set of source documents, can be used to produce output in a number of possible formats, for example:

  • HTML, which can then be rendered by a browser
  • a database load file
  • data formatted for a typesetting system, for eventual output as PostScript or PDF
  • other, different XML documents (for example, a publisher submitting content to the British Library needs to use the Library's tag set)
  • flat text (ASCII) or CSV, for import into another piece of software.
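
As a minimal sketch of the "one source, several outputs" idea, the Java program below applies two small stylesheets to the same source document, producing an HTML list from one and CSV from the other. The book data, the class name, and both stylesheets are invented for illustration. The code uses the standard javax.xml.transform API with whatever XSLT processor the Java runtime supplies by default (typically an XSLT 1.0 processor, not Saxon), so it shows the general mechanism only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OneSourceTwoFormats {

    // Hypothetical source document, invented for this illustration.
    static final String SOURCE =
        "<books>"
      + "<book><title>Dune</title><price>9.99</price></book>"
      + "<book><title>Emma</title><price>7.50</price></book>"
      + "</books>";

    // Stylesheet 1: render the book titles as an HTML bullet list.
    static final String TO_HTML =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/books'>"
      + "<ul><xsl:apply-templates select='book'/></ul>"
      + "</xsl:template>"
      + "<xsl:template match='book'>"
      + "<li><xsl:value-of select='title'/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Stylesheet 2: render the same books as CSV, one "title,price" line each.
    static final String TO_CSV =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/books'>"
      + "<xsl:for-each select='book'>"
      + "<xsl:value-of select='title'/>"
      + "<xsl:text>,</xsl:text>"
      + "<xsl:value-of select='price'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet, applies it to the source, and returns the result as a string.
    static String transform(String xml, String xslt) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SOURCE, TO_HTML));
        System.out.print(transform(SOURCE, TO_CSV));
    }
}
```

The source document is never modified: only the stylesheet changes between the two runs. With the sample data above, the CSV stylesheet should yield one line per book (Dune,9.99 and Emma,7.50).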

The language was designed primarily for writing stylesheets that enable XML documents to be displayed to their readers. By writing more than one stylesheet, you can display the same information in different ways to different audiences, and tailor the presentation to the capabilities of different display devices, including web browsers, conventional print media, handheld devices, and digital TV.

By authoring your information in XML, and rendering it using XSLT, you separate the tasks of the content author and the graphic designer, and you ensure that a uniform design is achieved across all your content. This approach also allows you to change the visual design without rewriting the content.

XSLT is also increasingly used for transforming data. For example, if you need to send electronic orders to your suppliers, the XML message used for this purpose will have to conform to a schema agreed with the suppliers. If the data starts life in an Excel spreadsheet, you can export it as XML using the built-in capabilities of Excel, and then use an XSLT stylesheet to convert it to the format required by the agreed schema.
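
A data transformation of this kind might look roughly like the sketch below, which converts generic exported rows into an order message. Everything specific here is an assumption made for illustration: the rows/row/code/qty input is not Excel's actual export format, the order/item vocabulary is not any real supplier schema, and the class name is hypothetical. As before, the code runs on the XSLT processor that the Java runtime supplies by default rather than on Saxon.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OrderTransform {

    // Hypothetical export: one <row> per spreadsheet line (not a real Excel format).
    static final String EXPORTED_ROWS =
        "<rows>"
      + "<row><code>A-100</code><qty>3</qty></row>"
      + "<row><code>B-200</code><qty>1</qty></row>"
      + "</rows>";

    // Maps each <row> to an <item> element in a hypothetical agreed order vocabulary.
    // The curly braces are attribute value templates: the XPath expression inside
    // them is evaluated and its string value becomes the attribute value.
    static final String ROWS_TO_ORDER =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes' indent='no'/>"
      + "<xsl:template match='/rows'>"
      + "<order><xsl:apply-templates select='row'/></order>"
      + "</xsl:template>"
      + "<xsl:template match='row'>"
      + "<item sku='{code}' quantity='{qty}'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the exported rows and returns the order message.
    static String toOrder(String rowsXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(ROWS_TO_ORDER)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(rowsXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(toOrder(EXPORTED_ROWS));
    }
}
```

The stylesheet holds the whole mapping between the two vocabularies, so if the agreed schema changes, only the stylesheet needs to be updated, not the spreadsheet or the Java code.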

XSLT 1.0 was defined by the World Wide Web Consortium (W3C) in 1999. Saxon was one of the first implementations. Many others followed, including products from major vendors such as Microsoft, Oracle, and IBM. Despite this competition, the Saxon XSLT 1.0 processor was downloaded nearly 200,000 times, as well as being bundled with many other software products. It featured regularly in the list of the top 100 open-source software products on the SourceForge site.

XSLT 2.0 followed in 2007, and Saxon led the way in implementing the standard, which offers immense improvements in functionality and productivity over the original version. Many of the XSLT 1.0 implementations (for example, those from Microsoft) dropped out of the race, confirming Saxon's leadership position. In 2012 Saxonica produced Saxon-CE, which was not only the first-ever implementation of XSLT 2.0 able to run in the browser, but also the first to handle user interaction as well as pure batch transformation.

XSLT 2.0 and schema-aware transformation

The specifications for XSLT 2.0 reached the status of a W3C Recommendation in January 2007, after a lengthy development process. Throughout this process, regular new Saxon releases were produced implementing the new features of successive drafts of the specification. This gave the user community the opportunity to try out new features before they were frozen, and to give feedback to the working groups developing the language. As editor of the specification and developer of Saxon, Michael Kay was especially well-placed to ensure that the users' voice was heard.

XSLT 2.0 added many new features desperately needed by existing users. These include more powerful facilities for handling text (regular expressions) and structured data (grouping), which greatly increase the range of tasks to which XSLT can be applied, especially when converting legacy data and document formats to XML.

One of the most significant additions was that XSLT became schema-aware. This means that the XML Schema used to define the source and result documents of a transformation can now be used to guide the compilation and execution of the stylesheet. This makes stylesheet code more robust, shortens the debugging cycle, and creates the potential for significant performance improvements.

Saxon was the first processor to implement the XSLT 2.0 specification, and remained the only one for some time. Today it has been joined by commercial implementations from vendors such as IBM, Intel, Altova, and MarkLogic, but it remains the most mature of the available products, valued by users for its performance, its usability, its obsession with standards conformance, and its useful range of vendor extensions. Saxon is now available in three editions: the open-source Saxon-HE and the entry-level commercial Saxon-PE, both of which implement the conformance requirements for a Basic XSLT Processor, and the enterprise product Saxon-EE, which adds the extra features of a Schema-Aware XSLT Processor.

Since XSLT 2.0 was finalized in 2007, the W3C Working Group has turned its attention to the development of XSLT 3.0, and once again Saxon is leading the way in implementing new features as they emerge. The focus in XSLT 3.0 is on streaming transformations (transforming large documents without first reading them into memory in their entirety), and many of the new features that enable this have been productized in the 9.x series of Saxon-EE releases.

XQuery: the query language for XML

XQuery is another language specification from W3C, produced in parallel with XSLT 2.0. XQuery is a query language for XML documents. Whereas XSLT was designed primarily for document rendition, and does data transformation almost as a sideline, XQuery was conceived as a language for querying XML databases, in the same way as SQL is used for querying relational databases.

It can be used to extract information from one or more XML documents, for example:

  • Show all the books published by National Geographic
  • What is the average price of books with "Harry Potter" in the title?
  • Sort the books by author within each category.

All the major relational database vendors are adding XML support to their existing products, and new Native XML Databases are also appearing. These products all use XQuery to access the data.

Like XSLT, XQuery can also be used to transform XML messages from one format to another. The language is less powerful than XSLT 2.0, but usability studies have shown that it is easier for users to learn, and there are also indications that it is easier for vendors to optimize.

XQuery 3.0 is now a W3C Recommendation, and Saxon is achieving a 100% pass rate in the evolving test suite (containing more than 25,000 separate tests).

XPath

XQuery and XSLT have much in common. Both languages make use of XPath, a syntax for finding your way around the structure of an XML document. XPath can also be used directly from programming languages such as Java and C#, and within other W3C languages such as XML Schema and XForms. XQuery and XSLT also share the same data model, type system, and function library, which means that the two languages can work together well in a single application. For example, you can use XQuery to extract data from an XML database, and XSLT to present the results to users on the web.
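
As a rough sketch of using XPath directly from Java, the program below answers two of the sample questions from the XQuery section (books from a given publisher, and the average price of books whose title contains a phrase). It relies on the standard javax.xml.xpath API, whose default implementation in the Java runtime supports XPath 1.0 only, so it does not use the richer function library of later XPath versions. The catalog, its element names, the class name, and the method names are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BookQueries {

    // Hypothetical catalog, invented for this illustration.
    static final String CATALOG =
        "<catalog>"
      + "<book><title>Harry Potter, Volume One</title>"
      + "<publisher>Example Press</publisher><price>10.00</price></book>"
      + "<book><title>Harry Potter, Volume Two</title>"
      + "<publisher>Example Press</publisher><price>14.00</price></book>"
      + "<book><title>Atlas of the World</title>"
      + "<publisher>National Geographic</publisher><price>40.00</price></book>"
      + "</catalog>";

    // Parses an XML string into an in-memory DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the titles of all books from the given publisher.
    static List<String> titlesPublishedBy(Document doc, String publisher) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Supply the publisher as an XPath variable instead of pasting it into the expression.
        xpath.setXPathVariableResolver(name -> publisher);
        NodeList titles = (NodeList) xpath.evaluate(
            "/catalog/book[publisher = $publisher]/title", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); i++) {
            result.add(titles.item(i).getTextContent());
        }
        return result;
    }

    // Returns the average price of books whose title contains the given phrase.
    // XPath 1.0 has no avg() function, so the average is computed as sum div count.
    static double averagePriceWhereTitleContains(Document doc, String phrase) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> phrase);
        String books = "/catalog/book[contains(title, $phrase)]";
        return (Double) xpath.evaluate(
            "sum(" + books + "/price) div count(" + books + ")", doc, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(CATALOG);
        System.out.println(titlesPublishedBy(doc, "National Geographic"));
        System.out.println(averagePriceWhereTitleContains(doc, "Harry Potter"));
    }
}
```

The third sample question, sorting books by author within each category, cannot be expressed in XPath 1.0 alone, because XPath 1.0 selects nodes but has no way to reorder them. That kind of task is where a host language such as XQuery or XSLT takes over.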
