XQuery 1.0 implementation

The prolog option declare copy-namespaces no-preserve is now implemented.

The encoding specified in the query prolog is now recognized. Saxon first attempts to determine the encoding from any HTTP header (relevant only if the query is referenced via an HTTP URL). It then looks for a byte order mark: if present, this identifies the file as UTF-16 (big-endian or little-endian) or UTF-8. Failing this, it tries to read the encoding declaration, which works only if the actual encoding is a superset of ASCII. If no encoding declaration is present, UTF-8 is assumed. Note that Saxon previously defaulted to the Java default encoding for the platform, typically ISO-8859-1 or Windows CP1252 (depending on the platform and the country). Queries that contain non-ASCII characters may therefore need an encoding declaration to be added.
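
The order of the checks described above can be sketched in Java as follows. This is an illustration only, not Saxon's actual code: the class name QueryEncodingSniffer, the method detect, the 256-byte look-ahead and the returned encoding names are all assumptions made for the example, and the sketch does not handle comments placed before the version declaration.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: mirrors the order HTTP header, byte order mark,
// encoding declaration, UTF-8 default. Not part of the Saxon API.
final class QueryEncodingSniffer {

    // Matches a version declaration such as: xquery version "1.0" encoding "iso-8859-1";
    // Group 2 or group 3 holds the encoding name, depending on the quote style.
    private static final Pattern DECL = Pattern.compile(
        "^\\s*xquery\\s+version\\s+(\"[^\"]*\"|'[^']*')\\s+encoding\\s+(?:\"([^\"]*)\"|'([^']*)')");

    static String detect(String httpCharset, byte[] bytes) {
        // 1. A charset taken from an HTTP header wins, if one was supplied.
        if (httpCharset != null && !httpCharset.isEmpty()) {
            return httpCharset;
        }
        // 2. A byte order mark identifies UTF-8 or one of the UTF-16 variants.
        if (startsWith(bytes, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(bytes, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(bytes, 0xFF, 0xFE)) return "UTF-16LE";
        // 3. Read the start of the file as single-byte characters. This only
        //    finds the declaration if the real encoding is a superset of ASCII.
        int len = Math.min(bytes.length, 256); // assumed look-ahead limit
        String head = new String(bytes, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = DECL.matcher(head);
        if (m.find()) {
            return m.group(2) != null ? m.group(2) : m.group(3);
        }
        // 4. No declaration found: assume UTF-8.
        return "UTF-8";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }
}
```

A caller would pass null as the first argument when the query is read from a local file, so that only the byte order mark and the declaration are consulted.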

Line endings in a query are now normalized to x0A, using the XML 1.1 rules, wherever they occur. Previously line endings were normalized only in node constructor content. The specification in this area has changed. Also, Saxon did not implement the previous specification as written: for example Saxon 8.4 normalized line endings in comments (which the specification did not require) while failing to normalize them in string literals (which the specification did require).
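
For readers unfamiliar with the XML 1.1 end-of-line rules, the following Java sketch shows the translation they define: the pairs x0D x0A and x0D x85, and the single characters x0D, x85 and x2028, each become one x0A. The class and method names are hypothetical and the code is not taken from Saxon.

```java
// Hypothetical helper illustrating XML 1.1 end-of-line handling.
final class LineEndings {

    private static final char LF = 0x0A;
    private static final char CR = 0x0D;
    private static final char NEL = 0x85;   // next line
    private static final char LS = 0x2028;  // line separator

    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == CR) {
                // CR LF and CR NEL are two-character line endings:
                // consume the second character as well.
                if (i + 1 < text.length()) {
                    char next = text.charAt(i + 1);
                    if (next == LF || next == NEL) {
                        i++;
                    }
                }
                out.append(LF);
            } else if (c == NEL || c == LS) {
                out.append(LF);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under these rules a Windows-style line ending inside a string literal yields a single x0A in the resulting string value, just as it does in node constructor content.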

Whitespace characters such as tab and newline appearing in attribute content are now normalized to x20. For example, a newline in attribute content within a direct element constructor is treated as a single space. This occurs after the newline normalization mentioned above.
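
The two steps can be sketched in Java as shown here, to make the ordering concrete: line endings are first reduced to x0A, and only then are tab and newline replaced by x20. The class and method names are hypothetical, and the regular expressions are one possible way to express the rules rather than Saxon's implementation.

```java
// Hypothetical helper: whitespace normalization for attribute content
// in a direct element constructor, applied after line-ending normalization.
final class AttributeWhitespace {

    static String normalize(String attributeContent) {
        // Step 1: line-ending normalization (XML 1.1 rules). The two-character
        // endings are listed first so that each becomes a single x0A.
        String lines = attributeContent.replaceAll("\\r\\n|\\r\\x85|[\\r\\x85\\u2028]", "\n");
        // Step 2: each remaining tab or newline becomes one space (x20).
        return lines.replaceAll("[\\t\\n]", " ");
    }
}
```

Because of this ordering, a carriage return followed by a line feed in attribute content produces one space rather than two.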

XQuery now uses the XML serialization method unless a different method is explicitly requested. Previously it followed XSLT by using the HTML serialization method if the first element output was an HTML element. Generally, the default serialization parameters now match those in Appendix C.2 of the XQuery specification, with one exception: when running from the command line, the default is indent=yes, which produces indented output. This can be changed by using the command-line option !indent=no.

When an xs:QName is supplied as the node name in a computed element constructor or computed attribute constructor, the prefix part of the xs:QName is now retained in the name of the generated node, where possible. This will always be possible for element nodes, but for attribute nodes, a different prefix will be generated in two situations: (a) if there is a conflict among the prefixes allocated to different attributes, or (b) if the attribute is in a namespace but the xs:QName value contains no prefix.
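
The two situations in which an attribute cannot keep its requested prefix can be illustrated with the following Java sketch. It is a simplified model only: the class name, the method signature and the invented prefixes of the form ns0, ns1 are assumptions made for the example, and the prefixes Saxon actually generates may differ.

```java
import java.util.Map;

// Hypothetical model of prefix selection for the attributes of one element.
final class AttributePrefixes {

    // 'bound' maps each prefix already used on this element's attributes to
    // its namespace URI. The chosen prefix is recorded in it before returning.
    static String choose(String requestedPrefix, String namespaceUri, Map<String, String> bound) {
        if (namespaceUri.isEmpty()) {
            return ""; // an attribute in no namespace never carries a prefix
        }
        // Case (b): no prefix in the xs:QName. Case (a): the prefix is
        // already bound to a different namespace by another attribute.
        boolean usable = !requestedPrefix.isEmpty()
            && (!bound.containsKey(requestedPrefix)
                || bound.get(requestedPrefix).equals(namespaceUri));
        String prefix = requestedPrefix;
        if (!usable) {
            int n = 0;
            do {
                prefix = "ns" + n++; // invented prefix, naming scheme assumed
            } while (bound.containsKey(prefix) && !bound.get(prefix).equals(namespaceUri));
        }
        bound.put(prefix, namespaceUri);
        return prefix;
    }
}
```

In this model the first attribute that asks for a given prefix keeps it, and a later attribute that asks for the same prefix with a different namespace URI receives an invented one.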

The construct text{()} now returns an empty sequence rather than a text node containing a zero-length string.

When a query is run from the command line, or using the run() method on the XQueryExpression object, the output is now piped into the serializer without materializing the result tree in memory. This was previously done only if the outermost level of the query expression was an element constructor. This change can substantially reduce the memory used to transform large documents, and can also improve the execution speed of such transformations.

Minor changes have been made to the way that namespace prefixes are allocated to constructed elements and attributes in cases where a prefix needs to be invented.