Serialization parameters

Saxon provides a number of additional serialization parameters, with names in the Saxon namespace. These can be specified as attributes on the xsl:output and xsl:result-document elements (XSLT only), in the query prolog (XQuery only), as parameters to the fn:serialize() function, or as extra parameters on the Query or Transform command line. They can also be set programmatically through the query and transformation APIs.

saxon:attribute-order

eqnames

Available with the XML, HTML, and XHTML output methods, to control the order in which attributes appear within an element start tag (in the absence of this property, the order of attributes is unpredictable). The value of the parameter is a list of tokens, each of which is either a QName or the token "*", which matches any attribute not otherwise listed. Attributes whose names are listed before the "*" token appear first, in the order they are listed; other unlisted attributes follow, sorted first by namespace URI and then by local name; finally, any attributes whose names are listed after the "*" appear at the end. For example, saxon:attribute-order="a b c * xml:space" will cause attributes to be output in the order a, then b, then c, then everything else (sorted by URI and local name), then xml:space.
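The ordering rule can be modelled in a few lines of Java. This is an illustrative sketch only, not Saxon's implementation: the class AttributeOrder and its methods are hypothetical, attribute names are assumed to be already expanded to Clark notation ({uri}local, or plain local for no namespace), and the handling of a list that contains no "*" token is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative model of the saxon:attribute-order rule (hypothetical class,
// not Saxon's code). Names are assumed to be in Clark notation: "{uri}local",
// or plain "local" for attributes in no namespace.
class AttributeOrder {

    static String uri(String clark) {
        return clark.startsWith("{") ? clark.substring(1, clark.indexOf('}')) : "";
    }

    static String local(String clark) {
        return clark.startsWith("{") ? clark.substring(clark.indexOf('}') + 1) : clark;
    }

    static List<String> order(List<String> tokens, List<String> attributes) {
        int star = tokens.indexOf("*");
        // Assumption: with no "*" token, every listed name counts as "before"
        List<String> before = star < 0 ? tokens : tokens.subList(0, star);
        List<String> after = star < 0 ? List.of() : tokens.subList(star + 1, tokens.size());

        List<String> result = new ArrayList<>();
        // 1. Names listed before "*", in the order they are listed
        for (String t : before) {
            if (attributes.contains(t)) result.add(t);
        }
        // 2. Unlisted attributes, sorted by namespace URI, then by local name
        List<String> rest = new ArrayList<>();
        for (String a : attributes) {
            if (!before.contains(a) && !after.contains(a)) rest.add(a);
        }
        rest.sort(Comparator.comparing(AttributeOrder::uri)
                            .thenComparing(AttributeOrder::local));
        result.addAll(rest);
        // 3. Names listed after "*", in the order they are listed
        for (String t : after) {
            if (attributes.contains(t)) result.add(t);
        }
        return result;
    }
}
```

With the example value above, xml:space would be passed as {http://www.w3.org/XML/1998/namespace}space and therefore ends up last, after a, b, c and the sorted remainder.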

saxon:canonical

boolean

Available with the XML output method, to request that the serialized XML conform to the W3C Canonical XML 1.1 specification (C14N). This can be useful, for example, when test results are to be compared. Specifically, this option changes XML serialization as follows:

  1. Empty elements are output as <empty></empty> rather than <empty/>.
  2. Namespaces within a start tag are sorted in alphabetical order of prefix.
  3. Attributes within a start tag are sorted first by namespace URI, then by local name.
  4. Processing instructions and comments that appear as children of the document node are separated by newlines.

Specifying saxon:canonical="yes" forces omit-xml-declaration="yes", indent="no", and encoding="utf-8", and causes use-character-maps and cdata-section-elements to be ignored. No DOCTYPE declaration is output. The option does not force Unicode normalization; if in doubt, set normalization-form="C".
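To make rules 1 to 3 concrete, here is a minimal Java sketch that writes a single empty element in canonical form. It is not Saxon's code: the class CanonicalEmptyElement and the record Attr are hypothetical, and attribute values are assumed to need no escaping.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch (hypothetical names, not Saxon's implementation) of
// canonical rules 1-3 applied to one empty element. Values are not escaped.
class CanonicalEmptyElement {

    record Attr(String uri, String local, String prefix, String value) {
        String qname() { return prefix.isEmpty() ? local : prefix + ":" + local; }
    }

    static String serialize(String name, Map<String, String> namespaces, List<Attr> attrs) {
        StringBuilder sb = new StringBuilder("<").append(name);
        // Rule 2: namespace declarations sorted by prefix; TreeMap sorts the
        // keys, so the default namespace (prefix "") comes first
        for (Map.Entry<String, String> ns : new TreeMap<>(namespaces).entrySet()) {
            sb.append(ns.getKey().isEmpty() ? " xmlns" : " xmlns:" + ns.getKey())
              .append("=\"").append(ns.getValue()).append('"');
        }
        // Rule 3: attributes sorted by namespace URI, then by local name
        List<Attr> sorted = new ArrayList<>(attrs);
        sorted.sort(Comparator.comparing(Attr::uri).thenComparing(Attr::local));
        for (Attr a : sorted) {
            sb.append(' ').append(a.qname()).append("=\"").append(a.value()).append('"');
        }
        // Rule 1: an empty element gets an explicit end tag, never "<name/>"
        return sb.append("></").append(name).append('>').toString();
    }
}
```

Note that attributes in no namespace (URI "") sort ahead of namespaced ones, regardless of their prefixes.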

saxon:character-representation

"native" | "entity" | "decimal" | "hex"

Allows greater control over how non-ASCII characters will be represented on output.

When the output method is XML, two values are supported: decimal and hex. These control whether numeric character references are output in decimal or hexadecimal when the character is not available in the selected encoding.

When the output method is HTML, the value may hold two strings, separated by a semicolon. The first string defines how non-ASCII characters within the character encoding will be represented, the values being native, entity, decimal, or hex. The second string defines how characters outside the encoding will be represented, the values being entity, decimal, or hex. Here native means output the character as itself; entity means use a defined entity reference (such as "&eacute;") if known; decimal and hex refer to numeric character references. For example, entity;decimal (the default) means that, with encoding="iso-8859-1", characters in the range 160-255 will be represented using standard HTML entity references, while Unicode characters above 255 will be represented as decimal character references.
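A rough Java model of the HTML behaviour described above follows. The class name, the three-entry entity table, and the fallbacks (decimal when no entity is known, or when only one string is supplied) are assumptions made for the sketch, and escaping of "<" and "&" is omitted. Whether a character is within the encoding is decided with java.nio.charset.CharsetEncoder.canEncode.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Map;

// Illustrative model of saxon:character-representation for the HTML output
// method (hypothetical class, not Saxon's code).
class CharRepresentation {

    // Tiny sample of the HTML entity table; a real serializer knows many more
    static final Map<Integer, String> ENTITIES =
            Map.of(0xA0, "nbsp", 0xA9, "copy", 0xE9, "eacute");

    static String represent(int cp, String mode) {
        switch (mode) {
            case "native":
                return new String(Character.toChars(cp));
            case "entity": {
                String name = ENTITIES.get(cp);
                // Assumption: fall back to a decimal reference if no entity is known
                return name != null ? "&" + name + ";" : "&#" + cp + ";";
            }
            case "decimal":
                return "&#" + cp + ";";
            case "hex":
                return "&#x" + Integer.toHexString(cp).toUpperCase() + ";";
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    // value is e.g. "entity;decimal": the first part applies to characters
    // inside the encoding, the second to characters outside it
    static String serializeText(String text, String encoding, String value) {
        String[] parts = value.split(";");
        String inside = parts[0];
        // Assumption: if only one part is given, use decimal outside the encoding
        String outside = parts.length > 1 ? parts[1] : "decimal";
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);  // ASCII written as-is (markup escaping omitted)
            } else if (encoder.canEncode(new String(Character.toChars(cp)))) {
                sb.append(represent(cp, inside));
            } else {
                sb.append(represent(cp, outside));
            }
        });
        return sb.toString();
    }
}
```

With the default entity;decimal and ISO-8859-1, "é" becomes &eacute; while "Ł" (U+0141) becomes &#321;.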

saxon:double-space

eqnames

When the output method is XML with indent="yes", the saxon:double-space attribute may be used to output an extra blank line before the start tag of selected elements. The value is a whitespace-separated list of element names. The attribute follows the same conventions as cdata-section-elements: values specified in separate xsl:output or xsl:result-document elements are cumulative, and if the value is supplied programmatically via an API or from the command line, the element names are given in Clark notation, namely {uri}local.
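When the list is supplied in Clark notation, it can be parsed with the standard javax.xml.namespace.QName.valueOf method, which accepts the {uri}local form. The following sketch (hypothetical class DoubleSpaceList, not Saxon's code) builds the set of names and tests whether a given element should be preceded by a blank line.

```java
import java.util.HashSet;
import java.util.Set;
import javax.xml.namespace.QName;

// Sketch (hypothetical class): parse a saxon:double-space value given in
// Clark notation and test element names against it.
class DoubleSpaceList {

    static Set<QName> parse(String value) {
        Set<QName> names = new HashSet<>();
        for (String token : value.trim().split("\\s+")) {
            // QName.valueOf understands "{uri}local" and plain "local"
            if (!token.isEmpty()) names.add(QName.valueOf(token));
        }
        return names;
    }

    static boolean needsBlankLine(Set<QName> names, String uri, String local) {
        return names.contains(new QName(uri, local));
    }
}
```

Because QName.equals compares only the namespace URI and local part, the prefix used in the result document does not affect the match.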

saxon:indent-spaces

integer

When the output method is XML, HTML, or XHTML with indent="yes", the saxon:indent-spaces attribute may be used to control the amount of indentation. The default value in the absence of this attribute is 3.

saxon:internal-dtd-subset

string

When the output method is XML, the saxon:internal-dtd-subset attribute may be used to generate an internal DTD subset.

The value is a string conforming to the XML grammar production intSubset; it is included in the serialized document "as is", without checking. As with any string, special characters will need to be escaped, for example "<" is written as "&lt;". The square brackets that enclose the internal subset within the Document Type Declaration should not be included in the value.
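The following Java sketch illustrates the two points above: the escaping needed when the subset is written as an XML attribute value, and the square brackets that the serializer, not the user, supplies. The class and method names are hypothetical, and the exact whitespace Saxon writes around the subset may differ from this sketch.

```java
// Sketch (hypothetical class, not Saxon's code) for saxon:internal-dtd-subset.
class InternalSubset {

    // Escape a string so it can appear inside a double-quoted XML attribute,
    // e.g. saxon:internal-dtd-subset="..." on xsl:output. "&" goes first so
    // that the references introduced afterwards are not escaped again.
    static String escapeForAttribute(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Roughly how the value ends up in the output: the serializer wraps it in
    // the square brackets, so the value itself must not include them
    static String doctype(String root, String subset) {
        return "<!DOCTYPE " + root + " [" + subset + "]>";
    }
}
```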

saxon:line-length

integer

Default value 80. With the XML output method, attributes are output on a new line if they would otherwise extend beyond this column position. With the HTML output method, text lines are split at this line length when possible.

saxon:newline

string

Defines the string that is used by the text output method to represent a newline. The default is a single newline character (x0A). The Windows line ending x0D x0A (CRLF) is sometimes preferred; it can be specified using saxon:newline="&#x0D;&#x0A;".
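In Java terms, the effect on text output amounts to a simple substitution, as in this minimal sketch (hypothetical class NewlineOption, not Saxon's implementation). The attribute value &#x0D;&#x0A; corresponds to the Java string "\r\n".

```java
// Sketch of the saxon:newline effect for the text output method
// (hypothetical class): each newline character (x0A) in the result is
// written as the configured string instead.
class NewlineOption {

    // saxon:newline="&#x0D;&#x0A;" denotes this two-character string
    static final String CRLF = "\r\n";

    static String applyNewline(String text, String newline) {
        return text.replace("\n", newline);
    }
}
```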

saxon:next-in-chain

uri

XSLT only. Used to direct the output to another stylesheet. The value is the URI of a stylesheet that should be used to process the output stream. In this case the output stream must always be pure XML, and attributes that control the format of the output (for example method and cdata-section-elements) have no effect. The output of the second stylesheet is directed to the destination that would have been used for the first stylesheet if no saxon:next-in-chain attribute were present.

This serialization property is available only on xsl:output declarations and xsl:result-document instructions. It cannot be supplied as an attribute to any of the various APIs that control serialization; nor can it be used on the command line. It is not supported as an XQuery serialization parameter.

Supplying a zero-length string is equivalent to omitting the attribute, except that it can be used to override a previous setting.

If the value is a relative URI, it is interpreted relative to the base URI of the stylesheet element (xsl:output or xsl:result-document) on which the attribute appears.
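For comparison, a similar two-stage pipeline can be built with the standard JAXP API, by sending the first transformation's output as SAX events into a TransformerHandler for the second stylesheet. This is a swapped-in technique, not how saxon:next-in-chain is implemented; the class ChainSketch is hypothetical, and the sketch assumes the default TransformerFactory is a SAXTransformerFactory, as the JDK's built-in XSLT processor is.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch (hypothetical class): chain two stylesheets with plain JAXP.
class ChainSketch {

    static String chain(String xml, String firstXsl, String secondXsl) throws Exception {
        // Assumption: the default factory supports the SAX interfaces
        SAXTransformerFactory factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        Templates first = factory.newTemplates(new StreamSource(new StringReader(firstXsl)));

        // The second stylesheet consumes SAX events; its output is serialized
        TransformerHandler second =
                factory.newTransformerHandler(new StreamSource(new StringReader(secondXsl)));
        StringWriter out = new StringWriter();
        second.setResult(new StreamResult(out));

        // The first stylesheet's result is passed on as events, never serialized
        first.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new SAXResult(second));
        return out.toString();
    }
}
```

As with saxon:next-in-chain, the intermediate result is never serialized, so serialization settings on the first stylesheet have no bearing on what the second one receives.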

saxon:property-order

eqnames

Available with the JSON output method, to control the order in which properties appear within the serialization of a map/object (in the absence of saxon:property-order, the order of properties is unpredictable). The value of the parameter is a list of tokens, in which the token "*" is treated specially. Properties whose names are listed before the "*" token appear first, in the order they are listed; other unlisted properties follow, sorted alphabetically; finally, any properties whose names are listed after the "*" appear at the end. For example, saxon:property-order="@ a b c * $" will cause properties to be output in the order @, then a, then b, then c, then everything else, then $. Although JSON property names can include spaces, there is no provision for such names to be included in the list.
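The same idea applied to JSON properties, as a minimal Java sketch: it returns a LinkedHashMap whose iteration order is the output order. The class PropertyOrder is hypothetical, and "alphabetically" is approximated by Java's natural String order (UTF-16 code units), which may not match Saxon's ordering exactly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch (hypothetical class, not Saxon's code) of saxon:property-order.
class PropertyOrder {

    static <V> LinkedHashMap<String, V> reorder(String spec, Map<String, V> map) {
        // Tokens are whitespace-separated, so names containing spaces cannot be listed
        List<String> tokens = List.of(spec.trim().split("\\s+"));
        int star = tokens.indexOf("*");
        List<String> before = star < 0 ? tokens : tokens.subList(0, star);
        List<String> after = star < 0 ? List.of() : tokens.subList(star + 1, tokens.size());

        LinkedHashMap<String, V> out = new LinkedHashMap<>();
        // Keys listed before "*", in the order listed
        for (String k : before) if (map.containsKey(k)) out.put(k, map.get(k));
        // Unlisted keys, in natural String order
        List<String> rest = new ArrayList<>(map.keySet());
        rest.removeAll(before);
        rest.removeAll(after);
        rest.sort(null);
        for (String k : rest) out.put(k, map.get(k));
        // Keys listed after "*", in the order listed
        for (String k : after) if (map.containsKey(k)) out.put(k, map.get(k));
        return out;
    }
}
```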

saxon:recognize-binary

boolean

Relevant only when using the text output method. If set to yes, the processing instructions <?hex XXXX?> and <?b64 XXXX?> will be recognized; the value is taken as a hexBinary or base64 representation of a character string, encoded using the encoding in use by the serializer, and this character string will be output without validating it to ensure it contains valid XML characters. Also recognized are <?hex.EEEE XXXX?> and <?b64.EEEE XXXX?>, where EEEE is the name of the encoding of the base64 or hexBinary data: for example hex.ascii or b64.utf8.

This enables non-XML characters, notably binary zero, to be output.

For example, given <xsl:output method="text" saxon:recognize-binary="yes"/>, the following instruction:

<xsl:processing-instruction name="hex.ascii" select="'00'"/>

outputs the Unicode character with codepoint zero ("NUL"), while

<xsl:processing-instruction name="b64.utf8" select="securityKey"/>

outputs the value of the securityKey element, on the assumption that this is base64-encoded UTF-8 text.
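The decoding step can be sketched in Java as follows. The class BinaryPi and its method are hypothetical, not Saxon's code, and resolving EEEE with Charset.forName is an assumption (Java does accept both ascii and utf8 as charset aliases). The sketch also assumes the base64 content contains no embedded line breaks.

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Illustrative decoder for the processing instructions recognized when
// saxon:recognize-binary="yes" (hypothetical class, not Saxon's code).
class BinaryPi {

    // piName: "hex", "b64", "hex.EEEE" or "b64.EEEE"; data: the PI content
    static String decode(String piName, String data, String serializerEncoding) {
        int dot = piName.indexOf('.');
        String kind = dot < 0 ? piName : piName.substring(0, dot);
        // Without a suffix, the serializer's own encoding applies.
        // Assumption: EEEE is looked up as a Java charset name
        Charset charset = Charset.forName(dot < 0 ? serializerEncoding : piName.substring(dot + 1));
        String content = data.strip();
        byte[] bytes;
        if (kind.equals("hex")) {
            bytes = new byte[content.length() / 2];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(content.substring(2 * i, 2 * i + 2), 16);
            }
        } else if (kind.equals("b64")) {
            bytes = Base64.getDecoder().decode(content);
        } else {
            throw new IllegalArgumentException("not a binary PI: " + piName);
        }
        // The decoded characters are returned without any XML-character check,
        // so a NUL character, for example, survives
        return new String(bytes, charset);
    }
}
```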

saxon:require-well-formed

boolean

Affects the handling of result documents that contain multiple top-level elements or top-level text nodes. The W3C specifications allow such a result document, even though it is not a well-formed XML document. It is, however, a well-formed external general parsed entity, which means it can be incorporated into a well-formed XML document by means of an entity reference.

The default is no. If the value is set to yes, and a SAX destination (for example a SAXResult, a JDOMResult, or a user-written ContentHandler) is supplied to receive the results of the transformation, then Saxon will report an error rather than sending a non-well-formed stream of SAX events to the ContentHandler. This attribute is useful when the output of the stylesheet is sent to a component (for example an XSL-FO rendering engine) that is not designed to accept non-well-formed XML result trees.

Note also that namespace undeclarations of the form xmlns:p="" (as permitted by XML Namespaces 1.1) are passed to the startPrefixMapping() method of a user-defined ContentHandler only if undeclare-prefixes="yes" is specified on xsl:output.
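What counts as well-formed here can be expressed as a small check over the top-level children of the result document: exactly one element, and no text other than whitespace. The Java sketch below models this with hypothetical types (WellFormedCheck, Kind, Child); treating whitespace-only text as acceptable is an assumption, based on the XML grammar allowing whitespace around the document element.

```java
import java.util.List;

// Sketch (hypothetical types, not Saxon's code): would this sequence of
// top-level children form a well-formed XML document?
class WellFormedCheck {

    enum Kind { ELEMENT, TEXT, COMMENT, PI }

    // For ELEMENT, value is the element name; for TEXT, the text content
    record Child(Kind kind, String value) {}

    static boolean isWellFormedDocument(List<Child> topLevel) {
        int elements = 0;
        for (Child c : topLevel) {
            if (c.kind() == Kind.ELEMENT) {
                elements++;
            } else if (c.kind() == Kind.TEXT && !c.value().isBlank()) {
                return false;  // non-whitespace text outside the document element
            }
            // Comments and processing instructions are allowed at top level
        }
        return elements == 1;
    }
}
```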

saxon:single-quotes

boolean

If set to yes, the XML, HTML, and XHTML output methods will generally use single quotes (apostrophes) rather than double quotes to delimit attribute values.

This can be useful if the serialized XML/HTML is to be subsequently wrapped in double quotes, for example as part of a JSON text, or within a Java string literal. It does not eliminate the need to escape double quotes (using \") in such a context, but it means that fewer characters will be affected, which improves the readability of the result.

The property is ignored in the case of attributes affected by character maps, where single or double quotes are used intelligently based on the actual content of the attribute.

saxon:supply-source-locator

boolean

Relevant only when output is sent to a user-written ContentHandler, that is, a SAXResult. It causes extra information to be maintained and made available to the ContentHandler for diagnostic purposes: specifically, the Locator that is passed to the ContentHandler via the setDocumentLocator method may be cast to a ContentHandlerProxyLocator, which exposes the method getContextItemStack(). This returns a java.util.Stack. The top item on the stack is the current context item, and below this are previous context items. Each item is represented by the interface net.sf.saxon.om.Item. If the item is a node, and if the node is one derived by parsing a source document with the line-numbering option enabled, then it is possible to obtain the URI and line number of this node in the original XML source.

For this to work, the stylesheet or query must be compiled with tracing enabled. This can be achieved by calling config.setCompileWithTracing(true) on the Configuration object, or equivalently by setting the configuration property COMPILE_WITH_TRACING. Note that this compile-time option imposes a substantial run-time overhead, even if tracing is not switched on at run time by supplying a TraceListener.