Serialization

The HTML serializer now uses native representation for all characters that are present in the chosen encoding, as required by the latest serialization spec. To retain the previous behaviour, set <xsl:output saxon:character-representation="entity;decimal" xmlns:saxon="http://saxon.sf.net/"/> in the stylesheet.

The HTML serializer now reports an error if any text or attribute node contains a character in the range #7F to #9F. (The most likely cause of this error is that the stylesheet or source document is encoded using the Windows code page CP-1252 but is wrongly declared as being encoded in ISO-8859-1. The fix in such cases is to correct the encoding attribute in the XML declaration.)
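The encoding mismatch described above can be illustrated with a minimal sketch that uses only the JDK, not Saxon itself. The class name EncodingMismatchDemo and its methods are hypothetical, and the sketch assumes the JDK in use provides the windows-1252 charset. It decodes the same byte under both encodings and checks whether the resulting character falls in the rejected range.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Saxon.
class EncodingMismatchDemo {

    // Decode a single byte under the given charset and return its code point.
    static int decode(byte b, Charset charset) {
        return new String(new byte[] { b }, charset).codePointAt(0);
    }

    // True if the code point lies in the range #7F to #9F named in the text.
    static boolean isRejected(int codePoint) {
        return codePoint >= 0x7F && codePoint <= 0x9F;
    }

    public static void main(String[] args) {
        byte b = (byte) 0x93; // a left double quotation mark in CP-1252

        // Assumption: the windows-1252 charset is available in this JDK.
        Charset cp1252 = Charset.forName("windows-1252");

        int asLatin1 = decode(b, StandardCharsets.ISO_8859_1);
        int asCp1252 = decode(b, cp1252);

        System.out.printf("ISO-8859-1: U+%04X rejected=%b%n", asLatin1, isRejected(asLatin1));
        System.out.printf("CP-1252:    U+%04X rejected=%b%n", asCp1252, isRejected(asCp1252));
    }
}
```

Read as ISO-8859-1, the byte becomes a control character inside the rejected range; read as CP-1252, it becomes an ordinary punctuation character outside it.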

The HTML serializer now reports an error if any processing instruction contains a ">" character.

The byte-order-mark serialization property is now ignored except when writing in UTF-8 encoding (in UTF-16, a byte order mark is always written; in other encodings, it is never written).
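The rule above can be restated as a small decision function. This is only a sketch of the rule as worded here, not Saxon's code: the class ByteOrderMarkRule and the method writesBom are hypothetical names, and the encoding name is compared literally, so variants such as UTF-16LE or UTF-16BE (which the text does not discuss) simply fall into the "other encodings" branch as a simplification.

```java
import java.util.Locale;

// Hypothetical helper; restates the rule in the text, not Saxon's implementation.
class ByteOrderMarkRule {

    // encoding:  the output encoding name
    // requested: the value of the byte-order-mark serialization property
    static boolean writesBom(String encoding, boolean requested) {
        String enc = encoding.toUpperCase(Locale.ROOT);
        if (enc.equals("UTF-16")) {
            return true;       // always written, property ignored
        }
        if (enc.equals("UTF-8")) {
            return requested;  // the only case where the property matters
        }
        return false;          // never written, property ignored
    }

    public static void main(String[] args) {
        System.out.println(writesBom("UTF-8", true));
        System.out.println(writesBom("UTF-16", false));
        System.out.println(writesBom("ISO-8859-1", true));
    }
}
```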

When a new <meta> element is added by the HTML or XHTML output methods, existing <meta> elements with the attribute http-equiv="Content-Type" are deleted. This has been implemented by adding a new processor, the MetaTagAdjuster, to the serialization pipeline. There are a few minor side-effects of this change. A newline is no longer added after an inserted <meta> element, unless one is added by the indentation process. The <meta> element is now added only after the first <head> element in the document; previously it was added after every <head> element.

The XHTML serializer now uses the same indenting rules as the HTML serializer, except that the elements recognized as inline elements or formatting elements must have names that are in lowercase, and must be in the XHTML namespace. Previously XHTML was indented using the same indenting rules as the XML serializer.

The XHTML serializer now escapes URI attribute values.

The normalization-form attribute on xsl:output and xsl:result-document is now supported. The permitted values are NFC, NFD, NFKC, NFKD, and "none". The "fully-normalized" option is not implemented. Normalization can also be requested using any of the APIs that define serialization parameters, through the new parameter key SaxonOutputKeys.NORMALIZATION_FORM. Unicode normalization is available with all output methods. It is likely to be expensive, as the code is not highly optimized, so it should be used only when actually necessary.
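To show what the four permitted forms do to a string, here is a minimal sketch using the JDK's java.text.Normalizer. It does not use Saxon's own normalization code, and the class NormalizationFormDemo and its normalize method are hypothetical names chosen for illustration. The value "none" is treated as leaving the text unchanged.

```java
import java.text.Normalizer;

// Hypothetical demo class; uses the JDK normalizer, not Saxon's implementation.
class NormalizationFormDemo {

    // form is one of "NFC", "NFD", "NFKC", "NFKD" or "none".
    // Any other value (for example "fully-normalized") makes
    // Normalizer.Form.valueOf throw an IllegalArgumentException.
    static String normalize(String text, String form) {
        if (form.equals("none")) {
            return text;
        }
        return Normalizer.normalize(text, Normalizer.Form.valueOf(form));
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301"; // 'e' followed by a combining acute accent
        String ligature = "\uFB01";    // the single-character "fi" ligature

        // NFC composes the pair into one character; NFD splits it again.
        System.out.println(normalize(decomposed, "NFC").length());
        System.out.println(normalize("\u00E9", "NFD").length());

        // The compatibility forms also replace the ligature with plain letters.
        System.out.println(normalize(ligature, "NFKC"));
    }
}
```

The compatibility forms (NFKC, NFKD) can change how text looks, as with the ligature here, so they are a more intrusive choice than NFC or NFD.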

When indent="yes" is specified, the XML serializer now places attributes (and namespace declarations) on new lines, suitably indented, beginning with the first attribute that would take the cumulative length of the element's attributes beyond 80 characters. This is designed primarily to improve readability for elements containing many namespace declarations.

Serialization errors are now identified by an 8-character error code, in the same way as errors from other parts of the system. This anticipates changes in the W3C specification.