Collation

Collations used for comparing strings are identified by a URI. A collation URI can be supplied as an argument to many of the standard functions, as the collation attribute of xsl:sort in XSLT, and in the order by clause of a FLWOR expression in XQuery.

Saxon provides a range of mechanisms for binding collation URIs. The language specifications simply say that collations used in sorting and in string-comparison functions are identified by a URI, and leave it to the implementation to decide how these URIs are defined.
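To make the idea of an implementation-defined binding concrete, the following Java sketch keeps a table from collation URI to comparison function. It is only an illustration: the class CollationRegistry, its bind and resolve methods, and the URI http://example.com/collation/case-blind are hypothetical and are not part of Saxon's API. Saxon's own mechanisms, such as the configuration file, are the ones described in this section.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Saxon's API: an implementation-defined table
// that binds collation URIs to comparison functions.
final class CollationRegistry {
    // URI of the W3C Unicode Codepoint Collation, which is always bound.
    static final String CODEPOINT_URI =
            "http://www.w3.org/2005/xpath-functions/collation/codepoint";

    private final Map<String, Comparator<String>> bindings = new HashMap<>();

    CollationRegistry() {
        // Compare the sequences of Unicode codepoints lexicographically.
        bindings.put(CODEPOINT_URI,
                (a, b) -> Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray()));
    }

    // Bind a URI to a collation; the predefined codepoint collation cannot be replaced.
    void bind(String uri, Comparator<String> collation) {
        if (CODEPOINT_URI.equals(uri)) {
            throw new IllegalArgumentException("The codepoint collation cannot be redefined");
        }
        bindings.put(uri, collation);
    }

    // Resolve a URI to its collation; an unbound URI is reported as an error.
    Comparator<String> resolve(String uri) {
        Comparator<String> collation = bindings.get(uri);
        if (collation == null) {
            throw new IllegalArgumentException("Unknown collation URI: " + uri);
        }
        return collation;
    }

    public static void main(String[] args) {
        CollationRegistry registry = new CollationRegistry();
        // Hypothetical URI bound to a case-insensitive comparison.
        String caseBlind = "http://example.com/collation/case-blind";
        registry.bind(caseBlind, String.CASE_INSENSITIVE_ORDER);
        System.out.println(registry.resolve(caseBlind).compare("ABC", "abc") == 0);    // true
        System.out.println(registry.resolve(CODEPOINT_URI).compare("ABC", "abc") < 0); // true
    }
}
```

The same pair of strings compares equal under the hypothetical case-blind URI but not under the codepoint URI, which is the point of letting the URI, rather than the calling code, select the comparison rules.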

There is one predefined collation that cannot be changed: the Unicode Codepoint Collation defined in the W3C specifications, whose URI is http://www.w3.org/2005/xpath-functions/collation/codepoint. It collates strings based on the integer values that Unicode assigns to each character. For example, "ah!" sorts before "ah?" because the Unicode codepoints for "ah!" are (97, 104, 33), while those for "ah?" are (97, 104, 63).
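The codepoint comparison described above can be sketched in Java as follows. CodepointCollation is a hypothetical class written for illustration, not Saxon's implementation. It walks both strings codepoint by codepoint rather than char by char, because Java's String.compareTo compares UTF-16 code units and can therefore order characters above U+FFFF differently from the codepoint collation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of codepoint-order comparison (not Saxon's code).
final class CodepointCollation implements Comparator<String> {

    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            // codePointAt combines a surrogate pair into a single codepoint.
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // All compared codepoints are equal: the shorter string sorts first.
        boolean aDone = i >= a.length();
        boolean bDone = j >= b.length();
        if (aDone && bDone) {
            return 0;
        }
        return aDone ? -1 : 1;
    }

    public static void main(String[] args) {
        CodepointCollation collation = new CodepointCollation();
        System.out.println(Arrays.toString("ah!".codePoints().toArray())); // [97, 104, 33]
        System.out.println(Arrays.toString("ah?".codePoints().toArray())); // [97, 104, 63]
        System.out.println(collation.compare("ah!", "ah?") < 0);          // true

        // U+FF61 is below U+1F600 as a codepoint, but U+1F600 is stored as the
        // surrogate pair D83D DE00, and 0xFF61 > 0xD83D as a UTF-16 code unit.
        String bmp = "\uFF61";
        String supplementary = new String(Character.toChars(0x1F600));
        System.out.println(collation.compare(bmp, supplementary) < 0);    // true
        System.out.println(bmp.compareTo(supplementary) > 0);             // true
    }
}
```

For strings made only of characters in the Basic Multilingual Plane the two orders agree; the last two lines show a case where they do not.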

You can use the Saxon configuration file to define collations: see The collations element.

There are two ways of customising collations: either by defining a specific sequence, or by configuring the Unicode Collation Algorithm parametrically (fully supported in Saxon-PE/EE only). Details of both are given in the following pages.
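As a rough analogy only, the JDK's java.text package offers both styles of customisation: a RuleBasedCollator built from a rule string that spells out a specific sequence, and a locale-based Collator whose strength parameter controls which differences count. This is not how Saxon-PE/EE implements the Unicode Collation Algorithm, and java.text.Collator is not a complete UCA implementation; the class CollationCustomisation and its methods are hypothetical names used for this sketch.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper names; uses only the JDK's java.text collators.
final class CollationCustomisation {

    // Style 1: spell out a specific sequence. These rules place c before b before a.
    static RuleBasedCollator reversedAbc() {
        try {
            return new RuleBasedCollator("< c < b < a");
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Style 2: configure a locale's collator by parameter. PRIMARY strength
    // counts only base-letter differences, so case differences are ignored.
    static Collator caseBlind(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(Arrays.asList("a", "b", "c"));
        letters.sort(reversedAbc());
        System.out.println(letters); // [c, b, a]

        System.out.println(caseBlind(Locale.ENGLISH).compare("a", "A"));                 // 0
        System.out.println(Collator.getInstance(Locale.ENGLISH).compare("a", "A") != 0); // true
    }
}
```

The first style gives full control but requires every relevant character to be listed; the second reuses an existing locale's ordering and only adjusts how strict the comparison is.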