Character Encodings Supported
The encodings supported on input depend entirely on your choice of XML parser.
On output, any encoding supported by the Java VM or the .NET platform (as appropriate) may be used.
The encoding names iso-646 and iso646 (in any mixture of upper and lower
case) are recognized as synonyms of US-ASCII.
On the Java platform, there are some differences between the character encodings supported by the old
java.io package and the new java.nio package. If the requested encoding is not supported by the
java.nio package, then all non-ASCII characters will be represented using numeric character references.
If the encoding is not supported by the java.io package, then Saxon will revert to using UTF-8 as the
actual output encoding.
A list of the character encodings
supported in the
java.nio package can be obtained by using the command
with no parameters. Java does not provide any means of determining the list of encodings
supported by the java.io package.
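The charsets known to java.nio can also be inspected programmatically. The following is a minimal sketch that uses only the standard JDK call Charset.availableCharsets(); it is not the Saxon command mentioned above, and the class name ListCharsets is a hypothetical name chosen for illustration.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; not part of Saxon.
class ListCharsets {

    /** Returns the canonical names of all charsets the running JVM exposes through java.nio. */
    static List<String> availableCharsetNames() {
        return new ArrayList<>(Charset.availableCharsets().keySet());
    }

    public static void main(String[] args) {
        // Print each canonical name followed by the aliases the JVM accepts for it.
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue().aliases());
        }
    }
}
```

The list depends on the JVM and on any installed charset providers, so the output will vary between installations.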
On output, character encoding is a two-stage process. First, Saxon itself has to decide whether a particular character
is supported by the chosen encoding. If not, it converts the character to a numeric character reference if it appears
in a context where this would be valid; otherwise (for example, if it appears in an element name) it reports an error. Then
the character has to be converted to the appropriate sequence of bytes: this second stage is delegated to the Java VM.
For the first stage, Saxon handles certain encodings itself,
because this is more efficient and more reliable. If an encoding is used that is known to Java but not known
to Saxon, Saxon attempts to discover from the Java VM whether particular characters are encodable or not.
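The two stages can be illustrated with standard JDK classes. The sketch below is an assumption about how such a check might look, not Saxon's actual implementation: it asks a java.nio CharsetEncoder whether each character is encodable, and substitutes a decimal numeric character reference when it is not. The class and method names are hypothetical, and the sketch handles text content only (it does not escape markup characters such as & or <).

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper class; not part of Saxon.
class NumericReferenceEscaper {

    /**
     * Stage one: replaces every character that the named encoding cannot
     * represent with a decimal numeric character reference.
     */
    static String escapeUnencodable(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            // Work in code points so that surrogate pairs are tested as a unit.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            CharSequence chunk = text.subSequence(i, i + length);
            if (encoder.canEncode(chunk)) {
                out.append(chunk);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
            i += length;
        }
        return out.toString();
    }

    /** Stage two: the JVM converts the (now fully encodable) characters to bytes. */
    static byte[] toBytes(String escaped, String encodingName) {
        return escaped.getBytes(Charset.forName(encodingName));
    }
}
```

For example, escapeUnencodable("caf\u00e9 \u20ac", "ISO-8859-1") leaves the e-acute untouched and returns the euro sign as &#8364;, because ISO-8859-1 has no code for U+20AC.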
The encodings that Saxon recognizes directly (including synonyms) are
ASCII, US-ASCII, iso-646, iso646, iso-8859-1, ISO8859_1, iso-8859-2, ISO8859_2,
iso-8859-5, ISO8859_5, iso-8859-7, ISO8859_7, iso-8859-8, ISO8859_8, iso-8859-9, ISO8859_9,
UTF-8, UTF8, UTF-16, UTF16,
KOI8-R, Big5, SJIS, Shift_JIS, EUC_CN, GB2312, EUC-JP, EUC-KR,
cp1250, windows-1250, cp1251, windows-1251, cp1252, windows-1252, cp852, windows-852.
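Whether the running Java VM itself knows a given name can be checked with Charset.isSupported. The following sketch combines that check with the iso-646 synonym rule and the UTF-8 fallback described earlier. It is an illustration under stated assumptions, not Saxon's code: the class and method names are hypothetical, and the fallback here is keyed on java.nio support, whereas the text above describes Saxon falling back when java.io lacks the encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class; not part of Saxon.
class EncodingResolver {

    /**
     * Maps a requested encoding name to a java.nio Charset, treating
     * iso-646 and iso646 (in any case) as US-ASCII and falling back to
     * UTF-8 when the JVM does not recognize the name.
     */
    static Charset resolve(String requestedName) {
        String lower = requestedName.toLowerCase(Locale.ROOT);
        if (lower.equals("iso-646") || lower.equals("iso646")) {
            return StandardCharsets.US_ASCII;
        }
        try {
            // Charset lookups are case-insensitive and also accept JVM-defined aliases.
            if (Charset.isSupported(requestedName)) {
                return Charset.forName(requestedName);
            }
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name;
            // treat it the same as an unsupported encoding.
        }
        return StandardCharsets.UTF_8;
    }
}
```

A name from the list above that the JVM does not recognize would resolve to UTF-8 in this sketch, which is why the check is worth running on the target installation rather than assuming the result.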