Package net.sf.saxon.serialize.charcode
Class UTF16CharacterSet
java.lang.Object
net.sf.saxon.serialize.charcode.UTF16CharacterSet
- All Implemented Interfaces:
CharacterSet
A class to hold some static constants and methods associated with processing UTF-16 and surrogate pairs.
-
Field Summary
Fields
Modifier and Type, Field
static final int NONBMP_MAX
static final int NONBMP_MIN
static final char SURROGATE1_MAX
static final char SURROGATE1_MIN
static final char SURROGATE2_MAX
static final char SURROGATE2_MIN
Method Summary
Modifier and Type, Method, Description
static int combinePair(char high, char low)
    Return the non-BMP character corresponding to a given surrogate pair.
static int firstInvalidChar(IntIterator iter, IntPredicateProxy predicate)
    Test whether all the characters in a character sequence satisfy the supplied predicate (for example, that they are valid XML characters).
getCanonicalName
    Get the preferred Java name of the character set.
static UTF16CharacterSet getInstance
    Get the singular instance of this class.
static char highSurrogate(int ch)
    Return the high surrogate of a non-BMP character.
boolean inCharset(int c)
    Determine if a character is present in the character set.
static boolean isHighSurrogate(int ch)
    Test whether the given character is a high surrogate.
static boolean isLowSurrogate(int ch)
    Test whether the given character is a low surrogate.
static boolean isSurrogate(int c)
    Test whether a given character is a surrogate (high or low).
static char lowSurrogate(int ch)
    Return the low surrogate of a non-BMP character.
-
Field Details
-
NONBMP_MIN
public static final int NONBMP_MIN
-
NONBMP_MAX
public static final int NONBMP_MAX
-
SURROGATE1_MIN
public static final char SURROGATE1_MIN
-
SURROGATE1_MAX
public static final char SURROGATE1_MAX
-
SURROGATE2_MIN
public static final char SURROGATE2_MIN
-
SURROGATE2_MAX
public static final char SURROGATE2_MAX
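The constant values are not listed on this page. As a rough orientation only, the sketch below prints the standard UTF-16 ranges using constants from java.lang.Character. It is an assumption, not something this page confirms, that NONBMP_MIN/NONBMP_MAX, SURROGATE1_MIN/SURROGATE1_MAX and SURROGATE2_MIN/SURROGATE2_MAX correspond to these ranges. The class name Utf16RangesSketch and the helper range are hypothetical.

```java
// Sketch only: shows the standard UTF-16 ranges via JDK constants.
// Assumption: the Saxon constants on this page match these ranges.
class Utf16RangesSketch {

    // Hypothetical helper: format an inclusive range in upper-case hex.
    static String range(int min, int max) {
        return String.format("%04X..%04X", min, max);
    }

    public static void main(String[] args) {
        // Code points outside the Basic Multilingual Plane.
        System.out.println("non-BMP:         "
                + range(Character.MIN_SUPPLEMENTARY_CODE_POINT, Character.MAX_CODE_POINT));
        // First (high) half of a surrogate pair.
        System.out.println("high surrogates: "
                + range(Character.MIN_HIGH_SURROGATE, Character.MAX_HIGH_SURROGATE));
        // Second (low) half of a surrogate pair.
        System.out.println("low surrogates:  "
                + range(Character.MIN_LOW_SURROGATE, Character.MAX_LOW_SURROGATE));
    }
}
```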
-
-
Method Details
-
getInstance
Get the singular instance of this class.
Returns:
the singular instance of this class
-
inCharset
public boolean inCharset(int c)
Description copied from interface: CharacterSet
Determine if a character is present in the character set.
Specified by:
inCharset in interface CharacterSet
Parameters:
c - the codepoint being tested
Returns:
true if the codepoint is supported
-
getCanonicalName
Description copied from interface: CharacterSet
Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
Specified by:
getCanonicalName in interface CharacterSet
Returns:
the preferred Java name
-
combinePair
public static int combinePair(char high, char low)
Return the non-BMP character corresponding to a given surrogate pair.
Parameters:
high - The high surrogate.
low - The low surrogate.
Returns:
the Unicode codepoint represented by the surrogate pair
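As an illustration of what combining a surrogate pair involves, here is a minimal sketch of the standard UTF-16 arithmetic. It is not Saxon's source code: the class CombinePairSketch is hypothetical and only mirrors the documented signature, and the result is cross-checked against the JDK's Character.toCodePoint.

```java
// Sketch only: standard UTF-16 decoding arithmetic, not Saxon's implementation.
class CombinePairSketch {

    // Hypothetical stand-in with the same signature as the documented method.
    static int combinePair(char high, char low) {
        // Each surrogate carries 10 payload bits; the combined 20-bit value
        // is offset by 0x10000, the first code point outside the BMP.
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = (char) 0xD83D;
        char low = (char) 0xDE00;
        int cp = combinePair(high, low);
        System.out.println(Integer.toHexString(cp));               // 1f600
        System.out.println(cp == Character.toCodePoint(high, low)); // true
    }
}
```

The sketch assumes the caller passes a well-formed pair; it performs no validation of its arguments.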
-
highSurrogate
public static char highSurrogate(int ch)
Return the high surrogate of a non-BMP character.
Parameters:
ch - The Unicode codepoint of the non-BMP character to be divided.
Returns:
the first character in the surrogate pair
-
lowSurrogate
public static char lowSurrogate(int ch)
Return the low surrogate of a non-BMP character.
Parameters:
ch - The Unicode codepoint of the non-BMP character to be divided.
Returns:
the second character in the surrogate pair
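Splitting a non-BMP codepoint into its two halves, as highSurrogate and lowSurrogate are documented to do, can be sketched with the standard UTF-16 encoding arithmetic. The class SplitSurrogateSketch is hypothetical and is not Saxon's code; it assumes the argument is in the range 0x10000 to 0x10FFFF and does not check it.

```java
// Sketch only: standard UTF-16 encoding arithmetic, not Saxon's implementation.
class SplitSurrogateSketch {

    // Hypothetical stand-in: the upper 10 bits of (ch - 0x10000), offset into D800..DBFF.
    static char highSurrogate(int ch) {
        return (char) (((ch - 0x10000) >> 10) + 0xD800);
    }

    // Hypothetical stand-in: the lower 10 bits of (ch - 0x10000), offset into DC00..DFFF.
    static char lowSurrogate(int ch) {
        return (char) (((ch - 0x10000) & 0x3FF) + 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        char hi = highSurrogate(cp);
        char lo = lowSurrogate(cp);
        System.out.printf("%04X %04X%n", (int) hi, (int) lo); // D83D DE00
        // Round trip: the two chars decode back to the original code point.
        System.out.println(new String(new char[] {hi, lo}).codePointAt(0) == cp); // true
    }
}
```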
-
isSurrogate
public static boolean isSurrogate(int c)
Test whether a given character is a surrogate (high or low).
Parameters:
c - the character to test
Returns:
true if the character is the high or low half of a surrogate pair
-
isHighSurrogate
public static boolean isHighSurrogate(int ch)
Test whether the given character is a high surrogate.
Parameters:
ch - The character to test.
Returns:
true if the character is the first character in a surrogate pair
-
isLowSurrogate
public static boolean isLowSurrogate(int ch)
Test whether the given character is a low surrogate.
Parameters:
ch - The character to test.
Returns:
true if the character is the second character in a surrogate pair
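The three surrogate tests above amount to range checks on the UTF-16 code unit. The following sketch shows one plausible way to write them and uses them to count codepoints in a string. SurrogateTestSketch and countCodePoints are hypothetical names, and the range bounds are the standard UTF-16 ones rather than values taken from this page.

```java
// Sketch only: range checks using the standard UTF-16 surrogate ranges,
// not Saxon's implementation.
class SurrogateTestSketch {

    static boolean isHighSurrogate(int ch) {
        return ch >= 0xD800 && ch <= 0xDBFF;
    }

    static boolean isLowSurrogate(int ch) {
        return ch >= 0xDC00 && ch <= 0xDFFF;
    }

    static boolean isSurrogate(int c) {
        return isHighSurrogate(c) || isLowSurrogate(c);
    }

    // Hypothetical usage: count code points, treating a well-formed pair as one.
    static int countCodePoints(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isHighSurrogate(s.charAt(i))
                    && i + 1 < s.length()
                    && isLowSurrogate(s.charAt(i + 1))) {
                i++; // skip the second half of the pair
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = new StringBuilder().append('a').appendCodePoint(0x1F600).append('b').toString();
        System.out.println(s.length());         // 4
        System.out.println(countCodePoints(s)); // 3
    }
}
```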
-
firstInvalidChar
Test whether all the characters in a character sequence satisfy the supplied predicate (for example, that they are valid XML characters).
Parameters:
iter - iterator over the character sequence to be tested
predicate - the predicate that all characters must satisfy
Returns:
the codepoint of the first invalid character in the character sequence (according to the supplied predicate); or -1 if all characters in the character sequence are valid
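The documented contract (return the first codepoint that fails the predicate, or -1) can be sketched without Saxon on the classpath. In the sketch below, java.util.PrimitiveIterator.OfInt and java.util.function.IntPredicate are stand-ins for Saxon's IntIterator and IntPredicateProxy, whose APIs are not shown on this page. The class FirstInvalidCharSketch and the helper isXml10Char (based on the XML 1.0 Char production) are hypothetical.

```java
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;

// Sketch only: JDK types stand in for Saxon's IntIterator and IntPredicateProxy.
class FirstInvalidCharSketch {

    // Return the first code point rejected by the predicate, or -1 if none is.
    static int firstInvalidChar(PrimitiveIterator.OfInt iter, IntPredicate predicate) {
        while (iter.hasNext()) {
            int cp = iter.nextInt();
            if (!predicate.test(cp)) {
                return cp;
            }
        }
        return -1;
    }

    // Hypothetical predicate: the Char production of XML 1.0.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        String good = "ok";
        String bad = "a" + (char) 1 + "b"; // contains U+0001, not an XML 1.0 character
        System.out.println(firstInvalidChar(good.codePoints().iterator(),
                FirstInvalidCharSketch::isXml10Char)); // -1
        System.out.println(firstInvalidChar(bad.codePoints().iterator(),
                FirstInvalidCharSketch::isXml10Char)); // 1
    }
}
```

Iterating over codepoints rather than chars means a well-formed surrogate pair is tested once, as a single non-BMP codepoint.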
-