Package net.sf.saxon.serialize.charcode
Class UTF16CharacterSet
java.lang.Object
net.sf.saxon.serialize.charcode.UTF16CharacterSet
- All Implemented Interfaces:
CharacterSet
A class holding static constants and methods for processing UTF-16 and surrogate pairs.
-
Field Summary
Fields
Modifier and Type     Field
static final int      NONBMP_MIN
static final int      NONBMP_MAX
static final char     SURROGATE1_MIN
static final char     SURROGATE1_MAX
static final char     SURROGATE2_MIN
static final char     SURROGATE2_MAX
-
Method Summary
Modifier and Type, Method, Description
static int combinePair(char high, char low)
    Return the non-BMP character corresponding to a given surrogate pair.
static int firstInvalidChar(IntIterator iter, IntPredicateProxy predicate)
    Test whether all the characters in a character sequence satisfy the supplied predicate (for example, that they are valid XML characters).
getCanonicalName()
    Get the preferred Java name of the character set.
static UTF16CharacterSet getInstance()
    Get the singular instance of this class.
static char highSurrogate(int ch)
    Return the high surrogate of a non-BMP character.
boolean inCharset(int c)
    Determine if a character is present in the character set.
static boolean isHighSurrogate(int ch)
    Test whether the given character is a high surrogate.
static boolean isLowSurrogate(int ch)
    Test whether the given character is a low surrogate.
static boolean isSurrogate(int c)
    Test whether a given character is a surrogate (high or low).
static char lowSurrogate(int ch)
    Return the low surrogate of a non-BMP character.
-
Field Details
-
NONBMP_MIN
public static final int NONBMP_MIN
-
NONBMP_MAX
public static final int NONBMP_MAX
-
SURROGATE1_MIN
public static final char SURROGATE1_MIN
-
SURROGATE1_MAX
public static final char SURROGATE1_MAX
-
SURROGATE2_MIN
public static final char SURROGATE2_MIN
-
SURROGATE2_MAX
public static final char SURROGATE2_MAX
-
Method Details
-
getInstance
Get the singular instance of this class.
Returns:
the singular instance of this class
-
inCharset
public boolean inCharset(int c)
Description copied from interface: CharacterSet
Determine if a character is present in the character set.
Specified by:
inCharset in interface CharacterSet
Parameters:
c - the codepoint being tested
Returns:
true if the codepoint is supported
-
getCanonicalName
Description copied from interface: CharacterSet
Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
Specified by:
getCanonicalName in interface CharacterSet
Returns:
the preferred Java name
-
combinePair
public static int combinePair(char high, char low)
Return the non-BMP character corresponding to a given surrogate pair.
Parameters:
high - the high surrogate
low - the low surrogate
Returns:
the Unicode codepoint represented by the surrogate pair
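The arithmetic behind combining a surrogate pair can be sketched in plain Java. This is a minimal illustration of the standard UTF-16 decoding formula, not Saxon's source; the class `SurrogatePairSketch` and its `combine` method are hypothetical names, and the numeric bounds are the Unicode-defined surrogate ranges rather than values read from this class's constants.

```java
public class SurrogatePairSketch {

    // Hypothetical helper: decode a high/low surrogate pair into a
    // supplementary (non-BMP) code point using the UTF-16 formula.
    public static int combine(char high, char low) {
        // Each surrogate carries 10 bits of the offset from 0x10000.
        return ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000;
    }

    public static void main(String[] args) {
        char high = '\uD83D';
        char low = '\uDE00';
        int cp = combine(high, low);
        System.out.println(Integer.toHexString(cp)); // prints "1f600"
        // The JDK offers the same computation as Character.toCodePoint.
        System.out.println(cp == Character.toCodePoint(high, low)); // prints "true"
    }
}
```

The sketch assumes both arguments are already known to be valid surrogates; it performs no range checking.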
-
highSurrogate
public static char highSurrogate(int ch)
Return the high surrogate of a non-BMP character.
Parameters:
ch - the Unicode codepoint of the non-BMP character to be divided
Returns:
the first character in the surrogate pair
-
lowSurrogate
public static char lowSurrogate(int ch)
Return the low surrogate of a non-BMP character.
Parameters:
ch - the Unicode codepoint of the non-BMP character to be divided
Returns:
the second character in the surrogate pair
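Splitting a non-BMP codepoint into its two surrogates is the inverse of combining a pair. The following is a minimal sketch of the standard UTF-16 encoding arithmetic, assuming the input lies in the range 0x10000 to 0x10FFFF; `SurrogateSplitSketch`, `high`, and `low` are hypothetical names and do not reproduce Saxon's implementation.

```java
public class SurrogateSplitSketch {

    // Hypothetical helper: the top 10 bits of (cp - 0x10000) select the high surrogate.
    public static char high(int cp) {
        return (char) (((cp - 0x10000) >> 10) + 0xD800);
    }

    // Hypothetical helper: the bottom 10 bits of (cp - 0x10000) select the low surrogate.
    public static char low(int cp) {
        return (char) (((cp - 0x10000) & 0x3FF) + 0xDC00);
    }

    public static void main(String[] args) {
        int cp = 0x1F600;
        System.out.println(Integer.toHexString(high(cp))); // prints "d83d"
        System.out.println(Integer.toHexString(low(cp)));  // prints "de00"
        // The JDK equivalents are Character.highSurrogate and Character.lowSurrogate.
        System.out.println(high(cp) == Character.highSurrogate(cp)); // prints "true"
    }
}
```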
-
isSurrogate
public static boolean isSurrogate(int c)
Test whether a given character is a surrogate (high or low).
Parameters:
c - the character to test
Returns:
true if the character is the high or low half of a surrogate pair
-
isHighSurrogate
public static boolean isHighSurrogate(int ch)
Test whether the given character is a high surrogate.
Parameters:
ch - the character to test
Returns:
true if the character is the first character in a surrogate pair
-
isLowSurrogate
public static boolean isLowSurrogate(int ch)
Test whether the given character is a low surrogate.
Parameters:
ch - the character to test
Returns:
true if the character is the second character in a surrogate pair
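The three surrogate tests amount to range checks on the code unit value. The sketch below shows one way to write them, assuming the Unicode-defined ranges (high: 0xD800 to 0xDBFF, low: 0xDC00 to 0xDFFF); `SurrogateRangeSketch` and its methods are hypothetical and are not taken from Saxon's source.

```java
public class SurrogateRangeSketch {

    // Hypothetical helper: true for the first code unit of a surrogate pair.
    public static boolean isHigh(int ch) {
        return ch >= 0xD800 && ch <= 0xDBFF;
    }

    // Hypothetical helper: true for the second code unit of a surrogate pair.
    public static boolean isLow(int ch) {
        return ch >= 0xDC00 && ch <= 0xDFFF;
    }

    // Hypothetical helper: true for either half of a surrogate pair.
    public static boolean isSurrogate(int c) {
        return c >= 0xD800 && c <= 0xDFFF;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (isSurrogate(s.charAt(i))) {
                count++;
            }
        }
        System.out.println(count); // prints "2"
    }
}
```

Because the parameters are `int`, explicit range comparisons are used rather than bit masking, so values above 0xFFFF are never misreported as surrogates.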
-
firstInvalidChar
Test whether all the characters in a character sequence satisfy the supplied predicate (for example, that they are valid XML characters).
Parameters:
iter - iterator over the character sequence to be tested
predicate - the predicate that all characters must satisfy
Returns:
the codepoint of the first invalid character in the character sequence (according to the supplied predicate); or -1 if all characters in the character sequence are valid
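The scan described here can be sketched with JDK types standing in for Saxon's. In this illustration `java.util.PrimitiveIterator.OfInt` and `java.util.function.IntPredicate` are assumed substitutes for `IntIterator` and `IntPredicateProxy`, and `FirstInvalidSketch`, `firstInvalid`, and `isXml10Char` are hypothetical names. The predicate shown follows the XML 1.0 `Char` production; the predicate actually passed by Saxon callers may differ.

```java
import java.util.PrimitiveIterator;
import java.util.function.IntPredicate;

public class FirstInvalidSketch {

    // Hypothetical helper: return the first code point that fails the
    // predicate, or -1 if every code point passes.
    public static int firstInvalid(PrimitiveIterator.OfInt iter, IntPredicate valid) {
        while (iter.hasNext()) {
            int cp = iter.nextInt();
            if (!valid.test(cp)) {
                return cp;
            }
        }
        return -1;
    }

    // Example predicate: the Char production of XML 1.0.
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    public static void main(String[] args) {
        String clean = "hello";
        String dirty = "ab\u0001cd";
        System.out.println(firstInvalid(clean.codePoints().iterator(),
                FirstInvalidSketch::isXml10Char)); // prints "-1"
        System.out.println(firstInvalid(dirty.codePoints().iterator(),
                FirstInvalidSketch::isXml10Char)); // prints "1"
    }
}
```

Iterating over code points rather than `char` values means a well-formed surrogate pair reaches the predicate as a single non-BMP codepoint.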
-