Package net.sf.saxon.regex
Class UnicodeString
java.lang.Object
    net.sf.saxon.regex.UnicodeString
All Implemented Interfaces:
java.lang.CharSequence, java.lang.Comparable<UnicodeString>, AtomicMatchKey
Direct Known Subclasses:
BMPString, EmptyString, GeneralUnicodeString, LatinString
public abstract class UnicodeString extends java.lang.Object implements java.lang.CharSequence, java.lang.Comparable<UnicodeString>, AtomicMatchKey
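An abstract class that efficiently handles Unicode strings, including non-BMP characters. Three subclasses handle strings whose maximum character code is 255 (LatinString), 65535 (BMPString), or 1114111 (GeneralUnicodeString) respectively; the fourth direct subclass, EmptyString, represents the zero-length string.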
Field Summary
Fields inherited from interface net.sf.saxon.expr.sort.AtomicMatchKey:
NaN_MATCH_KEY
Constructor Summary
UnicodeString()
Method Summary
- AtomicValue asAtomic(): Get an atomic value that encapsulates this match key.
- int compareTo(UnicodeString other): Compare two Unicode strings in codepoint collating sequence.
- static boolean containsSurrogatePairs(java.lang.CharSequence value): Test whether a CharSequence contains Unicode codepoints outside the BMP range.
- boolean equals(java.lang.Object obj): Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
- int hashCode(): Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
- abstract boolean isEnd(int pos): Ask whether a given position is at (or beyond) the end of the string.
- static UnicodeString makeUnicodeString(int[] in): Make a UnicodeString for a given array of codepoints.
- static UnicodeString makeUnicodeString(java.lang.CharSequence in): Make a UnicodeString for a given CharSequence.
- abstract int uCharAt(int pos): Get the character at a specified position.
- abstract int uIndexOf(int search, int start): Get the first match for a given character.
- abstract int uLength(): Get the length of the string, in Unicode codepoints.
- abstract UnicodeString uSubstring(int beginIndex, int endIndex): Get a substring of this string.
Method Detail
makeUnicodeString
public static UnicodeString makeUnicodeString(java.lang.CharSequence in)
Make a UnicodeString for a given CharSequence.
Parameters:
in - the input CharSequence
Returns:
a UnicodeString using an appropriate implementation class
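The class description ties each implementation class to the highest character code in the string. The sketch below illustrates that selection rule under two assumptions: that the thresholds 255, 65535 and 1114111 are applied to the maximum codepoint, and that a zero-length input maps to EmptyString. It returns only a class name, does not construct Saxon objects, and the helper class ImplementationChooser is hypothetical; Saxon's actual selection logic may differ in detail.

```java
// Hypothetical sketch of how an implementation class could be chosen; not Saxon's code.
final class ImplementationChooser {

    static String chooseImplementation(CharSequence in) {
        if (in.length() == 0) {
            return "EmptyString"; // assumption: zero-length input gets the dedicated empty class
        }
        int max = 0;
        // Walk the input codepoint by codepoint so a surrogate pair counts as one character
        for (int i = 0; i < in.length(); ) {
            int cp = Character.codePointAt(in, i);
            max = Math.max(max, cp);
            i += Character.charCount(cp);
        }
        if (max <= 255) {
            return "LatinString";          // every codepoint fits in 8 bits
        } else if (max <= 65535) {
            return "BMPString";            // every codepoint fits in one 16-bit char
        } else {
            return "GeneralUnicodeString"; // needs full codepoints, up to 1114111 (0x10FFFF)
        }
    }
}
```

Scanning for the maximum first means the most compact representation can be picked before any storage is allocated.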
makeUnicodeString
public static UnicodeString makeUnicodeString(int[] in)
Make a UnicodeString for a given array of codepoints.
Parameters:
in - the input array of Unicode codepoints
Returns:
a UnicodeString using an appropriate implementation class
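An array of codepoints holds one int per character, whereas a Java String holds UTF-16 chars. The sketch below uses only standard java.lang calls to show how the two forms relate; it does not call Saxon, and the class CodepointArrays and its method names are hypothetical.

```java
// Hypothetical helpers relating codepoint arrays to UTF-16 strings; not Saxon's API.
final class CodepointArrays {

    // Build a UTF-16 String from codepoints; supplementary codepoints become surrogate pairs.
    // Throws IllegalArgumentException if any value is not a valid codepoint.
    static String fromCodepoints(int[] codepoints) {
        return new String(codepoints, 0, codepoints.length);
    }

    // Recover one int per codepoint from a String
    static int[] toCodepoints(String s) {
        return s.codePoints().toArray();
    }
}
```

For example, the array {0x48, 0x1F600} has two elements but produces a String whose length() is 3, because U+1F600 needs a surrogate pair.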
containsSurrogatePairs
public static boolean containsSurrogatePairs(java.lang.CharSequence value)
Test whether a CharSequence contains Unicode codepoints outside the BMP range (that is, above U+FFFF, which UTF-16 represents as surrogate pairs).
Parameters:
value - the string to be tested
Returns:
true if the string contains non-BMP codepoints
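Because UTF-16 encodes every non-BMP codepoint as a high surrogate followed by a low surrogate, the test can be done by scanning adjacent chars. The following is a minimal sketch of one way to perform the documented check with java.lang.Character only; it is not Saxon's implementation, the class SurrogateCheck is hypothetical, and this version simply ignores unpaired surrogates, whose treatment the documentation does not specify.

```java
// Hypothetical re-implementation of the documented check, using only java.lang.
final class SurrogateCheck {

    // True if a high surrogate is immediately followed by a low surrogate,
    // i.e. the sequence encodes at least one codepoint above U+FFFF
    static boolean containsSurrogatePairs(CharSequence value) {
        for (int i = 0; i + 1 < value.length(); i++) {
            if (Character.isHighSurrogate(value.charAt(i))
                    && Character.isLowSurrogate(value.charAt(i + 1))) {
                return true;
            }
        }
        return false;
    }
}
```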
uSubstring
public abstract UnicodeString uSubstring(int beginIndex, int endIndex)
Get a substring of this string.
Parameters:
beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
endIndex - the index of the first character NOT to be included (counting codepoints, not 16-bit characters)
Returns:
a substring
Throws:
java.lang.IndexOutOfBoundsException - if the selection goes off the start or end of the string (this method follows the semantics of String.substring(), not the XPath semantics)
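To see what codepoint indexing means for a substring, the sketch below emulates the documented contract on a plain java.lang.String: codepoint indexes are translated to char offsets, and out-of-range selections raise IndexOutOfBoundsException as String.substring() does. This is an illustration only; the class CodepointSubstring is hypothetical and Saxon's subclasses store their data differently.

```java
// Hypothetical sketch of a codepoint-indexed substring over a java.lang.String; not Saxon's code.
final class CodepointSubstring {

    static String uSubstring(String s, int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex < beginIndex) {
            throw new IndexOutOfBoundsException("begin " + beginIndex + ", end " + endIndex);
        }
        // Translate codepoint indexes to char offsets; offsetByCodePoints throws
        // IndexOutOfBoundsException if the string has too few codepoints
        int begin = s.offsetByCodePoints(0, beginIndex);
        int end = s.offsetByCodePoints(begin, endIndex - beginIndex);
        return s.substring(begin, end);
    }
}
```

With s = "a" + U+1F600 + "b", uSubstring(s, 1, 2) returns the whole surrogate pair, whereas s.substring(1, 2) would split it.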
uIndexOf
public abstract int uIndexOf(int search, int start)
Get the position of the first occurrence of a given character.
Parameters:
search - the character (Unicode codepoint) to look for
start - the first position to examine
Returns:
the position of the first occurrence of the sought character, or -1 if not found
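A search over codepoints returns a codepoint position rather than a char offset. The sketch below works on an int[] of codepoints to make that explicit; the class CodepointSearch is hypothetical, and treating a negative start as 0 is an assumption of this sketch, not documented Saxon behavior.

```java
// Hypothetical sketch of a codepoint-based search; not Saxon's code.
final class CodepointSearch {

    // Codepoint index of the first occurrence of search at or after start, or -1.
    // Assumption of this sketch: a negative start is treated as 0.
    static int uIndexOf(int[] codepoints, int search, int start) {
        for (int i = Math.max(start, 0); i < codepoints.length; i++) {
            if (codepoints[i] == search) {
                return i;
            }
        }
        return -1;
    }
}
```

For "a" + U+1F600 + "b", searching for 'b' gives codepoint position 2, while String.indexOf('b') on the same text gives char offset 3.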
uCharAt
public abstract int uCharAt(int pos)
Get the character at a specified position.
Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
Returns:
the character (Unicode codepoint) at the specified position
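The difference between uCharAt() and CharSequence.charAt() shows up as soon as a supplementary character is present. The sketch below reproduces the documented behavior on a java.lang.String using standard methods; the class CodepointAt is hypothetical and is not how Saxon's subclasses are implemented.

```java
// Hypothetical sketch: codepoint at a codepoint index, over a java.lang.String; not Saxon's code.
final class CodepointAt {

    // Convert the codepoint index to a char offset, then read the whole codepoint there
    static int uCharAt(String s, int pos) {
        return s.codePointAt(s.offsetByCodePoints(0, pos));
    }
}
```

For "a" + U+1F600 + "b", uCharAt(s, 1) yields 0x1F600, whereas s.charAt(1) yields only the high surrogate 0xD83D.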
uLength
public abstract int uLength()
Get the length of the string, in Unicode codepoints.
Returns:
the number of codepoints in the string
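For comparison with CharSequence.length(), which counts 16-bit chars, the codepoint count can be obtained from standard Java as shown below. The class CodepointLength is a hypothetical illustration, not part of Saxon.

```java
// Hypothetical illustration of codepoint length versus UTF-16 length; not Saxon's code.
final class CodepointLength {

    // Number of codepoints: each surrogate pair counts once
    static int uLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }
}
```

The string "a" + U+1F600 + "b" has length() 4 but a codepoint length of 3.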
isEnd
public abstract boolean isEnd(int pos)
Ask whether a given position is at (or beyond) the end of the string.
Parameters:
pos - the index of the required character (counting codepoints, not 16-bit characters)
Returns:
true if the specified index is at or beyond the end of the string
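A method like isEnd() allows a scanning loop to stop at the end of the string without computing the length first. The sketch below shows that loop shape over an int[] of codepoints; the class EndCheck and its countMatches method are hypothetical, chosen only to illustrate the idiom.

```java
// Hypothetical sketch over an int[] of codepoints; not Saxon's code.
final class EndCheck {

    // "At or beyond the end" means there is no character at pos
    static boolean isEnd(int[] codepoints, int pos) {
        return pos >= codepoints.length;
    }

    // Scan forward until isEnd reports true, counting occurrences of target
    static int countMatches(int[] codepoints, int target) {
        int count = 0;
        for (int pos = 0; !isEnd(codepoints, pos); pos++) {
            if (codepoints[pos] == target) {
                count++;
            }
        }
        return count;
    }
}
```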
hashCode
public int hashCode()
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
Overrides:
hashCode in class java.lang.Object
Returns:
a hashCode that distinguishes this UnicodeString from others
equals
public boolean equals(java.lang.Object obj)
Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
Overrides:
equals in class java.lang.Object
Parameters:
obj - the object to be compared
Returns:
true if obj is a UnicodeString containing the same codepoints
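The contract described for equals() and hashCode() (equality by codepoint content, and only with the same kind of object) can be illustrated with a small value class. The sketch below is hypothetical: CodepointValue stands in for a UnicodeString implementation and stores its codepoints in an int[], which is an assumption made for illustration only.

```java
import java.util.Arrays;

// Hypothetical value class illustrating codepoint-based equality; not Saxon's UnicodeString.
final class CodepointValue {
    private final int[] codepoints;

    CodepointValue(CharSequence text) {
        this.codepoints = text.codePoints().toArray();
    }

    @Override
    public boolean equals(Object obj) {
        // Equal only to another CodepointValue with the same codepoint sequence;
        // a plain String with the same text is deliberately not equal
        return obj instanceof CodepointValue
                && Arrays.equals(codepoints, ((CodepointValue) obj).codepoints);
    }

    @Override
    public int hashCode() {
        // Derived from the same data as equals, so equal values share a hash code
        return Arrays.hashCode(codepoints);
    }
}
```

Refusing equality with arbitrary CharSequence objects keeps equals() symmetric, since String.equals() would never report a match in the other direction.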
compareTo
public int compareTo(UnicodeString other)
Compare two Unicode strings in codepoint collating sequence.
Specified by:
compareTo in interface java.lang.Comparable<UnicodeString>
Parameters:
other - the UnicodeString to be compared with this one
Returns:
a negative integer, zero, or a positive integer as this string is less than, equal to, or greater than other
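Codepoint collating sequence is not the same as String.compareTo(), which compares UTF-16 chars: a surrogate (0xD800 to 0xDFFF) sorts below BMP characters such as U+FFFD, even though the codepoint it helps encode is larger. The sketch below is a hypothetical comparator (CodepointOrder) that orders by codepoint values, to make the difference concrete; it is not Saxon's implementation.

```java
// Hypothetical comparator ordering strings by codepoint value; not Saxon's implementation.
final class CodepointOrder {

    static int compareCodepoints(CharSequence a, CharSequence b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]); // first differing codepoint decides
            }
        }
        return Integer.compare(x.length, y.length); // a proper prefix sorts first
    }
}
```

Comparing U+FFFD with U+1F600 gives a negative result here, while String.compareTo() on the same two strings gives a positive one.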
asAtomic
public AtomicValue asAtomic()
Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
Specified by:
asAtomic in interface AtomicMatchKey
Returns:
an atomic value that encapsulates this match key