Package net.sf.saxon.str
Class StringTool
java.lang.Object
net.sf.saxon.str.StringTool
-
Constructor Summary
Constructors
StringTool()
-
Method Summary
Modifier and Type, Method, Description
static void appendRepeated(StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the end of a StringBuilder.
static IntIterator codePoints(CharSequence value)
    Get an iterator over the codepoints in a CharSequence, typically a String.
static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
    Attempt to compress a UnicodeString consisting entirely of whitespace.
static boolean containsSurrogates
    Ask whether a string contains astral characters (represented as surrogate pairs).
static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 16-bit characters to an array holding 24-bit characters.
static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 16-bit characters.
static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
static String diagnosticDisplay
    Produce a diagnostic representation of the contents of the string.
static int[] expand
    Expand a string into an array of 32-bit characters.
static UnicodeString fromCharSequence(CharSequence chars)
static UnicodeString fromCodePoints(int[] codes, int used)
    Contract an array of integers containing Unicode codepoints into a string.
static UnicodeString fromLatin1(String str)
    Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
static int getStringLength
    Get the length of a string, as defined in XPath.
static int lastCodePoint
    Get the last codepoint in a UnicodeString.
static long lastIndexOf(UnicodeString str, int codePoint)
    Get the position of the last occurrence of a given codepoint within a string.
static void prependRepeated(StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the start of a StringBuilder.
static void prependWideChar(StringBuilder builder, int ch)
    Insert a wide character (surrogate pair) at the start of a StringBuilder.
-
Constructor Details
-
StringTool
public StringTool()
-
-
Method Details
-
getStringLength
Get the length of a string, as defined in XPath. This is not the same as the Java length, because a Unicode surrogate pair counts as a single character.
Parameters:
s - the string whose length is required
Returns:
the length of the string in Unicode code points
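The difference between the Java length and the XPath length can be illustrated with the JDK alone. The sketch below is not Saxon's implementation; the class and method names are hypothetical, and it works on a plain java.lang.String.

```java
final class XPathLengthSketch {
    // Hypothetical helper: counts Unicode code points, so a surrogate
    // pair contributes 1 to the result rather than 2.
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

For the string "a\uD83D\uDE00b" (the letter a, U+1F600, the letter b), String.length() reports 4 UTF-16 code units, while codePointLength reports 3.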
-
expand
Expand a string into an array of 32-bit characters.
Parameters:
s - the string to be expanded
Returns:
an array of integers representing the Unicode code points
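As a rough JDK-only analogue of this expansion (not the Saxon code itself, and with a hypothetical class name), a String can be turned into an int array of code points using the standard codePoints() stream:

```java
final class ExpandSketch {
    // Hypothetical helper: one int per Unicode code point; surrogate
    // pairs are combined into a single supplementary code point.
    static int[] expand(String s) {
        return s.codePoints().toArray();
    }
}
```

With this sketch, "a\uD83D\uDE00" expands to the two-element array {0x61, 0x1F600}.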
-
containsSurrogates
Ask whether a string contains astral characters (represented as surrogate pairs).
Parameters:
str - the string to be tested
Returns:
true if the string contains surrogate characters
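A test of this kind can be sketched over a plain CharSequence using Character.isSurrogate from the JDK. The names below are hypothetical, and the actual parameter type of the Saxon method is not shown in this page.

```java
final class SurrogateCheckSketch {
    // Hypothetical helper: returns true as soon as any UTF-16 code unit
    // in the surrogate range (high or low) is found.
    static boolean containsSurrogates(CharSequence str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```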
-
fromCodePoints
Contract an array of integers containing Unicode codepoints into a string.
Parameters:
codes - an array of integers representing the Unicode code points
used - the number of items in the array that are actually used
Returns:
the constructed string
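The role of the used parameter can be seen in a JDK-only sketch that builds a java.lang.String rather than a UnicodeString. The class name is hypothetical; the String(int[], int, int) constructor is standard.

```java
final class ContractSketch {
    // Hypothetical helper: only the first `used` entries of `codes`
    // contribute; any trailing slots in the array are ignored.
    static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }
}
```

Calling fromCodePoints(new int[]{0x48, 0x1F600, 0, 0}, 2) yields a two-codepoint string; the two unused trailing zeros are not included.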
-
fromCharSequence
Parameters:
chars - the supplied String or CharSequence
Returns:
the equivalent UnicodeString
-
fromLatin1
Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
Parameters:
str - the supplied String: the caller warrants that this contains no characters with codepoint higher than 255.
Returns:
the equivalent UnicodeString
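Because the caller warrants the Latin-1 property, a caller that is unsure may want to check it first. The following is a minimal JDK-only sketch of such a check, together with packing into one byte per character; the class and method names are hypothetical and nothing here is taken from Saxon's internals.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Sketch {
    // Hypothetical check: true if every UTF-16 code unit is <= 255.
    static boolean isLatin1(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical packing step: one byte per character, valid only
    // when isLatin1(str) holds.
    static byte[] toLatin1Bytes(String str) {
        if (!isLatin1(str)) {
            throw new IllegalArgumentException("not a Latin-1 string");
        }
        return str.getBytes(StandardCharsets.ISO_8859_1);
    }
}
```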
-
codePoints
Get an iterator over the codepoints in a CharSequence, typically a String.
Parameters:
value - the supplied string
Returns:
an IntIterator allowing iteration over the codepoints. Note that the protocol for IntIterator requires exactly one call of IntIterator.hasNext() before every call of IntIterator.next().
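The loop shape that satisfies the "one hasNext() before every next()" protocol can be illustrated with the JDK's own PrimitiveIterator.OfInt, used here as a stand-in for Saxon's IntIterator. The class name and the counting task are hypothetical.

```java
import java.util.PrimitiveIterator;

final class CodePointIterationSketch {
    // Hypothetical example: counts code points above U+FFFF.
    // Each loop pass makes exactly one hasNext() call followed by
    // exactly one call that fetches the next code point.
    static int countAstral(CharSequence value) {
        PrimitiveIterator.OfInt it = value.codePoints().iterator();
        int astral = 0;
        while (it.hasNext()) {
            int cp = it.nextInt();
            if (cp > 0xFFFF) {
                astral++;
            }
        }
        return astral;
    }
}
```

The JDK iterator is more forgiving than the protocol described above, but a plain while loop of this form is safe under either contract.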
-
diagnosticDisplay
Produce a diagnostic representation of the contents of the string.
Parameters:
s - the string
Returns:
a string in which characters other than printable ASCII characters are replaced by \uXXXX escapes
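A diagnostic display of this kind might look like the following JDK-only sketch. The exact rules Saxon applies (which characters count as printable, hex digit case, handling of surrogate pairs) are not stated in this page, so the choices below are assumptions: printable means 0x20 to 0x7E, escapes use four uppercase hex digits, and each UTF-16 code unit is escaped separately.

```java
final class DiagnosticDisplaySketch {
    // Hypothetical helper: keeps printable ASCII as-is and replaces
    // every other UTF-16 code unit by a backslash, the letter u, and
    // four uppercase hex digits.
    static String escape(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```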
-
prependWideChar
Insert a wide character (surrogate pair) at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the codepoint of the character to be inserted
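The effect can be reproduced with standard JDK calls: Character.toChars converts a code point into one or two UTF-16 code units, which are then inserted at index 0. This is a sketch under a hypothetical class name, not the Saxon source.

```java
final class PrependWideCharSketch {
    // Hypothetical helper: inserts the UTF-16 encoding of code point
    // `ch` (a surrogate pair when ch > 0xFFFF) at the front.
    static void prependCodePoint(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }
}
```

After prependCodePoint(sb, 0x1F600) on a builder holding "x", the builder has length 3: the two surrogates followed by x.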
-
prependRepeated
Insert repeated occurrences of a given character at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
-
appendRepeated
Insert repeated occurrences of a given character at the end of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
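A typical use of repeated-character helpers like prependRepeated and appendRepeated is padding. The sketch below shows one way to get the same effect with only the JDK; the class name is hypothetical, and a non-negative count is assumed.

```java
import java.util.Arrays;

final class RepeatSketch {
    // Appends `count` copies of `ch`, reserving capacity up front.
    static void appendRepeated(StringBuilder builder, char ch, int count) {
        builder.ensureCapacity(builder.length() + count);
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    // Prepends `count` copies of `ch` with a single insert, so the
    // existing content is shifted only once.
    static void prependRepeated(StringBuilder builder, char ch, int count) {
        char[] fill = new char[count];
        Arrays.fill(fill, ch);
        builder.insert(0, fill);
    }
}
```

For example, starting from a builder holding "42", prependRepeated(sb, '0', 3) gives "00042", and a following appendRepeated(sb, ' ', 2) gives "00042" followed by two spaces.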
-
lastCodePoint
Get the last codepoint in a UnicodeString.
Parameters:
str - the input string
Returns:
the integer value of the last character in the string
Throws:
IndexOutOfBoundsException - if the string is empty
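The same contract, including the exception on empty input, can be illustrated on a plain java.lang.String using String.codePointBefore. The class name is hypothetical and this is only an analogue of the UnicodeString version.

```java
final class LastCodePointSketch {
    // Hypothetical helper: returns the final code point, combining a
    // trailing surrogate pair into one value. For an empty string,
    // codePointBefore(0) throws an IndexOutOfBoundsException.
    static int lastCodePoint(String str) {
        return str.codePointBefore(str.length());
    }
}
```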
-
lastIndexOf
Get the position of the last occurrence of a given codepoint within a string.
Parameters:
str - the input string
codePoint - the sought codepoint
Returns:
the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
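Assuming that positions in a UnicodeString are counted in code points rather than UTF-16 code units (an assumption; this page does not say so explicitly), the result can differ from String.lastIndexOf when astral characters precede the match. A JDK-only sketch with a hypothetical class name:

```java
final class LastIndexOfSketch {
    // Hypothetical helper: finds the last occurrence as a char index,
    // then converts that index into a code point position.
    static long lastIndexOfCodePoint(String str, int codePoint) {
        int charIndex = str.lastIndexOf(codePoint);
        if (charIndex < 0) {
            return -1;
        }
        return str.codePointCount(0, charIndex);
    }
}
```

For "\uD83D\uDE00ab", String.lastIndexOf('b') returns 3, while the code point position computed here is 2.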
-
compress
Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node.
Parameters:
in - the Unicode string to be compressed
offset - the start position of the substring we are interested in
len - the length of the substring we are interested in
compressWS - set to true if whitespace compression is to be attempted
Returns:
the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
-
copy8to16
public static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
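One detail that matters when widening 8-bit characters in Java is that byte is signed, so values above 127 must be masked before being stored as char. The following is a minimal sketch of such a copy, written from the description above rather than from the Saxon source; the class name is hypothetical.

```java
final class Copy8to16Sketch {
    // Widens each byte to a char. The & 0xFF mask prevents sign
    // extension, so byte 0xE9 becomes char U+00E9, not U+FFE9.
    // No bounds checks: as stated above, that is the caller's job.
    static void copy8to16(byte[] source, int sourcePos,
                          char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }
}
```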
-
copy8to24
public static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array, using three bytes per codepoint
destPos - the codepoint position (not byte position) in the destination array where copying is to start
count - the number of characters (codepoints) to copy
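Because destPos is a codepoint position, the byte offset into the destination is three times that value. The sketch below illustrates the arithmetic; the byte order within each three-byte group (most significant byte first) is an assumption not confirmed by this page, and the class name is hypothetical.

```java
final class Copy8to24Sketch {
    // Writes each 8-bit character as three bytes. Assumed layout:
    // high byte, middle byte, low byte. For 8-bit input the first
    // two bytes are always zero.
    static void copy8to24(byte[] source, int sourcePos,
                          byte[] dest, int destPos, int count) {
        int out = destPos * 3; // convert codepoint position to byte offset
        for (int i = 0; i < count; i++) {
            dest[out++] = 0;
            dest[out++] = 0;
            dest[out++] = source[sourcePos + i];
        }
    }
}
```

Copying two characters to codepoint position 1 therefore fills bytes 3 through 8 of the destination and leaves bytes 0 through 2 untouched.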
-
copy16to24
public static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array. The caller is responsible for ensuring that this contains no surrogates
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
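A 16-bit to 24-bit widening can be sketched in the same style. Two points are assumptions here, since this page does not settle them: destPos is treated as a codepoint position (as documented for copy8to24), and each three-byte group is stored most significant byte first. The class name is hypothetical.

```java
final class Copy16to24Sketch {
    // Writes each 16-bit character as three bytes: a zero high byte,
    // then the upper and lower halves of the char. The source must
    // not contain surrogates, since each char is treated as a whole
    // code point.
    static void copy16to24(char[] source, int sourcePos,
                           byte[] dest, int destPos, int count) {
        int out = destPos * 3; // assumed: destPos counts codepoints
        for (int i = 0; i < count; i++) {
            char c = source[sourcePos + i];
            dest[out++] = 0;
            dest[out++] = (byte) (c >> 8);
            dest[out++] = (byte) (c & 0xFF);
        }
    }
}
```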
-