Package net.sf.saxon.str
Class StringTool
java.lang.Object
net.sf.saxon.str.StringTool
-
Constructor Summary
Constructors
StringTool()
Method Summary
Modifier and Type, Method, Description
static void appendRepeated(StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the end of a StringBuilder.
static IntIterator codePoints(CharSequence value)
    Get an iterator over the codepoints in a CharSequence, typically a String.
static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
    Attempt to compress a UnicodeString consisting entirely of whitespace.
static boolean containsSurrogates
    Ask whether a string contains astral characters (represented as surrogate pairs).
static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 16-bit characters to an array holding 24-bit characters.
static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 16-bit characters.
static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
static String diagnosticDisplay
    Produce a diagnostic representation of the contents of the string.
static int[] expand
    Expand a string into an array of 32-bit characters.
static UnicodeString fromCharSequence(CharSequence chars)
static UnicodeString fromCodePoints(int[] codes, int used)
    Contract an array of integers containing Unicode codepoints into a string.
static UnicodeString fromLatin1(String str)
    Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
static int getStringLength
    Get the length of a string, as defined in XPath.
static int lastCodePoint
    Get the last codepoint in a UnicodeString.
static long lastIndexOf(UnicodeString str, int codePoint)
    Get the position of the last occurrence of a given codepoint within a string.
static void prependRepeated(StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the start of a StringBuilder.
static void prependWideChar(StringBuilder builder, int ch)
    Insert a wide character (surrogate pair) at the start of a StringBuilder.
-
Constructor Details
-
StringTool
public StringTool()
-
-
Method Details
-
getStringLength
Get the length of a string, as defined in XPath. This is not the same as the Java length, because a Unicode surrogate pair counts as a single character.
Parameters:
s - the string whose length is required
Returns:
the length of the string in Unicode code points
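The gap between the Java length and the XPath length can be shown with the JDK alone. The sketch below does not call Saxon; `codePointLength` is a hypothetical helper that counts code points as the description above states.

```java
final class XPathLengthSketch {
    /** Hypothetical helper: counts code points, so a surrogate pair counts once. */
    static int codePointLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";                 // 'a', U+1F600, 'b'
        System.out.println(s.length());              // 4 UTF-16 code units
        System.out.println(codePointLength(s));      // 3 code points
    }
}
```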
-
expand
Expand a string into an array of 32-bit characters.
Parameters:
s - the string to be expanded
Returns:
an array of integers representing the Unicode code points
-
containsSurrogates
Ask whether a string contains astral characters (represented as surrogate pairs).
Parameters:
str - the string to be tested
Returns:
true if the string contains surrogate characters
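For a UTF-16 input, a test of this kind amounts to scanning for code units in the surrogate range. The following is a minimal JDK-only sketch, not Saxon's implementation; `hasSurrogates` is a hypothetical name, and it takes a `CharSequence` purely for illustration.

```java
final class SurrogateCheckSketch {
    /** Hypothetical helper: true if any UTF-16 code unit is a surrogate. */
    static boolean hasSurrogates(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isSurrogate(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSurrogates("plain text"));
        System.out.println(hasSurrogates("smile \uD83D\uDE00"));
    }
}
```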
-
fromCodePoints
Contract an array of integers containing Unicode codepoints into a string.
Parameters:
codes - an array of integers representing the Unicode code points
used - the number of items in the array that are actually used
Returns:
the constructed string
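The `used` argument exists because the array is typically a buffer that is only partly filled. The sketch below shows the same idea with plain `java.lang.String` rather than `UnicodeString`, together with the inverse expansion step; both helper names are hypothetical.

```java
final class CodePointArraySketch {
    /** Hypothetical helper: one int per code point. */
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    /** Hypothetical helper: builds a String from the first 'used' entries only. */
    static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }

    public static void main(String[] args) {
        int[] buffer = new int[8];      // oversized buffer, two slots filled
        buffer[0] = 0x61;               // 'a'
        buffer[1] = 0x1F600;            // astral character
        String s = fromCodePoints(buffer, 2);
        System.out.println(s.length()); // 3 UTF-16 code units
        System.out.println(toCodePoints(s).length);
    }
}
```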
-
fromCharSequence
Parameters:
chars - the supplied String or CharSequence
Returns:
the equivalent UnicodeString
-
fromLatin1
Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
Parameters:
str - the supplied String: the caller warrants that this contains no characters with codepoint higher than 255.
Returns:
the equivalent UnicodeString
-
codePoints
Get an iterator over the codepoints in a CharSequence, typically a String.
Parameters:
value - the supplied string
Returns:
an IntIterator allowing iteration over the codepoints. Note that the protocol for IntIterator requires exactly one call of IntIterator.hasNext() before every call of IntIterator.next()
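The calling discipline described above (one `hasNext()` before each `next()`) looks like this in practice. The sketch uses the JDK's `PrimitiveIterator.OfInt` as a stand-in for Saxon's `IntIterator`, so the types and the `nextInt()` method name are assumptions of the example, not the Saxon API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PrimitiveIterator;

final class CodePointIterationSketch {
    /** Hypothetical helper: collects the code points of a CharSequence. */
    static List<Integer> collect(CharSequence value) {
        List<Integer> out = new ArrayList<>();
        PrimitiveIterator.OfInt it = value.codePoints().iterator();
        while (it.hasNext()) {        // exactly one hasNext() call ...
            out.add(it.nextInt());    // ... before each call that fetches a value
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("a\uD83D\uDE00").size()); // 2
    }
}
```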
-
diagnosticDisplay
Produce a diagnostic representation of the contents of the string.
Parameters:
s - the string
Returns:
a string in which characters other than printable ASCII are replaced by \uXXXX escapes
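A display routine of this kind can be sketched as follows. This is an illustration only: the choice to escape per UTF-16 code unit, the printable range 0x20 to 0x7E, and the upper-case hex digits are all assumptions, and the real method may differ on each point.

```java
final class DiagnosticDisplaySketch {
    /** Hypothetical helper: escapes every code unit outside 0x20..0x7E. */
    static String display(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);                                  // printable ASCII is kept
            } else {
                sb.append(String.format("\\u%04X", (int) c));  // four-digit hex escape
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(display("caf\u00E9\t"));
    }
}
```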
-
prependWideChar
Insert a wide character (surrogate pair) at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the codepoint of the character to be inserted
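Because a `StringBuilder` stores UTF-16 code units, prepending a codepoint above U+FFFF means inserting two `char` values in the right order. One way to do that with the JDK, shown here as a sketch with the hypothetical name `prependCodePoint`, is to let `Character.toChars` produce the pair.

```java
final class PrependWideCharSketch {
    /** Hypothetical helper: inserts one code point (1 or 2 chars) at index 0. */
    static void prependCodePoint(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("bc");
        prependCodePoint(sb, 0x1F600);
        System.out.println(sb.length());                       // 4 code units
        System.out.println(sb.codePointCount(0, sb.length())); // 3 code points
    }
}
```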
-
prependRepeated
Insert repeated occurrences of a given character at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
-
appendRepeated
Insert repeated occurrences of a given character at the end of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
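This method and prependRepeated are the usual building blocks for padding. The sketch below reproduces the described behaviour with the JDK only; `padStart` and `padEnd` are hypothetical names, and treating a non-positive count as a no-op is an assumption of the example.

```java
import java.util.Arrays;

final class RepeatedCharSketch {
    /** Hypothetical helper: inserts 'count' copies of ch at the start. */
    static void padStart(StringBuilder builder, char ch, int count) {
        if (count <= 0) {
            return;                      // assumption: nothing to do
        }
        char[] fill = new char[count];
        Arrays.fill(fill, ch);
        builder.insert(0, fill);         // one insert, so the contents shift once
    }

    /** Hypothetical helper: appends 'count' copies of ch at the end. */
    static void padEnd(StringBuilder builder, char ch, int count) {
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("42");
        padStart(sb, '0', 3);
        padEnd(sb, '.', 2);
        System.out.println(sb);          // 00042..
    }
}
```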
-
lastCodePoint
Get the last codepoint in a UnicodeString.
Parameters:
str - the input string
Returns:
the integer value of the last character in the string
Throws:
IndexOutOfBoundsException - if the string is empty
-
lastIndexOf
Get the position of the last occurrence of a given codepoint within a string.
Parameters:
str - the input string
codePoint - the sought codepoint
Returns:
the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
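Since the input is a UnicodeString, the returned position presumably counts codepoints rather than UTF-16 code units, which differs from `String.lastIndexOf` whenever an astral character precedes the match. The sketch below illustrates that difference over a plain `int[]` of codepoints; it is not Saxon code and `lastIndexOfCodePoint` is a hypothetical helper.

```java
final class LastIndexOfSketch {
    /** Hypothetical helper: backward scan over an array of code points. */
    static long lastIndexOfCodePoint(int[] codePoints, int codePoint) {
        for (int i = codePoints.length - 1; i >= 0; i--) {
            if (codePoints[i] == codePoint) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00a/b/c";   // U+1F600 occupies two chars in a String
        System.out.println(s.lastIndexOf('/'));                                // 5 (code units)
        System.out.println(lastIndexOfCodePoint(s.codePoints().toArray(), '/')); // 4 (code points)
    }
}
```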
-
compress
Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node.
Parameters:
in - the Unicode string to be compressed
offset - the start position of the substring we are interested in
len - the length of the substring we are interested in
compressWS - set to true if whitespace compression is to be attempted
Returns:
the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
-
copy8to16
public static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
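The point to watch in any 8-bit to 16-bit widening in Java is that `byte` is signed, so values above 0x7F must be masked before the cast. The following sketch shows the technique; it is not the library's code, and `widen` is a hypothetical name. Like the method described above, it performs no bounds checks of its own.

```java
final class Widen8to16Sketch {
    /** Hypothetical helper: copies bytes into chars, treating each byte as unsigned. */
    static void widen(byte[] source, int sourcePos, char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            // Without "& 0xFF", the byte 0xE9 would sign-extend to the char 0xFFE9.
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0x41, (byte) 0xE9};
        char[] out = new char[2];
        widen(latin1, 0, out, 0, 2);
        System.out.println((int) out[1]);   // 233
    }
}
```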
-
copy8to24
public static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array, using three bytes per codepoint
destPos - the codepoint position (not byte position) in the destination array where copying is to start
count - the number of characters (codepoints) to copy
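The distinction between codepoint position and byte position can be made concrete with a small sketch. It assumes the three bytes of each character are stored most significant byte first; the documentation above does not state the byte order, so that layout, and the helper names `widen` and `codePointAt`, are assumptions of the example.

```java
final class Widen8to24Sketch {
    /** Hypothetical helper: writes each 8-bit character as three bytes (assumed big-endian). */
    static void widen(byte[] source, int sourcePos, byte[] dest, int destPos, int count) {
        int d = destPos * 3;                   // convert codepoint position to byte position
        for (int i = 0; i < count; i++) {
            dest[d++] = 0;                     // bits 16..23
            dest[d++] = 0;                     // bits 8..15
            dest[d++] = source[sourcePos + i]; // bits 0..7
        }
    }

    /** Hypothetical helper: reads the code point at a codepoint position. */
    static int codePointAt(byte[] buf, int index) {
        int p = index * 3;
        return ((buf[p] & 0xFF) << 16) | ((buf[p + 1] & 0xFF) << 8) | (buf[p + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0x41, (byte) 0xE9};
        byte[] wide = new byte[3 * 3];         // room for three characters
        widen(latin1, 0, wide, 1, 2);          // start at codepoint position 1 (byte 3)
        System.out.println(codePointAt(wide, 2)); // 233
    }
}
```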
-
copy16to24
public static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array. The caller is responsible for ensuring that this contains no surrogates
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
-