Package net.sf.saxon.str
Class StringTool
java.lang.Object
    net.sf.saxon.str.StringTool
public class StringTool extends java.lang.Object
-
-
Constructor Summary
Constructor     Description
StringTool()
-
Method Summary
static void appendRepeated(java.lang.StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the end of a StringBuilder.
static IntIterator codePoints(java.lang.CharSequence value)
static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
    Attempt to compress a UnicodeString consisting entirely of whitespace.
static boolean containsSurrogates(java.lang.String str)
    Ask whether a string contains astral characters (represented as surrogate pairs).
static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 16-bit characters to an array holding 24-bit characters.
static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 16-bit characters.
static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
static java.lang.String diagnosticDisplay(java.lang.String s)
    Produce a diagnostic representation of the contents of the string.
static int[] expand(UnicodeString s)
    Expand a string into an array of 32-bit characters.
static UnicodeString fromCharSequence(java.lang.CharSequence chars)
static UnicodeString fromCodePoints(int[] codes, int used)
    Contract an array of integers containing Unicode codepoints into a string.
static UnicodeString fromLatin1(java.lang.String str)
static int getStringLength(java.lang.CharSequence s)
    Get the length of a string, as defined in XPath.
static int lastCodePoint(UnicodeString str)
    Get the last codepoint in a UnicodeString.
static long lastIndexOf(UnicodeString str, int codePoint)
    Get the position of the last occurrence of a given codepoint within a string.
static void prependRepeated(java.lang.StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the start of a StringBuilder.
static void prependWideChar(java.lang.StringBuilder builder, int ch)
    Insert a wide character (surrogate pair) at the start of a StringBuilder.
static int requireInt(long value)
    Utility method for use where strings longer than 2^31 characters cannot yet be handled.
-
-
-
Method Detail
-
getStringLength
public static int getStringLength(java.lang.CharSequence s)
Get the length of a string, as defined in XPath. This is not the same as the Java length, as a Unicode surrogate pair counts as a single character.
Parameters:
s - the string whose length is required
Returns:
the length of the string in Unicode code points
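The difference between the Java length and the XPath length can be illustrated with the JDK alone. The following is a minimal sketch, not Saxon's implementation; the class and method names (XPathLengthSketch, codePointLength) are hypothetical and chosen only for illustration.

```java
final class XPathLengthSketch {
    // Counts Unicode code points, so each surrogate pair contributes 1
    // rather than the 2 UTF-16 code units reported by String.length().
    static int codePointLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }
}
```

For the string "a\uD835\uDD38b" (the letter a, the astral character U+1D538, the letter b), String.length() reports 4 while the sketch reports 3.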
-
expand
public static int[] expand(UnicodeString s)
Expand a string into an array of 32-bit characters.
Parameters:
s - the string to be expanded
Returns:
an array of integers representing the Unicode code points
-
containsSurrogates
public static boolean containsSurrogates(java.lang.String str)
Ask whether a string contains astral characters (represented as surrogate pairs).
Parameters:
str - the string to be tested
Returns:
true if the string contains surrogate characters
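A check of this kind can be sketched with plain JDK calls. This is an illustration under the assumption that any UTF-16 surrogate code unit counts as a hit; the class and method names are hypothetical, and Saxon's own implementation may differ.

```java
final class SurrogateCheckSketch {
    // Scans the UTF-16 code units and reports whether any of them
    // lies in the surrogate range U+D800..U+DFFF.
    static boolean hasSurrogates(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```

A false result means every char in the string is a complete character, so char indexes and code point indexes coincide.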
-
fromCodePoints
public static UnicodeString fromCodePoints(int[] codes, int used)
Contract an array of integers containing Unicode codepoints into a string.
Parameters:
codes - an array of integers representing the Unicode code points
used - the number of items in the array that are actually used
Returns:
the constructed string
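The expand and fromCodePoints operations are inverses of one another. The round trip can be sketched on java.lang.String using only the JDK; the sketch below does not use UnicodeString, and its class and method names are hypothetical.

```java
final class CodePointRoundTripSketch {
    // Expands a character sequence into one int per Unicode code point.
    static int[] toCodePoints(CharSequence s) {
        return s.codePoints().toArray();
    }

    // Builds a string from the first `used` entries of `codes`;
    // any trailing entries in the array are ignored.
    static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }
}
```

The `used` parameter allows the caller to pass a buffer that is larger than the data it holds, which avoids trimming the array before the conversion.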
-
fromCharSequence
public static UnicodeString fromCharSequence(java.lang.CharSequence chars)
-
fromLatin1
public static UnicodeString fromLatin1(java.lang.String str)
-
codePoints
public static IntIterator codePoints(java.lang.CharSequence value)
-
diagnosticDisplay
public static java.lang.String diagnosticDisplay(java.lang.String s)
Produce a diagnostic representation of the contents of the string.
Parameters:
s - the string
Returns:
a string in which characters other than printable ASCII characters are replaced by \uXXXX escapes
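The escaping can be sketched as follows. This is an illustration only: it assumes that "printable ASCII" means the range 0x20 to 0x7E and that hexadecimal digits are written in upper case, neither of which is confirmed by the description above. The class and method names are hypothetical.

```java
final class DiagnosticDisplaySketch {
    // Copies printable ASCII characters unchanged and writes every other
    // UTF-16 code unit as a backslash, the letter u, and four hex digits.
    static String display(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```

Because the sketch works on UTF-16 code units, an astral character is shown as two escapes, one per surrogate.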
-
prependWideChar
public static void prependWideChar(java.lang.StringBuilder builder, int ch)
Insert a wide character (surrogate pair) at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the codepoint of the character to be inserted
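The effect can be reproduced with standard JDK calls. The sketch below is a hypothetical stand-in rather than Saxon's code: it converts the codepoint to its UTF-16 form with Character.toChars and inserts the result at offset 0.

```java
final class PrependWideCharSketch {
    // Character.toChars yields one char for a BMP code point and a
    // high/low surrogate pair for an astral code point.
    static void prependCodePoint(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }
}
```

Inserting the pair as a unit keeps the high surrogate ahead of the low surrogate, which two separate single-char insertions at offset 0 would reverse if done in the natural order.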
-
prependRepeated
public static void prependRepeated(java.lang.StringBuilder builder, char ch, int count)
Insert repeated occurrences of a given character at the start of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
-
appendRepeated
public static void appendRepeated(java.lang.StringBuilder builder, char ch, int count)
Insert repeated occurrences of a given character at the end of a StringBuilder.
Parameters:
builder - the string builder
ch - the character to be inserted
count - the number of repetitions
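A typical use of the two padding methods is left or right padding of formatted values. The following JDK-only sketch shows one way such helpers could be written; it assumes count is non-negative, and the class name is hypothetical.

```java
import java.util.Arrays;

final class RepeatSketch {
    // Appends `count` copies of `ch`; assumes count >= 0.
    static void appendRepeated(StringBuilder builder, char ch, int count) {
        builder.ensureCapacity(builder.length() + count);
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    // Inserts `count` copies of `ch` at offset 0 in a single operation,
    // so the existing content is shifted only once; assumes count >= 0.
    static void prependRepeated(StringBuilder builder, char ch, int count) {
        char[] pad = new char[count];
        Arrays.fill(pad, ch);
        builder.insert(0, pad);
    }
}
```

Starting from a builder holding "42", prepending '0' three times and then appending '!' twice yields "00042!!".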
-
lastCodePoint
public static int lastCodePoint(UnicodeString str)
Get the last codepoint in a UnicodeString.
Parameters:
str - the input string
Returns:
the integer value of the last character in the string
Throws:
java.lang.IndexOutOfBoundsException - if the string is empty
-
lastIndexOf
public static long lastIndexOf(UnicodeString str, int codePoint)
Get the position of the last occurrence of a given codepoint within a string.
Parameters:
str - the input string
codePoint - the sought codepoint
Returns:
the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
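The behaviour of lastCodePoint and lastIndexOf can be approximated on java.lang.String. The sketch below is an assumption-laden illustration: it takes positions to be counted in codepoints rather than UTF-16 code units, and its class and method names are hypothetical.

```java
final class LastPositionSketch {
    // Returns the final code point; String.codePointBefore throws an
    // IndexOutOfBoundsException when the string is empty.
    static int lastCodePoint(String s) {
        return s.codePointBefore(s.length());
    }

    // Returns the zero-based code point position of the last match, or -1.
    static long lastIndexOfCodePoint(String s, int codePoint) {
        int[] cps = s.codePoints().toArray();
        for (int i = cps.length - 1; i >= 0; i--) {
            if (cps[i] == codePoint) {
                return i;
            }
        }
        return -1L;
    }
}
```

For a string that begins with an astral character, the position returned by this sketch is one less than the value from String.lastIndexOf, because the surrogate pair occupies a single position.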
-
requireInt
public static int requireInt(long value)
Utility method for use where strings longer than 2^31 characters cannot yet be handled.
Parameters:
value - the actual value of a character position within a string, or the length of a string
Returns:
the value as an integer if it is within range
Throws:
java.lang.UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
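The documented contract amounts to a checked narrowing conversion. A minimal sketch that mirrors the description above is shown here; the exception message is invented, and the class name is hypothetical.

```java
final class RequireIntSketch {
    // Narrows a long position or length to int, rejecting values that
    // exceed Integer.MAX_VALUE as the description above specifies.
    static int requireInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                    "String exceeds 2^31 characters: " + value);
        }
        return (int) value;
    }
}
```

A long result such as the -1 "not found" value from lastIndexOf passes through unchanged, so the check can wrap any position-returning call.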
-
compress
public static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node.
Parameters:
in - the character array holding the string to be compressed
offset - the start position of the substring we are interested in
len - the length of the substring we are interested in
compressWS - set to true if whitespace compression is to be attempted
Returns:
the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
-
copy8to16
public static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
-
copy8to24
public static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array
sourcePos - the position in the source array where copying is to start
dest - the destination array, using three bytes per codepoint
destPos - the codepoint position (not byte position) in the destination array where copying is to start
count - the number of characters (codepoints) to copy
-
copy16to24
public static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
Parameters:
source - the source array. The caller is responsible for ensuring that this contains no surrogates
sourcePos - the position in the source array where copying is to start
dest - the destination array
destPos - the position in the destination array where copying is to start
count - the number of characters (codepoints) to copy
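The three widening copies can be sketched as follows. This is an illustration under explicit assumptions, not Saxon's code: the 24-bit layout is taken to store the most significant byte first, destPos is treated as a codepoint position in both 24-bit variants, and the helper codePointAt and the class name are hypothetical.

```java
final class WideningCopySketch {
    // 8-bit to 16-bit: mask with 0xFF so bytes above 0x7F do not sign-extend.
    static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }

    // 8-bit to 24-bit: the two upper bytes of each codepoint are zero.
    static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count) {
        for (int i = 0, j = destPos * 3; i < count; i++, j += 3) {
            dest[j] = 0;
            dest[j + 1] = 0;
            dest[j + 2] = source[sourcePos + i];
        }
    }

    // 16-bit to 24-bit: valid only when the source holds no surrogates,
    // so that each char is a complete codepoint.
    static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count) {
        for (int i = 0, j = destPos * 3; i < count; i++, j += 3) {
            char c = source[sourcePos + i];
            dest[j] = 0;
            dest[j + 1] = (byte) (c >> 8);
            dest[j + 2] = (byte) c;
        }
    }

    // Reads back the codepoint stored at codepoint position `pos`.
    static int codePointAt(byte[] data, int pos) {
        int j = pos * 3;
        return ((data[j] & 0xFF) << 16) | ((data[j + 1] & 0xFF) << 8) | (data[j + 2] & 0xFF);
    }
}
```

None of the methods performs bounds checks, in keeping with the documented contract that the caller ensures the offsets are in range and that the destination holds at least three bytes per codepoint.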
-
-