Class StringTool

java.lang.Object
net.sf.saxon.str.StringTool

public class StringTool extends Object
  • Constructor Summary

    Constructors
    Constructor
    Description
    StringTool()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static void
    appendRepeated(StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the end of a StringBuilder
    static IntIterator
    codePoints(CharSequence value)
    Get an iterator over the codepoints in a CharSequence - typically a String
    static UnicodeString
    compress(char[] in, int offset, int len, boolean compressWS)
    Attempt to compress a UnicodeString consisting entirely of whitespace.
    static boolean
    containsSurrogates(CharSequence str)
    Ask whether a string contains astral characters (represented as surrogate pairs)
    static void
    copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 16-bit characters to an array holding 24-bit characters.
    static void
    copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 16-bit characters.
    static void
    copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
    Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
    static String
    diagnosticDisplay(String s)
    Produce a diagnostic representation of the contents of the string
    static int[]
    expand(UnicodeString s)
    Expand a string into an array of 32-bit characters
    static UnicodeString
    fromCharSequence(CharSequence chars)
    Construct a UnicodeString from a CharSequence - typically a String
    static UnicodeString
    fromCodePoints(int[] codes, int used)
    Contract an array of integers containing Unicode codepoints into a string
    static UnicodeString
    fromLatin1(String str)
    Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
    static int
    getStringLength(CharSequence s)
    Get the length of a string, as defined in XPath.
    static int
    lastCodePoint(UnicodeString str)
    Get the last codepoint in a UnicodeString
    static long
    lastIndexOf(UnicodeString str, int codePoint)
    Get the position of the last occurrence of a given codepoint within a string
    static void
    prependRepeated(StringBuilder builder, char ch, int count)
    Insert repeated occurrences of a given character at the start of a StringBuilder
    static void
    prependWideChar(StringBuilder builder, int ch)
    Insert a wide character (surrogate pair) at the start of a StringBuilder

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • StringTool

      public StringTool()
  • Method Details

    • getStringLength

      public static int getStringLength(CharSequence s)
      Get the length of a string, as defined in XPath. This is not the same as the Java length, as a Unicode surrogate pair counts as a single character.
      Parameters:
      s - The string whose length is required
      Returns:
      the length of the string in Unicode code points
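      The difference between the Java length and the XPath length can be illustrated with plain JDK calls. This is only a sketch of the counting rule described above, not Saxon's implementation; the class name XPathLengthSketch and the method name xpathLength are hypothetical.

```java
// Illustrative only: a plain-JDK sketch of counting code points, not Saxon's code.
final class XPathLengthSketch {
    /** Counts Unicode code points, so a surrogate pair counts as one character. */
    static int xpathLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }
}
```

      For a string holding 'a', one astral character and 'b', String.length() reports 4 UTF-16 code units while the sketch reports 3 code points.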
    • expand

      public static int[] expand(UnicodeString s)
      Expand a string into an array of 32-bit characters
      Parameters:
      s - the string to be expanded
      Returns:
      an array of integers representing the Unicode code points
    • containsSurrogates

      public static boolean containsSurrogates(CharSequence str)
      Ask whether a string contains astral characters (represented as surrogate pairs)
      Parameters:
      str - the string to be tested
      Returns:
      true if the string contains surrogate characters
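      A test of this kind can be sketched as a linear scan over the UTF-16 code units. The class below is an illustration using only the JDK and is not taken from Saxon; the name SurrogateCheckSketch is hypothetical.

```java
// Illustrative only: scans UTF-16 code units for any surrogate value.
final class SurrogateCheckSketch {
    static boolean containsSurrogates(CharSequence str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true; // high or low surrogate found
            }
        }
        return false;
    }
}
```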
    • fromCodePoints

      public static UnicodeString fromCodePoints(int[] codes, int used)
      Contract an array of integers containing Unicode codepoints into a string
      Parameters:
      codes - an array of integers representing the Unicode code points
      used - the number of items in the array that are actually used
      Returns:
      the constructed string
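      The expand and fromCodePoints operations are inverses of each other. The round trip can be sketched with plain JDK strings standing in for UnicodeString; this is an assumption made for illustration, and the class and method names below are hypothetical.

```java
// Illustrative only: java.lang.String stands in for Saxon's UnicodeString.
final class CodePointArraySketch {
    /** Expands a string into one int per Unicode code point. */
    static int[] expand(CharSequence s) {
        return s.codePoints().toArray();
    }

    /** Contracts the first 'used' entries of 'codes' back into a string. */
    static String contract(int[] codes, int used) {
        return new String(codes, 0, used);
    }
}
```

      The used argument lets a caller pass an over-allocated buffer and convert only the entries that were actually filled.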
    • fromCharSequence

      public static UnicodeString fromCharSequence(CharSequence chars)
      Construct a UnicodeString from a CharSequence - typically a String
      Parameters:
      chars - the supplied String or CharSequence
      Returns:
      the equivalent UnicodeString
    • fromLatin1

      public static UnicodeString fromLatin1(String str)
      Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
      Parameters:
      str - the supplied String: the caller warrants that this contains no characters with codepoint higher than 255.
      Returns:
      the equivalent UnicodeString
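      Because the caller warrants that no character exceeds codepoint 255, it may be worth checking that precondition before calling fromLatin1. The following is a sketch of such a check using only the JDK; the class Latin1Sketch and its methods are hypothetical helpers and are not part of Saxon.

```java
// Illustrative only: a caller-side check of the Latin-1 precondition.
final class Latin1Sketch {
    /** Returns true if every UTF-16 code unit is in the range 0..255. */
    static boolean isLatin1(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    /** Encodes a Latin-1 string as one byte per character. */
    static byte[] toLatin1Bytes(String str) {
        if (!isLatin1(str)) {
            throw new IllegalArgumentException("not a Latin-1 string");
        }
        return str.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
    }
}
```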
    • codePoints

      public static IntIterator codePoints(CharSequence value)
      Get an iterator over the codepoints in a CharSequence - typically a String
      Parameters:
      value - the supplied string
      Returns:
      an IntIterator allowing iteration over the codepoints. Note that the protocol for IntIterator requires exactly one call of IntIterator.hasNext() before every call of IntIterator.next().
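      The calling protocol can be sketched as follows. The IntCursor interface is a hypothetical stand-in with the same two methods, defined here so that the example needs only the JDK; it is not Saxon's IntIterator.

```java
// Illustrative only: IntCursor is a hypothetical stand-in for an int iterator.
final class CodePointIterationSketch {
    interface IntCursor {
        boolean hasNext();
        int next();
    }

    /** Returns a cursor over the code points of a CharSequence. */
    static IntCursor codePoints(CharSequence value) {
        return new IntCursor() {
            private int pos = 0;

            public boolean hasNext() {
                return pos < value.length();
            }

            public int next() {
                int cp = Character.codePointAt(value, pos);
                pos += Character.charCount(cp); // advance by 1 or 2 code units
                return cp;
            }
        };
    }

    /** Sums the code points, calling hasNext() exactly once before each next(). */
    static int sum(CharSequence value) {
        int total = 0;
        IntCursor it = codePoints(value);
        while (it.hasNext()) {
            total += it.next();
        }
        return total;
    }
}
```

      A while loop of this shape satisfies the protocol naturally, since each iteration makes one hasNext() call followed by one next() call.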
    • diagnosticDisplay

      public static String diagnosticDisplay(String s)
      Produce a diagnostic representation of the contents of the string
      Parameters:
      s - the string
      Returns:
      a string in which characters other than printable ASCII are replaced by \uXXXX escapes
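      The escaping rule can be sketched as below. The printable range 0x20 to 0x7E, the upper-case hex digits and the escaping of each UTF-16 code unit separately are assumptions of this sketch; the exact output format of diagnosticDisplay is not specified here. The class name DiagnosticDisplaySketch is hypothetical.

```java
// Illustrative only: escapes everything outside printable ASCII (0x20..0x7E).
final class DiagnosticDisplaySketch {
    static String display(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c); // printable ASCII is copied unchanged
            } else {
                // backslash, 'u', then four hex digits for this code unit
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```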
    • prependWideChar

      public static void prependWideChar(StringBuilder builder, int ch)
      Insert a wide character (surrogate pair) at the start of a StringBuilder
      Parameters:
      builder - the string builder
      ch - the codepoint of the character to be inserted
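      A codepoint above 0xFFFF occupies two char positions in a StringBuilder, so both halves of the surrogate pair have to be inserted together. A minimal JDK-only sketch of that operation follows; it is not Saxon's code, and the class name is hypothetical.

```java
// Illustrative only: inserts a code point (1 or 2 chars) at position 0.
final class PrependWideCharSketch {
    static void prependWideChar(StringBuilder builder, int ch) {
        // Character.toChars yields one char for BMP code points, two otherwise
        builder.insert(0, Character.toChars(ch));
    }
}
```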
    • prependRepeated

      public static void prependRepeated(StringBuilder builder, char ch, int count)
      Insert repeated occurrences of a given character at the start of a StringBuilder
      Parameters:
      builder - the string builder
      ch - the character to be inserted
      count - the number of repetitions
    • appendRepeated

      public static void appendRepeated(StringBuilder builder, char ch, int count)
      Insert repeated occurrences of a given character at the end of a StringBuilder
      Parameters:
      builder - the string builder
      ch - the character to be inserted
      count - the number of repetitions
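      Padding of this kind, as performed by prependRepeated and appendRepeated, can be sketched with the JDK alone. The class below is illustrative rather than Saxon's implementation, and treating a non-positive count as a no-op is an assumption.

```java
// Illustrative only: left and right padding of a StringBuilder.
final class RepeatedCharSketch {
    static void appendRepeated(StringBuilder builder, char ch, int count) {
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    static void prependRepeated(StringBuilder builder, char ch, int count) {
        if (count <= 0) {
            return; // assumption: nothing to do for zero or negative counts
        }
        char[] pad = new char[count];
        java.util.Arrays.fill(pad, ch);
        builder.insert(0, pad); // a single insert avoids shifting the content repeatedly
    }
}
```

      Prepending zeros to a number is a typical use: starting from "42", prepending '0' three times gives "00042".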
    • lastCodePoint

      public static int lastCodePoint(UnicodeString str)
      Get the last codepoint in a UnicodeString
      Parameters:
      str - the input string
      Returns:
      the integer value of the last character in the string
      Throws:
      IndexOutOfBoundsException - if the string is empty
    • lastIndexOf

      public static long lastIndexOf(UnicodeString str, int codePoint)
      Get the position of the last occurrence of a given codepoint within a string
      Parameters:
      str - the input string
      codePoint - the sought codepoint
      Returns:
      the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
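      The behaviour of lastCodePoint and lastIndexOf can be sketched over an int[] of codepoints, which stands in for UnicodeString in this example. It is assumed here that positions are counted in codepoints; the class name is hypothetical.

```java
// Illustrative only: an int[] of code points stands in for UnicodeString.
final class LastCodePointSketch {
    /** Returns the final code point; an empty array raises an IndexOutOfBoundsException. */
    static int lastCodePoint(int[] codePoints) {
        return codePoints[codePoints.length - 1];
    }

    /** Returns the zero-based position of the last match, or -1 if there is none. */
    static long lastIndexOf(int[] codePoints, int codePoint) {
        for (int i = codePoints.length - 1; i >= 0; i--) {
            if (codePoints[i] == codePoint) {
                return i;
            }
        }
        return -1;
    }
}
```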
    • compress

      public static UnicodeString compress(char[] in, int offset, int len, boolean compressWS)
      Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node.
      Parameters:
      in - the Unicode string to be compressed
      offset - the start position of the substring we are interested in
      len - the length of the substring we are interested in
      compressWS - set to true if whitespace compression is to be attempted
      Returns:
      the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
    • copy8to16

      public static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count)
      Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
      Parameters:
      source - the source array
      sourcePos - the position in the source array where copying is to start
      dest - the destination array
      destPos - the position in the destination array where copying is to start
      count - the number of characters (codepoints) to copy
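      Widening 8-bit characters to 16-bit characters can be sketched as below. Java bytes are signed, so the sketch masks each byte with 0xFF to keep values 128 to 255 positive. This is an illustration and not Saxon's code; like the documented method, it performs no bounds checking of its own.

```java
// Illustrative only: widens Latin-1 bytes to chars without sign extension.
final class Copy8to16Sketch {
    static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }
}
```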
    • copy8to24

      public static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
      Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
      Parameters:
      source - the source array
      sourcePos - the position in the source array where copying is to start
      dest - the destination array, using three bytes per codepoint
      destPos - the codepoint position (not byte position) in the destination array where copying is to start
      count - the number of characters (codepoints) to copy
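      Since destPos is a codepoint position, the byte offset in the destination is three times that value. The sketch below assumes that the three bytes of each character are stored most significant byte first; the byte order is not stated here, so this is an assumption. The class name is hypothetical.

```java
// Illustrative only: assumes big-endian order within each 3-byte character.
final class Copy8to24Sketch {
    static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count) {
        int out = destPos * 3; // convert the codepoint position to a byte offset
        for (int i = 0; i < count; i++) {
            dest[out++] = 0;                     // bits 16..23
            dest[out++] = 0;                     // bits 8..15
            dest[out++] = source[sourcePos + i]; // bits 0..7
        }
    }
}
```

      Copying two characters to codepoint position 1 therefore fills bytes 3 to 8 of the destination, which must hold at least 9 bytes.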
    • copy16to24

      public static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count)
      Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
      Parameters:
      source - the source array. The caller is responsible for ensuring that this contains no surrogates
      sourcePos - the position in the source array where copying is to start
      dest - the destination array
      destPos - the position in the destination array where copying is to start
      count - the number of characters (codepoints) to copy
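      A widening copy from 16-bit to 24-bit characters can be sketched as below. Two things are assumed rather than documented: that destPos counts codepoints, as it does for copy8to24, and that the three bytes of each character are stored most significant byte first. The class name is hypothetical.

```java
// Illustrative only: assumes destPos is in code points and big-endian byte order.
final class Copy16to24Sketch {
    static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count) {
        int out = destPos * 3; // assumed: codepoint position converted to a byte offset
        for (int i = 0; i < count; i++) {
            char c = source[sourcePos + i]; // the caller guarantees this is not a surrogate
            dest[out++] = 0;                // bits 16..23 are zero for 16-bit input
            dest[out++] = (byte) (c >> 8);  // bits 8..15
            dest[out++] = (byte) c;         // bits 0..7
        }
    }
}
```

      The source must not contain surrogates because each char is copied as a complete character; a surrogate pair would need combining into one codepoint first.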