Class StringTool


  • public class StringTool
    extends java.lang.Object
    • Constructor Summary

      Constructors 
      Constructor Description
      StringTool()  
    • Method Summary

      Modifier and Type Method Description
      static void appendRepeated​(java.lang.StringBuilder builder, char ch, int count)
      Insert repeated occurrences of a given character at the end of a StringBuilder
      static IntIterator codePoints​(java.lang.CharSequence value)
      Get an iterator over the codepoints in a CharSequence - typically a String
      static UnicodeString compress​(char[] in, int offset, int len, boolean compressWS)
      Attempt to compress a UnicodeString consisting entirely of whitespace.
      static boolean containsSurrogates​(java.lang.CharSequence str)
      Ask whether a string contains astral characters (represented as surrogate pairs)
      static void copy16to24​(char[] source, int sourcePos, byte[] dest, int destPos, int count)
      Copy from an array of 16-bit characters to an array holding 24-bit characters.
      static void copy8to16​(byte[] source, int sourcePos, char[] dest, int destPos, int count)
      Copy from an array of 8-bit characters to an array holding 16-bit characters.
      static void copy8to24​(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
      Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character.
      static java.lang.String diagnosticDisplay​(java.lang.String s)
      Produce a diagnostic representation of the contents of the string
      static int[] expand​(UnicodeString s)
      Expand a string into an array of 32-bit characters
      static UnicodeString fromCharSequence​(java.lang.CharSequence chars)
      Construct a UnicodeString from a CharSequence - typically a String
      static UnicodeString fromCodePoints​(int[] codes, int used)
      Contract an array of integers containing Unicode codepoints into a string
      static UnicodeString fromLatin1​(java.lang.String str)
      Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
      static int getStringLength​(java.lang.CharSequence s)
      Get the length of a string, as defined in XPath.
      static int lastCodePoint​(UnicodeString str)
      Get the last codepoint in a UnicodeString
      static long lastIndexOf​(UnicodeString str, int codePoint)
      Get the position of the last occurrence of a given codepoint within a string
      static void prependRepeated​(java.lang.StringBuilder builder, char ch, int count)
      Insert repeated occurrences of a given character at the start of a StringBuilder
      static void prependWideChar​(java.lang.StringBuilder builder, int ch)
      Insert a wide character (surrogate pair) at the start of a StringBuilder
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StringTool

        public StringTool()
    • Method Detail

      • getStringLength

        public static int getStringLength​(java.lang.CharSequence s)
        Get the length of a string, as defined in XPath. This is not the same as the Java length, as a Unicode surrogate pair counts as a single character.
        Parameters:
        s - The string whose length is required
        Returns:
        the length of the string in Unicode code points
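The difference between the Java length and the code point count can be illustrated with the standard JDK alone. This is a minimal sketch, not the StringTool implementation; the class name LengthSketch and the helper xpathLength are hypothetical.

```java
class LengthSketch {
    // Hypothetical helper (not part of StringTool): counts Unicode code points,
    // so a surrogate pair contributes 1 rather than 2.
    static int xpathLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // 'a' followed by U+1D11E, which Java stores as a surrogate pair
        String s = "a" + new String(Character.toChars(0x1D11E));
        System.out.println(s.length());      // number of 16-bit chars
        System.out.println(xpathLength(s));  // number of code points
    }
}
```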
      • expand

        public static int[] expand​(UnicodeString s)
        Expand a string into an array of 32-bit characters
        Parameters:
        s - the string to be expanded
        Returns:
        an array of integers representing the Unicode code points
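To show what an array of 32-bit characters looks like, the sketch below performs the equivalent expansion on a plain CharSequence using the JDK. It is an assumption-laden stand-in: the real method takes a UnicodeString, and the class name ExpandSketch is hypothetical.

```java
class ExpandSketch {
    // Hypothetical stand-in: works on a CharSequence, not a UnicodeString.
    // Each array element holds one complete code point.
    static int[] expand(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a" + new String(Character.toChars(0x1D11E));
        System.out.println(java.util.Arrays.toString(expand(s)));
    }
}
```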
      • containsSurrogates

        public static boolean containsSurrogates​(java.lang.CharSequence str)
        Ask whether a string contains astral characters (represented as surrogate pairs)
        Parameters:
        str - the string to be tested
        Returns:
        true if the string contains surrogate characters
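One plausible way to perform this test is to scan for any UTF-16 surrogate code unit. The sketch below uses only the JDK and is not taken from the StringTool source; the class name SurrogateSketch is hypothetical.

```java
class SurrogateSketch {
    // Returns true as soon as any high or low surrogate code unit is found.
    static boolean containsSurrogates(CharSequence str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String astral = new String(Character.toChars(0x1D11E));
        System.out.println(containsSurrogates("plain text"));
        System.out.println(containsSurrogates("clef " + astral));
    }
}
```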
      • fromCodePoints

        public static UnicodeString fromCodePoints​(int[] codes,
                                                   int used)
        Contract an array of integers containing Unicode codepoints into a string
        Parameters:
        codes - an array of integers representing the Unicode code points
        used - the number of items in the array that are actually used
        Returns:
        the constructed string
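The role of the used parameter can be sketched with the JDK constructor String(int[], int, int), which reads only a prefix of the array. This sketch returns a java.lang.String rather than a UnicodeString, and the class name FromCodePointsSketch is hypothetical.

```java
class FromCodePointsSketch {
    // Only the first 'used' entries of 'codes' contribute to the result;
    // trailing entries (for example spare buffer capacity) are ignored.
    static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }

    public static void main(String[] args) {
        int[] buffer = {72, 105, 0, 0};   // 'H', 'i', then two unused slots
        System.out.println(fromCodePoints(buffer, 2));
    }
}
```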
      • fromCharSequence

        public static UnicodeString fromCharSequence​(java.lang.CharSequence chars)
        Construct a UnicodeString from a CharSequence - typically a String
        Parameters:
        chars - the supplied String or CharSequence
        Returns:
        the equivalent UnicodeString
      • fromLatin1

        public static UnicodeString fromLatin1​(java.lang.String str)
        Construct a UnicodeString from a String that is known to consist entirely of 8-bit Latin-1 characters.
        Parameters:
        str - the supplied String: the caller warrants that this contains no characters with codepoint higher than 255.
        Returns:
        the equivalent UnicodeString
      • codePoints

        public static IntIterator codePoints​(java.lang.CharSequence value)
        Get an iterator over the codepoints in a CharSequence - typically a String
        Parameters:
        value - the supplied string
        Returns:
        an IntIterator allowing iteration over the codepoints. Note that the protocol for IntIterator requires exactly one call of IntIterator.hasNext() before every call of IntIterator.next().
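The calling pattern required by the protocol (one hasNext() call before each retrieval) looks like the loop below. IntIterator itself is not reproduced here; the sketch substitutes the JDK's PrimitiveIterator.OfInt as a stand-in, and the class name CodePointIterationSketch is hypothetical.

```java
class CodePointIterationSketch {
    // Collects the code points of 'value' using a strict hasNext()/next loop.
    // PrimitiveIterator.OfInt is a JDK stand-in for the IntIterator type.
    static java.util.List<Integer> collect(CharSequence value) {
        java.util.PrimitiveIterator.OfInt it = value.codePoints().iterator();
        java.util.List<Integer> out = new java.util.ArrayList<>();
        while (it.hasNext()) {         // exactly one hasNext() ...
            out.add(it.nextInt());     // ... before each retrieval
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("abc"));
    }
}
```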
      • diagnosticDisplay

        public static java.lang.String diagnosticDisplay​(java.lang.String s)
        Produce a diagnostic representation of the contents of the string
        Parameters:
        s - the string
        Returns:
        a string in which characters other than printable ASCII characters are replaced by \uXXXX escapes
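A diagnostic rendering of this kind might be produced as follows. This is a sketch under stated assumptions, not the StringTool source: it treats 0x20 to 0x7E as the printable ASCII range, escapes each UTF-16 code unit separately, and uses upper-case hex digits. The class name DiagnosticSketch is hypothetical.

```java
class DiagnosticSketch {
    // Assumption: printable ASCII means 0x20..0x7E inclusive; every other
    // UTF-16 code unit is written as a backslash, 'u', and four hex digits.
    static String display(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(display("tab\there"));
    }
}
```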
      • prependWideChar

        public static void prependWideChar​(java.lang.StringBuilder builder,
                                           int ch)
        Insert a wide character (surrogate pair) at the start of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the codepoint of the character to be inserted
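With the JDK alone, inserting a codepoint at the start of a StringBuilder can be sketched as below. Character.toChars yields one char for a BMP codepoint and a surrogate pair otherwise; whether StringTool does it this way is not stated in this page, and the class name PrependWideCharSketch is hypothetical.

```java
class PrependWideCharSketch {
    // Inserts the UTF-16 encoding of 'ch' (one or two chars) at position 0.
    static void prependWideChar(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("xyz");
        prependWideChar(sb, 0x1D11E);
        System.out.println(sb.length());  // original length plus two chars
    }
}
```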
      • prependRepeated

        public static void prependRepeated​(java.lang.StringBuilder builder,
                                           char ch,
                                           int count)
        Insert repeated occurrences of a given character at the start of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the character to be inserted
        count - the number of repetitions
      • appendRepeated

        public static void appendRepeated​(java.lang.StringBuilder builder,
                                          char ch,
                                          int count)
        Insert repeated occurrences of a given character at the end of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the character to be inserted
        count - the number of repetitions
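The behaviour of prependRepeated and appendRepeated can be approximated with standard StringBuilder calls. The sketch below is illustrative only; it assumes that a count of zero or less leaves the builder unchanged, which this page does not specify, and the class name RepeatSketch is hypothetical.

```java
class RepeatSketch {
    // Inserts 'count' copies of 'ch' at the start with a single insert call.
    static void prependRepeated(StringBuilder builder, char ch, int count) {
        if (count <= 0) {
            return;  // assumption: nothing to do for non-positive counts
        }
        char[] fill = new char[count];
        java.util.Arrays.fill(fill, ch);
        builder.insert(0, fill);
    }

    // Adds 'count' copies of 'ch' at the end.
    static void appendRepeated(StringBuilder builder, char ch, int count) {
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("42");
        prependRepeated(sb, '0', 3);
        appendRepeated(sb, '!', 2);
        System.out.println(sb);
    }
}
```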
      • lastCodePoint

        public static int lastCodePoint​(UnicodeString str)
        Get the last codepoint in a UnicodeString
        Parameters:
        str - the input string
        Returns:
        the integer value of the last character in the string
        Throws:
        java.lang.IndexOutOfBoundsException - if the string is empty
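The contract above, including the exception for an empty string, can be mirrored on a CharSequence with Character.codePointBefore. This is a JDK-only sketch rather than the UnicodeString-based implementation, and the class name LastCodePointSketch is hypothetical.

```java
class LastCodePointSketch {
    // Returns the final code point, combining a trailing surrogate pair
    // into a single value; an empty input raises IndexOutOfBoundsException.
    static int lastCodePoint(CharSequence str) {
        if (str.length() == 0) {
            throw new IndexOutOfBoundsException("string is empty");
        }
        return Character.codePointBefore(str, str.length());
    }

    public static void main(String[] args) {
        System.out.println(lastCodePoint("abc"));
    }
}
```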
      • lastIndexOf

        public static long lastIndexOf​(UnicodeString str,
                                       int codePoint)
        Get the position of the last occurrence of a given codepoint within a string
        Parameters:
        str - the input string
        codePoint - the sought codepoint
        Returns:
        the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
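A backwards search with the same return convention might look like the sketch below. It assumes positions are counted in codepoints rather than 16-bit chars, operates on a CharSequence instead of a UnicodeString, and uses the hypothetical class name LastIndexOfSketch.

```java
class LastIndexOfSketch {
    // Scans from the end and returns the zero-based codepoint position of
    // the last match, or -1 when the codepoint does not occur.
    static long lastIndexOf(CharSequence str, int codePoint) {
        int[] cps = str.codePoints().toArray();
        for (int i = cps.length - 1; i >= 0; i--) {
            if (cps[i] == codePoint) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(lastIndexOf("banana", 'a'));
        System.out.println(lastIndexOf("banana", 'z'));
    }
}
```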
      • compress

        public static UnicodeString compress​(char[] in,
                                             int offset,
                                             int len,
                                             boolean compressWS)
        Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node
        Parameters:
        in - the character array holding the Unicode string to be compressed
        offset - the start position of the substring we are interested in
        len - the length of the substring we are interested in
        compressWS - set to true if whitespace compression is to be attempted
        Returns:
        the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
      • copy8to16

        public static void copy8to16​(byte[] source,
                                     int sourcePos,
                                     char[] dest,
                                     int destPos,
                                     int count)
        Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array
        sourcePos - the position in the source array where copying is to start
        dest - the destination array
        destPos - the position in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
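Widening 8-bit characters to 16-bit characters amounts to masking each byte so that values above 127 are not sign-extended. The sketch below shows that idea; it is not the StringTool source, and the class name Copy8to16Sketch is hypothetical. As in the description above, no range checks are performed.

```java
class Copy8to16Sketch {
    // Masking with 0xFF keeps bytes 0x80..0xFF as codepoints 128..255
    // instead of letting Java sign-extend them to negative values.
    static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = {0x41, (byte) 0xE9};
        char[] out = new char[2];
        copy8to16(latin1, 0, out, 0, 2);
        System.out.println((int) out[0] + " " + (int) out[1]);
    }
}
```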
      • copy8to24

        public static void copy8to24​(byte[] source,
                                     int sourcePos,
                                     byte[] dest,
                                     int destPos,
                                     int count)
        Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array
        sourcePos - the position in the source array where copying is to start
        dest - the destination array, using three bytes per codepoint
        destPos - the codepoint position (not byte position) in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
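Because destPos is a codepoint position, the byte offset into the destination is three times that value. The sketch below illustrates the arithmetic; it assumes the three bytes are stored most significant first, which this page does not state, and the class name Copy8to24Sketch is hypothetical.

```java
class Copy8to24Sketch {
    // Assumption: big-endian layout, so an 8-bit character becomes the
    // byte triple (0, 0, value). destPos counts codepoints, not bytes.
    static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count) {
        int d = destPos * 3;  // convert codepoint position to byte offset
        for (int i = 0; i < count; i++) {
            dest[d++] = 0;
            dest[d++] = 0;
            dest[d++] = source[sourcePos + i];
        }
    }

    public static void main(String[] args) {
        byte[] dest = new byte[9];
        copy8to24(new byte[]{0x41, 0x42}, 0, dest, 1, 2);
        System.out.println(java.util.Arrays.toString(dest));
    }
}
```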
      • copy16to24

        public static void copy16to24​(char[] source,
                                      int sourcePos,
                                      byte[] dest,
                                      int destPos,
                                      int count)
        Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array. The caller is responsible for ensuring that this contains no surrogates
        sourcePos - the position in the source array where copying is to start
        dest - the destination array
        destPos - the position in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
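A 16-bit character without surrogates fits in the low two bytes of a three-byte slot. The sketch below makes two assumptions that this page does not confirm: destPos is a codepoint position (as documented for copy8to24), and bytes are stored most significant first. The class name Copy16to24Sketch is hypothetical.

```java
class Copy16to24Sketch {
    // Assumptions: destPos counts codepoints, and each 24-bit slot is
    // big-endian, so a 16-bit char becomes (0, high byte, low byte).
    static void copy16to24(char[] source, int sourcePos, byte[] dest, int destPos, int count) {
        int d = destPos * 3;
        for (int i = 0; i < count; i++) {
            char c = source[sourcePos + i];
            dest[d++] = 0;
            dest[d++] = (byte) (c >> 8);
            dest[d++] = (byte) (c & 0xFF);
        }
    }

    public static void main(String[] args) {
        byte[] dest = new byte[6];
        copy16to24(new char[]{(char) 0x0416, 'A'}, 0, dest, 0, 2);
        System.out.println(java.util.Arrays.toString(dest));
    }
}
```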