Class StringTool


  • public class StringTool
    extends java.lang.Object
    • Constructor Summary

      Constructors 
      Constructor Description
      StringTool()  
    • Method Summary

      Modifier and Type Method Description
      static void appendRepeated​(java.lang.StringBuilder builder, char ch, int count)
      Insert repeated occurrences of a given character at the end of a StringBuilder
      static IntIterator codePoints​(java.lang.CharSequence value)  
      static UnicodeString compress​(char[] in, int offset, int len, boolean compressWS)
      Attempt to compress a UnicodeString consisting entirely of whitespace.
      static boolean containsSurrogates​(java.lang.String str)
      Ask whether a string contains astral characters (represented as surrogate pairs)
      static void copy16to24​(char[] source, int sourcePos, byte[] dest, int destPos, int count)
      Copy from an array of 16-bit characters to an array holding 24-bit characters.
      static void copy8to16​(byte[] source, int sourcePos, char[] dest, int destPos, int count)
      Copy from an array of 8-bit characters to an array holding 16-bit characters.
      static void copy8to24​(byte[] source, int sourcePos, byte[] dest, int destPos, int count)
      Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
      static java.lang.String diagnosticDisplay​(java.lang.String s)
      Produce a diagnostic representation of the contents of the string
      static int[] expand​(UnicodeString s)
      Expand a string into an array of 32-bit characters
      static UnicodeString fromCharSequence​(java.lang.CharSequence chars)  
      static UnicodeString fromCodePoints​(int[] codes, int used)
      Contract an array of integers containing Unicode codepoints into a string
      static UnicodeString fromLatin1​(java.lang.String str)  
      static int getStringLength​(java.lang.CharSequence s)
      Get the length of a string, as defined in XPath.
      static int lastCodePoint​(UnicodeString str)
      Get the last codepoint in a UnicodeString
      static long lastIndexOf​(UnicodeString str, int codePoint)
      Get the position of the last occurrence of a given codepoint within a string
      static void prependRepeated​(java.lang.StringBuilder builder, char ch, int count)
      Insert repeated occurrences of a given character at the start of a StringBuilder
      static void prependWideChar​(java.lang.StringBuilder builder, int ch)
      Insert a wide character (surrogate pair) at the start of a StringBuilder
      static int requireInt​(long value)
      Utility method for use where strings longer than 2^31 characters cannot yet be handled.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • StringTool

        public StringTool()
    • Method Detail

      • getStringLength

        public static int getStringLength​(java.lang.CharSequence s)
        Get the length of a string, as defined in XPath. This is not the same as the Java length, as a Unicode surrogate pair counts as a single character.
        Parameters:
        s - The string whose length is required
        Returns:
        the length of the string in Unicode code points
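        The difference between the Java length and the XPath length can be illustrated with a minimal sketch that uses only JDK classes. It is not the library's implementation; the class name XPathLengthSketch and the method name xpathLength are hypothetical.

```java
final class XPathLengthSketch {
    /** Counts Unicode code points, so a surrogate pair counts as one character. */
    static int xpathLength(CharSequence s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // 'a', then U+1F600 written as a surrogate pair, then 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());     // length in UTF-16 code units
        System.out.println(xpathLength(s)); // length in code points
    }
}
```

        For a string with no surrogate pairs the two lengths agree.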
      • expand

        public static int[] expand​(UnicodeString s)
        Expand a string into an array of 32-bit characters
        Parameters:
        s - the string to be expanded
        Returns:
        an array of integers representing the Unicode code points
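        As a rough illustration of the expansion, the sketch below uses a plain CharSequence as a stand-in for UnicodeString (an assumption made so that the example runs with the JDK alone). The class name ExpandSketch is hypothetical.

```java
final class ExpandSketch {
    /** Returns one int per Unicode code point; surrogate pairs are combined. */
    static int[] expand(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        int[] codes = expand("a\uD83D\uDE00");
        for (int cp : codes) {
            System.out.println(Integer.toHexString(cp));
        }
    }
}
```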
      • containsSurrogates

        public static boolean containsSurrogates​(java.lang.String str)
        Ask whether a string contains astral characters (represented as surrogate pairs)
        Parameters:
        str - the string to be tested
        Returns:
        true if the string contains surrogate characters
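        A test of this kind can be sketched with the JDK's Character.isSurrogate. This is an illustrative stand-in rather than the library's code, and the class name SurrogateCheckSketch is hypothetical.

```java
final class SurrogateCheckSketch {
    /** Returns true as soon as any UTF-16 code unit is a high or low surrogate. */
    static boolean containsSurrogates(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (Character.isSurrogate(str.charAt(i))) {
                return true;
            }
        }
        return false;
    }
}
```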
      • fromCodePoints

        public static UnicodeString fromCodePoints​(int[] codes,
                                                   int used)
        Contract an array of integers containing Unicode codepoints into a string
        Parameters:
        codes - an array of integers representing the Unicode code points
        used - the number of items in the array that are actually used
        Returns:
        the constructed string
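        The role of the used parameter can be shown with a small sketch that builds a java.lang.String rather than a UnicodeString (an assumption, so that only the JDK is needed). The class name FromCodePointsSketch is hypothetical.

```java
final class FromCodePointsSketch {
    /** Builds a string from the first 'used' entries of 'codes'; later entries are ignored. */
    static String fromCodePoints(int[] codes, int used) {
        return new String(codes, 0, used);
    }

    public static void main(String[] args) {
        // A buffer with spare capacity: only the first three entries are meaningful.
        int[] buffer = {0x61, 0x1F600, 0x62, 0, 0};
        System.out.println(fromCodePoints(buffer, 3).length());
    }
}
```

        Code points above U+FFFF occupy two chars in the resulting Java string.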
      • fromCharSequence

        public static UnicodeString fromCharSequence​(java.lang.CharSequence chars)
      • fromLatin1

        public static UnicodeString fromLatin1​(java.lang.String str)
      • codePoints

        public static IntIterator codePoints​(java.lang.CharSequence value)
      • diagnosticDisplay

        public static java.lang.String diagnosticDisplay​(java.lang.String s)
        Produce a diagnostic representation of the contents of the string
        Parameters:
        s - the string
        Returns:
        a string in which characters other than printable ASCII are replaced by \uXXXX escapes
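        One possible shape for such a diagnostic display is sketched below. The printable range (0x20 to 0x7E) and the use of upper-case hex digits are assumptions, not documented behaviour; the class name DiagnosticDisplaySketch is hypothetical.

```java
final class DiagnosticDisplaySketch {
    /** Keeps printable ASCII as-is and escapes every other UTF-16 code unit. */
    static String display(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0x7E) {
                sb.append(c);
            } else {
                // Emit a backslash, the letter u, and four hex digits.
                sb.append('\\').append('u').append(String.format("%04X", (int) c));
            }
        }
        return sb.toString();
    }
}
```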
      • prependWideChar

        public static void prependWideChar​(java.lang.StringBuilder builder,
                                           int ch)
        Insert a wide character (surrogate pair) at the start of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the codepoint of the character to be inserted
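        A minimal JDK-only sketch of the same idea follows; it is an illustration, not the library's implementation, and the class name PrependWideCharSketch is hypothetical.

```java
final class PrependWideCharSketch {
    /** Inserts the code point at index 0, as one char or as a surrogate pair. */
    static void prependWideChar(StringBuilder builder, int ch) {
        builder.insert(0, Character.toChars(ch));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("bc");
        prependWideChar(sb, 0x1F600);
        System.out.println(sb.length()); // two chars for the pair plus "bc"
    }
}
```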
      • prependRepeated

        public static void prependRepeated​(java.lang.StringBuilder builder,
                                           char ch,
                                           int count)
        Insert repeated occurrences of a given character at the start of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the character to be inserted
        count - the number of repetitions
      • appendRepeated

        public static void appendRepeated​(java.lang.StringBuilder builder,
                                          char ch,
                                          int count)
        Insert repeated occurrences of a given character at the end of a StringBuilder
        Parameters:
        builder - the string builder
        ch - the character to be inserted
        count - the number of repetitions
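        Typical uses of prependRepeated and appendRepeated are padding operations. The sketch below shows equivalent logic using only the JDK; the class name RepeatedCharSketch is hypothetical and the code is not taken from the library.

```java
final class RepeatedCharSketch {
    /** Adds 'count' copies of 'ch' at the end of the builder. */
    static void appendRepeated(StringBuilder builder, char ch, int count) {
        builder.ensureCapacity(builder.length() + count);
        for (int i = 0; i < count; i++) {
            builder.append(ch);
        }
    }

    /** Adds 'count' copies of 'ch' at the start of the builder with a single insert. */
    static void prependRepeated(StringBuilder builder, char ch, int count) {
        char[] pad = new char[count];
        java.util.Arrays.fill(pad, ch);
        builder.insert(0, pad);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("42");
        prependRepeated(sb, '0', 3); // left-pad with zeros
        appendRepeated(sb, '-', 2);  // right-pad with hyphens
        System.out.println(sb);
    }
}
```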
      • lastCodePoint

        public static int lastCodePoint​(UnicodeString str)
        Get the last codepoint in a UnicodeString
        Parameters:
        str - the input string
        Returns:
        the integer value of the last character in the string
        Throws:
        java.lang.IndexOutOfBoundsException - if the string is empty
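        The equivalent operation on a plain CharSequence (used here as a stand-in for UnicodeString, which is an assumption) can be sketched with Character.codePointBefore. The class name LastCodePointSketch is hypothetical.

```java
final class LastCodePointSketch {
    /**
     * Returns the final code point, combining a trailing surrogate pair.
     * Character.codePointBefore throws IndexOutOfBoundsException for an empty input.
     */
    static int lastCodePoint(CharSequence s) {
        return Character.codePointBefore(s, s.length());
    }
}
```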
      • lastIndexOf

        public static long lastIndexOf​(UnicodeString str,
                                       int codePoint)
        Get the position of the last occurrence of a given codepoint within a string
        Parameters:
        str - the input string
        codePoint - the sought codepoint
        Returns:
        the zero-based position of the last occurrence of the codepoint within the input string, or -1 if the codepoint does not appear within the string
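        The sketch below assumes that positions are counted in code points rather than in UTF-16 code units, which would make the result differ from String.lastIndexOf when surrogate pairs are present. That reading is an assumption, as is the use of CharSequence in place of UnicodeString; the class name LastIndexOfSketch is hypothetical.

```java
final class LastIndexOfSketch {
    /** Returns the zero-based code point position of the last match, or -1 if none. */
    static long lastIndexOf(CharSequence s, int codePoint) {
        long last = -1;
        long pos = 0; // position counted in code points
        for (int i = 0; i < s.length(); ) {
            int cp = Character.codePointAt(s, i);
            if (cp == codePoint) {
                last = pos;
            }
            i += Character.charCount(cp); // skip one or two chars
            pos++;
        }
        return last;
    }
}
```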
      • requireInt

        public static int requireInt​(long value)
        Utility method for use where strings longer than 2^31 characters cannot yet be handled.
        Parameters:
        value - the actual value of a character position within a string, or the length of a string
        Returns:
        the value as an integer if it is within range
        Throws:
        java.lang.UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
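        The documented contract can be sketched as a simple range check. The exception message and the class name RequireIntSketch are hypothetical, and the handling of negative values (passed through unchanged here) is an assumption.

```java
final class RequireIntSketch {
    /** Narrows a long position or length to int, rejecting values above Integer.MAX_VALUE. */
    static int requireInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                    "Value exceeds Integer.MAX_VALUE: " + value);
        }
        return (int) value;
    }
}
```

        A caller holding a long position, such as the result of lastIndexOf, could pass it through a check like this before indexing into a Java array.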
      • compress

        public static UnicodeString compress​(char[] in,
                                             int offset,
                                             int len,
                                             boolean compressWS)
        Attempt to compress a UnicodeString consisting entirely of whitespace. This is the first thing we do to an incoming text node.
        Parameters:
        in - the character array containing the string to be compressed
        offset - the start position of the substring we are interested in
        len - the length of the substring we are interested in
        compressWS - set to true if whitespace compression is to be attempted
        Returns:
        the compressed sequence if it can be compressed; or the uncompressed UnicodeString otherwise
      • copy8to16

        public static void copy8to16​(byte[] source,
                                     int sourcePos,
                                     char[] dest,
                                     int destPos,
                                     int count)
        Copy from an array of 8-bit characters to an array holding 16-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array
        sourcePos - the position in the source array where copying is to start
        dest - the destination array
        destPos - the position in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
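        A widening copy of this kind can be sketched as below. The mask with 0xFF matters because Java bytes are signed; treating each byte as an unsigned value in the range 0 to 255 is an assumption consistent with 8-bit characters. The class name Copy8to16Sketch is hypothetical.

```java
final class Copy8to16Sketch {
    /** Widens each 8-bit character to a 16-bit char; no bounds checks are made. */
    static void copy8to16(byte[] source, int sourcePos, char[] dest, int destPos, int count) {
        for (int i = 0; i < count; i++) {
            // Mask to avoid sign extension of bytes with the high bit set.
            dest[destPos + i] = (char) (source[sourcePos + i] & 0xFF);
        }
    }
}
```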
      • copy8to24

        public static void copy8to24​(byte[] source,
                                     int sourcePos,
                                     byte[] dest,
                                     int destPos,
                                     int count)
        Copy from an array of 8-bit characters to an array holding 24-bit characters, organised as three bytes per character. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array
        sourcePos - the position in the source array where copying is to start
        dest - the destination array, using three bytes per codepoint
        destPos - the codepoint position (not byte position) in the destination array where copying is to start
        count - the number of characters (codepoints) to copy
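        The three-bytes-per-character layout can be illustrated as follows. The byte order (most significant byte first) is an assumption that the description above does not confirm; the class name Copy8to24Sketch is hypothetical.

```java
final class Copy8to24Sketch {
    /**
     * Writes each 8-bit character as three bytes. destPos is a code point
     * position, so the first byte written is at index destPos * 3.
     * Assumes most-significant-byte-first order; no bounds checks are made.
     */
    static void copy8to24(byte[] source, int sourcePos, byte[] dest, int destPos, int count) {
        int out = destPos * 3;
        for (int i = 0; i < count; i++) {
            dest[out++] = 0;                     // high byte
            dest[out++] = 0;                     // middle byte
            dest[out++] = source[sourcePos + i]; // low byte carries the character
        }
    }
}
```

        With this layout a destination holding n characters needs 3 * n bytes.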
      • copy16to24

        public static void copy16to24​(char[] source,
                                      int sourcePos,
                                      byte[] dest,
                                      int destPos,
                                      int count)
        Copy from an array of 16-bit characters to an array holding 24-bit characters. The caller is responsible for ensuring that the offsets are in range and that the destination array is large enough.
        Parameters:
        source - the source array. The caller is responsible for ensuring that this contains no surrogates
        sourcePos - the position in the source array where copying is to start
        dest - the destination array
        destPos - the position in the destination array where copying is to start
        count - the number of characters (codepoints) to copy