Class UnicodeString

    • Constructor Summary

      Constructors 
      Constructor Description
      UnicodeString()  
    • Method Summary

      Modifier and Type Method Description
      AtomicValue asAtomic()
      Get an atomic value that encapsulates this match key.
      protected void checkSubstringBounds(long start, long end)
      abstract int codePointAt(long index)
      Get the code point at a given position in the string.
      abstract IntIterator codePoints()
      Get an iterator over the code points present in the string.
      int compareTo(UnicodeString other)
      Compare this string to another using codepoint comparison.
      UnicodeString concat(UnicodeString other)
      Concatenate with another string, returning a new string.
      UnicodeString economize()
      boolean equals(java.lang.Object obj)
      long estimatedLength()
      Get the estimated length of the string, suitable for space allocation.
      abstract int getWidth()
      Get the number of bits needed to hold all the characters in this string.
      int hashCode()
      Compute a hash code.
      boolean hasSubstring(UnicodeString other, long offset)
      Ask whether this string has another string as its content starting at a given offset.
      long indexOf(int codePoint)
      Get the position of the first occurrence of the specified codepoint, starting the search at the beginning of the string.
      abstract long indexOf(int codePoint, long from)
      Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string.
      long indexOf(UnicodeString other, long from)
      Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
      abstract long indexWhere(java.util.function.IntPredicate predicate, long from)
      Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string.
      boolean isEmpty()
      Ask whether the string is empty.
      abstract long length()
      Get the length of the string.
      int length32()
      Get the length of the string, provided it is less than 2^31 characters.
      UnicodeString prefix(long end)
      Get a substring of this string, starting at position 0, with a given end position.
      static int requireInt(long value)
      Utility method for use where strings longer than 2^31 characters cannot yet be handled.
      static int requireNonNegativeInt(long value)
      Utility method for use where strings longer than 2^31 characters cannot yet be handled, and where negative offsets are to be treated as zero.
      UnicodeString substring(long start)
      Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string.
      abstract UnicodeString substring(long start, long end)
      Get a substring of this string, with a given start and end position.
      UnicodeString tidy()
      Ensure that the implementation is capable of counting codepoints in the string.
      void verifyCharacters()
      Diagnostic method: verify that all the characters in the string are valid XML codepoints.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • UnicodeString

        public UnicodeString()
    • Method Detail

      • tidy

        public UnicodeString tidy()
        Ensure that the implementation is capable of counting codepoints in the string. This is normally a null operation, but it may cause internal reorganisation.
        Returns:
        this UnicodeString, or another that represents the same sequence of characters.
      • length

        public abstract long length()
        Get the length of the string.
        Returns:
        the number of code points in the string
      • length32

        public int length32()
        Get the length of the string, provided it is less than 2^31 characters.
        Returns:
        the length of the string if it fits within a Java int
        Throws:
        java.lang.UnsupportedOperationException - if the string has 2^31 or more characters (that is, if its length exceeds Integer.MAX_VALUE)
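A minimal sketch of what "length in code points" means, using plain java.lang.String as a stand-in for a UnicodeString. This is not Saxon's implementation; the class CodePointLength and its methods are hypothetical. A character outside the BMP occupies two UTF-16 chars but counts once, and the length32 helper mirrors the documented contract by refusing any length that does not fit in an int.

```java
// Sketch only: java.lang.String stands in for UnicodeString; names are hypothetical.
final class CodePointLength {

    // Length in code points: a supplementary character (two UTF-16 chars) counts once.
    static long length(String s) {
        return s.codePointCount(0, s.length());
    }

    // Mirrors the documented length32() contract: the length as an int,
    // or UnsupportedOperationException if it exceeds Integer.MAX_VALUE.
    static int length32(long length) {
        if (length > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("length " + length + " does not fit in an int");
        }
        return (int) length;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";              // 'a', U+1F600, 'b'
        System.out.println(s.length());           // 4 (UTF-16 chars)
        System.out.println(length(s));            // 3 (code points)
        System.out.println(length32(length(s)));  // 3
    }
}
```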
      • estimatedLength

        public long estimatedLength()
        Get the estimated length of the string, suitable for space allocation.
        Returns:
        for a UnicodeString, the actual length of the string in codepoints
      • isEmpty

        public boolean isEmpty()
        Ask whether the string is empty.
        Returns:
        true if the length of the string is zero
      • getWidth

        public abstract int getWidth()
        Get the number of bits needed to hold all the characters in this string.
        Returns:
        7 for ASCII characters (this value may not be used in practice), 8 for Latin-1, 16 for the BMP, 24 for general Unicode.
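A hypothetical helper (WidthSketch.widthOf, not part of Saxon) that chooses a width from the bands listed above by looking at the largest code point in a java.lang.String. It assumes the bands are ASCII up to U+007F, Latin-1 up to U+00FF and the BMP up to U+FFFF. Because the source is unsure whether 7 is ever reported, a real implementation may return 8 for pure-ASCII content.

```java
// Hypothetical helper, not Saxon's code: maps the largest code point present
// onto the width bands documented for getWidth().
final class WidthSketch {

    static int widthOf(String s) {
        int max = s.codePoints().max().orElse(0); // an empty string is treated as ASCII
        if (max <= 0x7F) {
            return 7;   // ASCII
        } else if (max <= 0xFF) {
            return 8;   // Latin-1
        } else if (max <= 0xFFFF) {
            return 16;  // Basic Multilingual Plane
        } else {
            return 24;  // supplementary (astral) characters
        }
    }

    public static void main(String[] args) {
        System.out.println(widthOf("abc"));           // 7
        System.out.println(widthOf("caf\u00E9"));     // 8
        System.out.println(widthOf("\u20AC"));        // 16 (euro sign)
        System.out.println(widthOf("\uD83D\uDE00"));  // 24
    }
}
```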
      • indexOf

        public long indexOf(int codePoint)
        Get the position of the first occurrence of the specified codepoint, starting the search at the beginning of the string.
        Parameters:
        codePoint - the sought codePoint
        Returns:
        the position (0-based) of the first occurrence found, or -1 if not found, counting code points rather than UTF-16 chars.
        Throws:
        java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
      • indexOf

        public abstract long indexOf(int codePoint,
                                     long from)
        Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string.
        Parameters:
        codePoint - the sought codePoint
        from - the position from which the search should start (0-based). A negative value is treated as zero. A position beyond the end of the string results in a return value of -1 (meaning not found).
        Returns:
        the position (0-based) of the first occurrence found, or -1 if not found
        Throws:
        java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
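A minimal sketch of the two indexOf(int ...) methods, with java.lang.String standing in for UnicodeString (IndexOfSketch is a hypothetical name). The point it illustrates is that positions count code points: String.indexOf would report UTF-16 offsets, which differ once a surrogate pair precedes the match. A negative from is clamped to zero and a from past the end yields -1, as documented.

```java
// Sketch only: positions are counted in code points, not UTF-16 chars.
// java.lang.String stands in for UnicodeString; names are hypothetical.
final class IndexOfSketch {

    static long indexOf(String s, int codePoint, long from) {
        long start = Math.max(from, 0);   // a negative "from" is treated as zero
        long pos = 0;                     // code point position of s.codePointAt(i)
        int i = 0;                        // UTF-16 offset into s
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (pos >= start && cp == codePoint) {
                return pos;
            }
            i += Character.charCount(cp);
            pos++;
        }
        return -1;                        // not found, or "from" is past the end
    }

    static long indexOf(String s, int codePoint) {
        return indexOf(s, codePoint, 0);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00x\uD83D\uDE00x";  // U+1F600, 'x', U+1F600, 'x'
        System.out.println(indexOf(s, 'x'));      // 1 (String.indexOf would give 2)
        System.out.println(indexOf(s, 'x', 2));   // 3
        System.out.println(indexOf(s, 'x', -5));  // 1
        System.out.println(indexOf(s, 'x', 99));  // -1
    }
}
```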
      • indexWhere

        public abstract long indexWhere(java.util.function.IntPredicate predicate,
                                        long from)
        Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string.
        Parameters:
        predicate - condition that the codepoint must satisfy
        from - the position from which the search should start (0-based). A negative value is treated as zero. A position beyond the end of the string results in a return value of -1 (meaning not found).
        Returns:
        the position (0-based) of the first codepoint to match the predicate, or -1 if not found
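A sketch of indexWhere over a java.lang.String, assuming the same from semantics as indexOf (negative treated as zero, past the end gives -1). IndexWhereSketch is a hypothetical name, not Saxon's code. JDK methods such as Character::isDigit(int) fit java.util.function.IntPredicate directly, and an equality predicate reproduces indexOf(codePoint, from).

```java
import java.util.function.IntPredicate;

// Sketch only: java.lang.String stands in for UnicodeString; names are hypothetical.
final class IndexWhereSketch {

    static long indexWhere(String s, IntPredicate predicate, long from) {
        long start = Math.max(from, 0);   // a negative "from" is treated as zero
        long pos = 0;                     // code point position
        int i = 0;                        // UTF-16 offset
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (pos >= start && predicate.test(cp)) {
                return pos;
            }
            i += Character.charCount(cp);
            pos++;
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "ab 12 cd";
        System.out.println(indexWhere(s, Character::isDigit, 0));      // 3
        System.out.println(indexWhere(s, Character::isWhitespace, 3)); // 5
        System.out.println(indexWhere(s, Character::isDigit, 5));      // -1
        // indexOf(codePoint, from) is the special case of an equality predicate:
        System.out.println(indexWhere(s, cp -> cp == 'c', 0));         // 6
    }
}
```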
      • indexOf

        public long indexOf(UnicodeString other,
                            long from)
        Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
        Parameters:
        other - the other (sought) string
        from - the position from which the search should start (0-based). A negative value is treated as zero. A position beyond the end of the string results in a return value of -1 (meaning not found).
        Returns:
        the first position where the substring is found, or -1 if it is not found (including when from is beyond the length of the string).
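A naive sketch of substring search by code point, with java.lang.String standing in for UnicodeString (SubstringSearchSketch is a hypothetical name). It assumes, following the parameter description, that a negative from is treated as zero. Converting both strings to code point arrays keeps positions in code points; a production implementation would likely avoid the copies and use a faster search algorithm.

```java
// Sketch only: a naive search over code point arrays, so positions count code points.
// java.lang.String stands in for UnicodeString; names are hypothetical, and a
// negative "from" is treated as zero (an assumption based on the parameter description).
final class SubstringSearchSketch {

    static long indexOf(String s, String other, long from) {
        int[] hay = s.codePoints().toArray();
        int[] needle = other.codePoints().toArray();
        long start = Math.max(from, 0);
        if (start > hay.length) {
            return -1;                    // "from" is beyond the end of the string
        }
        for (int i = (int) start; i + needle.length <= hay.length; i++) {
            int j = 0;
            while (j < needle.length && hay[i + j] == needle[j]) {
                j++;
            }
            if (j == needle.length) {
                return i;                 // every code point of "other" matched
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00abc\uD83D\uDE00abc";  // U+1F600 a b c U+1F600 a b c
        System.out.println(indexOf(s, "abc", 0));     // 1
        System.out.println(indexOf(s, "abc", 2));     // 5
        System.out.println(indexOf(s, "abc", -3));    // 1
        System.out.println(indexOf(s, "xyz", 0));     // -1
    }
}
```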
      • hasSubstring

        public boolean hasSubstring(UnicodeString other,
                                    long offset)
        Ask whether this string has another string as its content starting at a given offset.
        Parameters:
        other - the other string
        offset - the starting position in this string (counting in codepoints)
        Returns:
        true if the other string appears as a substring of this string starting at the given position.
        Throws:
        java.lang.IndexOutOfBoundsException - if offset is less than zero or greater than the length of this string. No exception is thrown if offset + other.length() exceeds this.length(); in that case the method returns false.
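A sketch of the documented hasSubstring contract over java.lang.String (HasSubstringSketch is a hypothetical name, not Saxon's code). It shows the asymmetry described above: an offset outside 0..length throws, while a match that would run past the end simply returns false.

```java
// Sketch only: java.lang.String stands in for UnicodeString; names are hypothetical.
final class HasSubstringSketch {

    static boolean hasSubstring(String s, String other, long offset) {
        int[] a = s.codePoints().toArray();
        int[] b = other.codePoints().toArray();
        if (offset < 0 || offset > a.length) {
            throw new IndexOutOfBoundsException("offset " + offset + ", length " + a.length);
        }
        int off = (int) offset;
        if ((long) off + b.length > a.length) {
            return false;                 // would run past the end: false, not an exception
        }
        for (int i = 0; i < b.length; i++) {
            if (a[off + i] != b[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(hasSubstring("hello", "llo", 2));  // true
        System.out.println(hasSubstring("hello", "lo!", 3));  // false (overruns the end)
        System.out.println(hasSubstring("hello", "", 5));     // true (offset == length is allowed)
    }
}
```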
      • codePoints

        public abstract IntIterator codePoints()
        Get an iterator over the code points present in the string.
        Returns:
        an iterator that delivers the individual code points
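IntIterator is a Saxon type that is not shown here, so this sketch uses the JDK's java.util.PrimitiveIterator.OfInt (from String.codePoints().iterator()) as a stand-in. It assumes IntIterator follows a similar hasNext()/next() pattern. The helper name describe is hypothetical. Each step yields a whole code point, never half of a surrogate pair.

```java
import java.util.PrimitiveIterator;

// Sketch only: PrimitiveIterator.OfInt stands in for Saxon's IntIterator.
final class CodePointIterationSketch {

    // Lists the code points of s as "U+XXXX" labels separated by spaces.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        PrimitiveIterator.OfInt it = s.codePoints().iterator();
        while (it.hasNext()) {
            int cp = it.nextInt();        // one full code point per step
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("a\uD83D\uDE00"));  // U+0061 U+1F600
    }
}
```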
      • codePointAt

        public abstract int codePointAt(long index)
        Get the code point at a given position in the string.
        Parameters:
        index - the given position (0-based)
        Returns:
        the code point at the given position
        Throws:
        java.lang.IndexOutOfBoundsException - if the index is out of range
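Note that java.lang.String.codePointAt(int) takes a UTF-16 offset, whereas the index here counts code points. A hypothetical sketch (CodePointAtSketch) bridges the two with String.offsetByCodePoints. This version is linear in the index; a representation that stores each code point in a fixed-width slot could index directly.

```java
// Sketch only: java.lang.String stands in for UnicodeString; names are hypothetical.
final class CodePointAtSketch {

    static int codePointAt(String s, long index) {
        int length = s.codePointCount(0, s.length());
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
        // Walk from the start to find the UTF-16 offset of the index-th code point.
        int offset = s.offsetByCodePoints(0, (int) index);
        return s.codePointAt(offset);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        System.out.printf("U+%04X%n", codePointAt(s, 1));  // U+1F600
        System.out.println((char) codePointAt(s, 2));      // b
    }
}
```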
      • substring

        public UnicodeString substring(long start)
        Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string.
        Parameters:
        start - the start position (0-based): that is, the position of the first code point to be included
        Returns:
        the requested substring
        Throws:
        java.lang.IndexOutOfBoundsException - if the start position is out of range
      • substring

        public abstract UnicodeString substring(long start,
                                                long end)
        Get a substring of this string, with a given start and end position.
        Parameters:
        start - the start position (0-based): that is, the position of the first code point to be included
        end - the end position (0-based): specifically, the position of the first code point not to be included
        Returns:
        the requested substring
        Throws:
        java.lang.IndexOutOfBoundsException - if the start/end positions are out of range (the conditions are the same as for String.substring())
      • prefix

        public UnicodeString prefix(long end)
        Get a substring of this string, starting at position 0, with a given end position.
        Parameters:
        end - the end position (0-based): specifically, the position of the first code point not to be included
        Returns:
        the requested substring
        Throws:
        java.lang.IndexOutOfBoundsException - if the end position is out of range
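A sketch of the three substring-style methods over java.lang.String, with positions in code points and the String.substring() bounds rule 0 <= start <= end <= length. SubstringSketch and its bounds check are hypothetical; the check is not a description of checkSubstringBounds, which the source leaves undocumented. substring(start) and prefix(end) are written as special cases of substring(start, end).

```java
// Sketch only: java.lang.String stands in for UnicodeString; names are hypothetical.
final class SubstringSketch {

    // Same conditions as String.substring(): 0 <= start <= end <= length (in code points).
    static String substring(String s, long start, long end) {
        long length = s.codePointCount(0, s.length());
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(
                    "start " + start + ", end " + end + ", length " + length);
        }
        // Translate code point positions into UTF-16 offsets.
        int from = s.offsetByCodePoints(0, (int) start);
        int to = s.offsetByCodePoints(from, (int) (end - start));
        return s.substring(from, to);
    }

    static String substring(String s, long start) {
        return substring(s, start, s.codePointCount(0, s.length()));
    }

    static String prefix(String s, long end) {
        return substring(s, 0, end);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00abc";                          // U+1F600 a b c
        System.out.println(substring(s, 1, 3));                // ab
        System.out.println(substring(s, 2));                   // bc
        System.out.println(prefix(s, 1).codePointAt(0) == 0x1F600); // true
    }
}
```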
      • concat

        public UnicodeString concat(UnicodeString other)
        Concatenate with another string, returning a new string.
        Parameters:
        other - the string to be appended
        Returns:
        the result of concatenating this string followed by the other
      • checkSubstringBounds

        protected void checkSubstringBounds(long start,
                                            long end)
      • verifyCharacters

        public void verifyCharacters()
        Diagnostic method: verify that all the characters in the string are valid XML codepoints.
        Throws:
        java.lang.IllegalStateException - if the contents are invalid
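A hypothetical sketch of such a check (XmlCharSketch, not Saxon's code). It assumes the XML 1.0 Char production (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF); the documentation does not say which XML version's rules apply. A lone surrogate in a java.lang.String surfaces from codePoints() as a value in D800-DFFF and is therefore rejected.

```java
// Sketch only: assumes the XML 1.0 "Char" production; names are hypothetical.
final class XmlCharSketch {

    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Throws IllegalStateException on the first invalid code point, as documented.
    static void verifyCharacters(String s) {
        s.codePoints().forEach(cp -> {
            if (!isXmlChar(cp)) {
                throw new IllegalStateException(String.format("Invalid XML character U+%04X", cp));
            }
        });
    }

    public static void main(String[] args) {
        verifyCharacters("tab\tand emoji \uD83D\uDE00");   // passes silently
        try {
            verifyCharacters("bad" + (char) 0x1);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());             // Invalid XML character U+0001
        }
    }
}
```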
      • equals

        public boolean equals(java.lang.Object obj)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Compute a hash code. All implementations of UnicodeString use compatible hash codes, and the hashing algorithm is therefore identical to that of java.lang.String. This means that for strings containing astral characters (those outside the Basic Multilingual Plane), the hash code must be computed by decomposing each astral character into a surrogate pair.
        Overrides:
        hashCode in class java.lang.Object
        Returns:
        the hash code
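A sketch of a String-compatible hash computed directly from an array of code points (HashSketch is a hypothetical name, not Saxon's code). java.lang.String applies h = 31 * h + c to each UTF-16 char, so a supplementary code point must contribute its high and low surrogates separately; adding the code point as a single value would produce a different hash.

```java
// Sketch only: names are hypothetical.
final class HashSketch {

    // String-compatible hash over code points: each supplementary code point
    // contributes its two UTF-16 surrogates, exactly as java.lang.String sees them.
    static int hash(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (Character.isSupplementaryCodePoint(cp)) {
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            } else {
                h = 31 * h + cp;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        int[] cps = s.codePoints().toArray();            // 3 code points
        System.out.println(hash(cps) == s.hashCode());   // true
    }
}
```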
      • compareTo

        public int compareTo(UnicodeString other)
        Compare this string to another using codepoint comparison.
        Specified by:
        compareTo in interface java.lang.Comparable<UnicodeString>
        Parameters:
        other - the other string
        Returns:
        -1 if this string comes first, 0 if they are equal, +1 if the other string comes first
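Codepoint comparison can disagree with java.lang.String.compareTo, which compares UTF-16 code units. The sketch below (CodePointCompareSketch, a hypothetical name) compares by code point and returns -1, 0 or +1 as documented. U+FFFF sorts before U+1F600 by code point, but after it in UTF-16 order, because the high surrogate 0xD83D is smaller than 0xFFFF.

```java
// Sketch only: java.lang.String stands in for UnicodeString; names are hypothetical.
final class CodePointCompareSketch {

    static int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return ca < cb ? -1 : 1;
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // All compared code points are equal: the shorter string comes first.
        boolean aExhausted = i >= a.length();
        boolean bExhausted = j >= b.length();
        if (aExhausted == bExhausted) {
            return 0;
        }
        return aExhausted ? -1 : 1;
    }

    public static void main(String[] args) {
        String bmpMax = "\uFFFF";
        String astral = "\uD83D\uDE00";                        // U+1F600
        System.out.println(compare(bmpMax, astral));           // -1: U+FFFF < U+1F600
        System.out.println(bmpMax.compareTo(astral) > 0);      // true: 0xFFFF > 0xD83D in UTF-16
    }
}
```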
      • asAtomic

        public AtomicValue asAtomic()
        Get an atomic value that encapsulates this match key. This is needed to support the collation-key() function.
        Specified by:
        asAtomic in interface AtomicMatchKey
        Returns:
        an atomic value that encapsulates this match key
      • requireInt

        public static int requireInt(long value)
        Utility method for use where strings longer than 2^31 characters cannot yet be handled.
        Parameters:
        value - the actual value of a character position within a string, or the length of a string
        Returns:
        the value as an integer if it is within range
        Throws:
        java.lang.UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
      • requireNonNegativeInt

        public static int requireNonNegativeInt(long value)
        Utility method for use where strings longer than 2^31 characters cannot yet be handled, and where negative offsets are to be treated as zero.
        Parameters:
        value - the actual value of a character position within a string, or the length of a string
        Returns:
        the value as an integer if it is within range
        Throws:
        java.lang.UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
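A minimal sketch of the two range-checking utilities (RequireIntSketch is a hypothetical name, not Saxon's code). The documentation only specifies rejection of values above Integer.MAX_VALUE. As an added assumption, this sketch also rejects values below Integer.MIN_VALUE in requireInt rather than letting the narrowing cast wrap around silently.

```java
// Sketch only: names are hypothetical; the lower-bound check is an assumption.
final class RequireIntSketch {

    static int requireInt(long value) {
        // Documented: values above Integer.MAX_VALUE are rejected.
        // Assumption: values below Integer.MIN_VALUE are rejected too,
        // instead of letting the (int) cast silently wrap.
        if (value > Integer.MAX_VALUE || value < Integer.MIN_VALUE) {
            throw new UnsupportedOperationException("Value out of int range: " + value);
        }
        return (int) value;
    }

    static int requireNonNegativeInt(long value) {
        return value < 0 ? 0 : requireInt(value);   // negative offsets are treated as zero
    }

    public static void main(String[] args) {
        System.out.println(requireInt(42));               // 42
        System.out.println(requireNonNegativeInt(-7));    // 0
        try {
            requireInt(1L << 31);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());           // Value out of int range: 2147483648
        }
    }
}
```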