Class UnicodeString

  • All Implemented Interfaces:
    java.lang.CharSequence, java.lang.Comparable<UnicodeString>, AtomicMatchKey
    Direct Known Subclasses:
    BMPString, EmptyString, GeneralUnicodeString, LatinString

    public abstract class UnicodeString
    extends java.lang.Object
    implements java.lang.CharSequence, java.lang.Comparable<UnicodeString>, AtomicMatchKey
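    An abstract class for Unicode strings that is indexed by codepoint rather than by 16-bit char, so that non-BMP characters occupy a single position. It has three principal subclasses, LatinString, BMPString and GeneralUnicodeString, which handle strings whose maximum codepoint is 255 (U+00FF), 65535 (U+FFFF) and 1114111 (U+10FFFF) respectively; EmptyString is a further known subclass.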
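    The class exists because java.lang.String indexes by UTF-16 code unit, so a non-BMP character takes two positions. The sketch below shows the difference with plain JDK calls only; the class and method names are hypothetical and are not part of the UnicodeString API. maxCodepoint computes the value that decides between the 255, 65535 and 1114111 cases.

```java
// Hypothetical demo class: contrasts UTF-16 indexing (java.lang.String)
// with codepoint counting (the model UnicodeString documents).
final class Utf16VsCodepoints {

    // Number of UTF-16 code units, which is what String.length() reports.
    static int utf16Length(String s) {
        return s.length();
    }

    // Number of Unicode codepoints; a surrogate pair counts once.
    static int codepointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    // Largest codepoint in the string, or -1 if the string is empty.
    static int maxCodepoint(String s) {
        return s.codePoints().max().orElse(-1);
    }

    public static void main(String[] args) {
        // 'A', e-acute (U+00E9), and the emoji U+1F600 (outside the BMP)
        String s = "A\u00E9\uD83D\uDE00";
        System.out.println(utf16Length(s));     // 4
        System.out.println(codepointLength(s)); // 3
        System.out.println(maxCodepoint(s));    // 128512, i.e. 0x1F600
    }
}
```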
    • Constructor Summary

      Constructors 
      Constructor Description
      UnicodeString()  
    • Method Summary

      Modifier and Type Method Description
      AtomicValue asAtomic()
      Get an atomic value that encapsulates this match key.
      int compareTo(UnicodeString other)
      Compare two Unicode strings in codepoint collating sequence.
      static boolean containsSurrogatePairs(java.lang.CharSequence value)
      Test whether a CharSequence contains Unicode codepoints outside the BMP range.
      boolean equals(java.lang.Object obj)
      Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
      int hashCode()
      Compute a hash code consistent with equals(), which compares implementations of UnicodeString with each other but not with other implementations of CharSequence.
      abstract boolean isEnd(int pos)
      Ask whether a given position is at (or beyond) the end of the string.
      static UnicodeString makeUnicodeString(int[] in)
      Make a UnicodeString for a given array of codepoints.
      static UnicodeString makeUnicodeString(java.lang.CharSequence in)
      Make a UnicodeString for a given CharSequence.
      abstract int uCharAt(int pos)
      Get the character (codepoint) at a specified position.
      abstract int uIndexOf(int search, int start)
      Get the position of the first occurrence of a given character at or after a given start position.
      abstract int uLength()
      Get the length of the string, in Unicode codepoints.
      abstract UnicodeString uSubstring(int beginIndex, int endIndex)
      Get a substring of this string.
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface java.lang.CharSequence

        charAt, chars, codePoints, length, subSequence, toString
    • Constructor Detail

      • UnicodeString

        public UnicodeString()
    • Method Detail

      • makeUnicodeString

        public static UnicodeString makeUnicodeString(java.lang.CharSequence in)
        Make a UnicodeString for a given CharSequence
        Parameters:
        in - the input CharSequence
        Returns:
        a UnicodeString using an appropriate implementation class
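        "An appropriate implementation class" is not spelled out here. Assuming it follows the class description (choose by maximum codepoint), the dispatch might look like the sketch below. It is hypothetical: it returns the subclass name instead of an instance so that it runs without Saxon, and the treatment of zero-length input as EmptyString is an assumption based only on that subclass's name.

```java
// Hypothetical sketch of a makeUnicodeString-style dispatch by maximum codepoint.
// Returns the documented subclass name rather than constructing an instance.
final class ImplementationChooser {

    static String chooseImplementation(CharSequence in) {
        if (in.length() == 0) {
            return "EmptyString";          // assumption: used for zero-length input
        }
        int max = in.codePoints().max().getAsInt();
        if (max <= 0xFF) {
            return "LatinString";          // every codepoint fits in one byte
        } else if (max <= 0xFFFF) {
            return "BMPString";            // every codepoint fits in one char
        } else {
            return "GeneralUnicodeString"; // needs a full int per codepoint
        }
    }

    public static void main(String[] args) {
        System.out.println(chooseImplementation(""));             // EmptyString
        System.out.println(chooseImplementation("caf\u00E9"));    // LatinString
        System.out.println(chooseImplementation("\u20AC100"));    // BMPString
        System.out.println(chooseImplementation("\uD83D\uDE00")); // GeneralUnicodeString
    }
}
```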
      • makeUnicodeString

        public static UnicodeString makeUnicodeString(int[] in)
        Make a UnicodeString for a given array of codepoints
        Parameters:
        in - the input array of Unicode codepoints
        Returns:
        a UnicodeString using an appropriate implementation class
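        An int[] of codepoints is the representation in which every character, BMP or not, occupies exactly one slot. The sketch below shows the JDK round trip between such an array and a String; the class and method names are hypothetical, and whether makeUnicodeString validates its input the same way is not stated here.

```java
import java.util.Arrays;

// Hypothetical helpers: convert between codepoint arrays and java.lang.String.
final class CodepointArrays {

    // The String(int[], int, int) constructor throws IllegalArgumentException
    // if any value is not a valid codepoint (outside 0..0x10FFFF).
    static String fromCodepoints(int[] in) {
        return new String(in, 0, in.length);
    }

    // Inverse conversion: one int per codepoint, regardless of UTF-16 length.
    static int[] toCodepoints(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        int[] cps = {0x48, 0x69, 0x1F600};
        String s = fromCodepoints(cps);
        System.out.println(s.length());                          // 4
        System.out.println(Arrays.equals(cps, toCodepoints(s))); // true
    }
}
```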
      • containsSurrogatePairs

        public static boolean containsSurrogatePairs(java.lang.CharSequence value)
        Test whether a CharSequence contains Unicode codepoints outside the BMP range
        Parameters:
        value - the string to be tested
        Returns:
        true if the string contains non-BMP codepoints
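        In UTF-16 a non-BMP codepoint is stored as a high surrogate followed by a low surrogate, so the test can be done by scanning for such a pair. The sketch below is one possible implementation under that reading; the class name is hypothetical, and how the real method treats unpaired surrogates is not documented here (this sketch ignores them).

```java
// Hypothetical sketch: detect a UTF-16 surrogate pair (a non-BMP codepoint).
final class SurrogateCheck {

    static boolean containsSurrogatePairs(CharSequence value) {
        for (int i = 0; i + 1 < value.length(); i++) {
            // A high surrogate immediately followed by a low surrogate
            // encodes one codepoint in the range U+10000..U+10FFFF.
            if (Character.isHighSurrogate(value.charAt(i))
                    && Character.isLowSurrogate(value.charAt(i + 1))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsSurrogatePairs("abc"));           // false
        System.out.println(containsSurrogatePairs("a\uD83D\uDE00")); // true
    }
}
```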
      • uSubstring

        public abstract UnicodeString uSubstring(int beginIndex,
                                                 int endIndex)
        Get a substring of this string
        Parameters:
        beginIndex - the index of the first character to be included (counting codepoints, not 16-bit characters)
        endIndex - the index of the first character NOT to be included (counting codepoints, not 16-bit characters)
        Returns:
        a substring
        Throws:
        java.lang.IndexOutOfBoundsException - if the selection extends beyond the start or end of the string (this method follows the semantics of String.substring(), not the XPath semantics)
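        The contract is String.substring() with codepoint indices in place of UTF-16 indices; XPath's fn:substring, by contrast, uses 1-based positions and clamps out-of-range arguments instead of throwing. A sketch of the same contract on a plain String follows; the class name is hypothetical and this is not Saxon's implementation.

```java
// Hypothetical sketch: substring by codepoint indices with
// String.substring()-style bounds checking.
final class CodepointSubstring {

    static String uSubstring(String s, int beginIndex, int endIndex) {
        int length = s.codePointCount(0, s.length());
        if (beginIndex < 0 || endIndex > length || beginIndex > endIndex) {
            throw new IndexOutOfBoundsException(
                    "begin " + beginIndex + ", end " + endIndex + ", length " + length);
        }
        // Translate codepoint offsets into UTF-16 offsets.
        int from = s.offsetByCodePoints(0, beginIndex);
        int to = s.offsetByCodePoints(from, endIndex - beginIndex);
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";                  // 3 codepoints, 4 chars
        System.out.println(uSubstring(s, 2, 3));      // b
    }
}
```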
      • uIndexOf

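        public abstract int uIndexOf(int search,
                                     int start)
        Get the position of the first occurrence of a given character at or after a given start position
        Parameters:
        search - the character (Unicode codepoint) to look for
        start - the first position to look at (counting codepoints, not 16-bit characters)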
        Returns:
        the position of the first occurrence of the sought character, or -1 if not found
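        Over a codepoint array this is a simple linear scan, and both the argument and the result are codepoint positions (String.indexOf(int, int), by contrast, works in UTF-16 positions). The class name is hypothetical, and clamping a negative start to 0 is an assumption mirroring String.indexOf.

```java
// Hypothetical sketch of uIndexOf over an int[] of codepoints.
final class CodepointIndexOf {

    static int uIndexOf(int[] codepoints, int search, int start) {
        // Assumption: a negative start is treated as 0, as String.indexOf does.
        for (int i = Math.max(start, 0); i < codepoints.length; i++) {
            if (codepoints[i] == search) {
                return i; // position counted in codepoints
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] cps = {0x61, 0x1F600, 0x62, 0x1F600};
        System.out.println(uIndexOf(cps, 0x1F600, 2)); // 3
    }
}
```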
      • uCharAt

        public abstract int uCharAt(int pos)
        Get the character at a specified position
        Parameters:
        pos - the index of the required character (counting codepoints, not 16-bit characters)
        Returns:
        a character (Unicode codepoint) at the specified position.
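        On a plain String, reaching codepoint position pos means walking from the start, so the cost grows with pos. A fixed-width representation (one byte, char or int per codepoint) would presumably allow constant-time access instead, which is a plausible reason for the three subclasses. The class name below is hypothetical.

```java
// Hypothetical sketch: codepoint at a codepoint index on a java.lang.String.
final class CodepointCharAt {

    static int uCharAt(String s, int pos) {
        // offsetByCodePoints walks from the start: time proportional to pos.
        // Both calls throw IndexOutOfBoundsException (or a subclass) when pos
        // is negative or not less than the codepoint count.
        int utf16Index = s.offsetByCodePoints(0, pos);
        return s.codePointAt(utf16Index);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b";
        System.out.println(Integer.toHexString(uCharAt(s, 1))); // 1f600
        System.out.println((char) uCharAt(s, 2));               // b
    }
}
```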
      • uLength

        public abstract int uLength()
        Get the length of the string, in Unicode codepoints
        Returns:
        the number of codepoints in the string
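        For strings whose codepoints are all at most 255, one byte per codepoint is enough, and the codepoint count is then simply the array length. The container below illustrates that idea; it is a hypothetical sketch and is not claimed to be how LatinString is written.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical fixed-width container for text whose codepoints are all <= 255.
final class Latin1Codepoints {
    private final byte[] bytes;

    Latin1Codepoints(String s) {
        for (int i = 0; i < s.length(); i++) {
            // Surrogates are above 0xFF too, so this also rejects non-BMP text.
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("codepoint above 255 at index " + i);
            }
        }
        this.bytes = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // One byte per codepoint, so the length is the array length: O(1).
    int uLength() {
        return bytes.length;
    }

    // Mask with 0xFF because Java bytes are signed.
    int uCharAt(int pos) {
        return bytes[pos] & 0xFF;
    }

    public static void main(String[] args) {
        Latin1Codepoints cafe = new Latin1Codepoints("caf\u00E9");
        System.out.println(cafe.uLength());   // 4
        System.out.println(cafe.uCharAt(3));  // 233
    }
}
```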
      • isEnd

        public abstract boolean isEnd(int pos)
        Ask whether a given position is at (or beyond) the end of the string
        Parameters:
        pos - the index of the required character (counting codepoints, not 16-bit characters)
        Returns:
        true if the specified index is at or beyond the end of the string, that is, if there is no character at that position
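        isEnd allows a loop over the string to test each position directly instead of first computing uLength(). The sketch uses a hypothetical two-method interface that mirrors the documented uCharAt and isEnd, with an array-backed implementation for the demo; none of these types belong to the real API.

```java
// Hypothetical minimal interface mirroring two of the documented methods.
interface CodepointSequence {
    int uCharAt(int pos);
    boolean isEnd(int pos);
}

final class IsEndDemo {

    // Hypothetical array-backed implementation used only for this demo.
    static CodepointSequence of(int... cps) {
        return new CodepointSequence() {
            @Override public int uCharAt(int pos) { return cps[pos]; }
            @Override public boolean isEnd(int pos) { return pos >= cps.length; }
        };
    }

    // Walk the sequence using isEnd() as the loop test instead of uLength().
    static String toJavaString(CodepointSequence s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; !s.isEnd(i); i++) {
            sb.appendCodePoint(s.uCharAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaString(of(0x48, 0x69))); // Hi
    }
}
```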
      • hashCode

        public int hashCode()
        Compute a hash code for this string. As with equals(), implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence.
        Overrides:
        hashCode in class java.lang.Object
        Returns:
        a hash code consistent with equals(): two UnicodeStrings containing the same codepoints have the same hash code
      • equals

        public boolean equals(java.lang.Object obj)
        Implementations of UnicodeString can be compared with each other, but not with other implementations of CharSequence
        Overrides:
        equals in class java.lang.Object
        Parameters:
        obj - the object to be compared
        Returns:
        true if obj is a UnicodeString containing the same codepoints
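        Because equality depends only on the codepoints, any two instances with the same content must be equal and share a hash code, whichever subclass holds them, while a plain String is never equal to one. The hypothetical value class below shows that contract with a single int[] representation; it is a sketch, not Saxon's code.

```java
import java.util.Arrays;

// Hypothetical value class: equals() and hashCode() depend only on the
// codepoint sequence, and other CharSequence types are never equal to it.
final class CodepointKey {
    private final int[] codepoints;

    CodepointKey(CharSequence s) {
        this.codepoints = s.codePoints().toArray();
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof CodepointKey
                && Arrays.equals(codepoints, ((CodepointKey) obj).codepoints);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(codepoints); // consistent with equals()
    }

    public static void main(String[] args) {
        CodepointKey a = new CodepointKey("ab");
        CodepointKey b = new CodepointKey(new StringBuilder("ab"));
        System.out.println(a.equals(b));   // true
        System.out.println(a.equals("ab")); // false
    }
}
```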
      • compareTo

        public int compareTo(UnicodeString other)
        Compare two Unicode strings in codepoint collating sequence
        Specified by:
        compareTo in interface java.lang.Comparable<UnicodeString>
        Parameters:
        other - the UnicodeString to be compared
        Returns:
        a negative integer, zero, or a positive integer as this string is less than, equal to, or greater than other in codepoint order
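        Codepoint order is not the same as String.compareTo(), which compares UTF-16 code units: a BMP character in the range U+E000 to U+FFFF sorts after any non-BMP character under String.compareTo() but before it in codepoint order. The hypothetical helper below compares in codepoint order and demonstrates the difference.

```java
// Hypothetical helper: lexicographic comparison by codepoint value.
final class CodepointOrder {

    static int compareCodepoints(CharSequence a, CharSequence b) {
        int[] x = a.codePoints().toArray();
        int[] y = b.codePoints().toArray();
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        // Common prefix: the shorter string sorts first.
        return Integer.compare(x.length, y.length);
    }

    public static void main(String[] args) {
        String privateUse = "\uE000";   // U+E000, inside the BMP
        String emoji = "\uD83D\uDE00";  // U+1F600, outside the BMP
        // UTF-16 comparison sees 0xE000 versus the high surrogate 0xD83D.
        System.out.println(privateUse.compareTo(emoji) > 0);          // true
        // Codepoint comparison sees 0xE000 versus 0x1F600.
        System.out.println(compareCodepoints(privateUse, emoji) < 0); // true
    }
}
```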
      • asAtomic

        public AtomicValue asAtomic()
        Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
        Specified by:
        asAtomic in interface AtomicMatchKey
        Returns:
        an atomic value that encapsulates this match key