Class CompressedWhitespace

  • All Implemented Interfaces:
    java.lang.Comparable<UnicodeString>, AtomicMatchKey

    public class CompressedWhitespace
    extends WhitespaceString
    This class provides a compressed representation of a sequence of whitespace characters. The representation is a sequence of up to eight bytes held in a long: in each byte, the top two bits indicate which whitespace character is used (x9, xA, xD, or x20) and the bottom six bits give the number of consecutive occurrences of that character (at most 63). A zero byte is a filler. We don't compress the sequence if it would occupy more than 8 bytes, because that is the space available in the TinyTree arrays.
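    The following is a minimal sketch of the packing scheme just described, not Saxon's implementation. It assumes that the two-bit codes 0 to 3 map to x9, xA, xD and x20 in that order, that the first run goes in the most significant byte, and that a run longer than 63 characters is split across bytes. The class and method names are hypothetical. It returns an empty OptionalLong when the input is not pure whitespace or would need more than eight bytes.

```java
import java.util.OptionalLong;

// Hypothetical sketch of the packing scheme; not Saxon's source code.
final class WhitespaceCompressor {

    // Assumed two-bit codes: 0 = TAB (x9), 1 = LF (xA), 2 = CR (xD), 3 = SPACE (x20)
    private static int codeOf(char c) {
        switch (c) {
            case '\t': return 0;
            case '\n': return 1;
            case '\r': return 2;
            case ' ':  return 3;
            default:   return -1;
        }
    }

    // Packs in[start .. start+len) into a long: one run per byte, first run in the
    // most significant byte, unused low-order bytes left as zero fillers.
    static OptionalLong compress(char[] in, int start, int len) {
        long value = 0L;
        int bytesUsed = 0;
        int i = start;
        final int end = start + len;
        while (i < end) {
            final char c = in[i];
            final int code = codeOf(c);
            if (code < 0) {
                return OptionalLong.empty();          // not pure whitespace
            }
            int run = 0;
            // Six bits hold the count, so a run is capped at 63 characters
            while (i < end && in[i] == c && run < 63) {
                run++;
                i++;
            }
            if (++bytesUsed > 8) {
                return OptionalLong.empty();          // would need more than 8 bytes
            }
            value = (value << 8) | ((long) code << 6) | run;
        }
        // Move the used bytes to the top so the fillers occupy the low end
        // (for empty input the shift count is 64, which Java masks to 0; value stays 0)
        return OptionalLong.of(value << (8 * (8 - bytesUsed)));
    }
}
```

    For example, a line feed followed by two spaces becomes the bytes 0x41 and 0xC2, giving the value 0x41C2000000000000L.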
    • Constructor Detail

      • CompressedWhitespace

        public CompressedWhitespace(long compressedValue)
    • Method Detail

      • compressWS

        public static UnicodeString compressWS(char[] in,
                                               int start,
                                               int len)
      • uncompress

        public static UnicodeString uncompress(long value)
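        A sketch of the reverse direction, under the same assumptions as the class description (code order x9, xA, xD, x20; the class name is hypothetical). It reads the long one byte at a time from the most significant end, skips zero filler bytes, and expands each run. It returns a plain String rather than a UnicodeString.

```java
// Hypothetical sketch: expands a packed long back into the whitespace it encodes.
final class WhitespaceExpander {
    // Assumed code order, matching the class description: x9, xA, xD, x20
    private static final char[] WHITE_CHARS = {'\t', '\n', '\r', ' '};

    static String uncompress(long value) {
        StringBuilder sb = new StringBuilder();
        // Walk the bytes from most significant to least significant
        for (int shift = 56; shift >= 0; shift -= 8) {
            int b = (int) ((value >>> shift) & 0xFF);
            if (b == 0) {
                continue;                      // a zero byte is a filler
            }
            char c = WHITE_CHARS[b >>> 6];     // top two bits select the character
            int count = b & 0x3F;              // bottom six bits give the run length
            for (int k = 0; k < count; k++) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```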
      • getCompressedValue

        public long getCompressedValue()
      • length

        public long length()
        Description copied from class: UnicodeString
        Get the length of the string
        Specified by:
        length in class UnicodeString
        Returns:
        the number of code points in the string
      • length32

        public int length32()
        Description copied from class: UnicodeString
        Get the length of the string, provided it is less than 2^31 characters
        Overrides:
        length32 in class UnicodeString
        Returns:
        the length of the string if it fits within a Java int
      • length

        public static int length(long value)
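        Because each byte carries its own run length, the length can be read without expanding the string: sum the low six bits of every byte, with filler bytes contributing zero. With at most eight runs of at most 63 characters each, the result never exceeds 504, so it always fits in an int. A minimal sketch, with a hypothetical class name:

```java
// Hypothetical sketch: length of the encoded whitespace, read straight from the long.
final class CompressedLength {
    static int length(long value) {
        int total = 0;
        for (int shift = 56; shift >= 0; shift -= 8) {
            // Low six bits of each byte are the run length; filler bytes add 0
            total += (int) ((value >>> shift) & 0x3F);
        }
        return total;
    }
}
```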
      • codePointAt

        public int codePointAt(long index)
        Get the code point at a given position in the string
        Specified by:
        codePointAt in class UnicodeString
        Parameters:
        index - the given position (0-based)
        Returns:
        the code point at the given position
        Throws:
        java.lang.IndexOutOfBoundsException - if the index is out of range
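        One way to answer codePointAt on the packed form is to walk the runs, subtracting each run length from the index until the index falls inside a run. The sketch below takes the packed long as an explicit parameter; it is a hypothetical static helper, not the instance method documented above, and assumes the same code order as the class description.

```java
// Hypothetical sketch: code point at a 0-based index, found by walking the runs.
final class CompressedCodePoint {
    // Assumed code order: x9, xA, xD, x20
    private static final int[] WHITE_CHARS = {0x09, 0x0A, 0x0D, 0x20};

    static int codePointAt(long value, long index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        long remaining = index;
        for (int shift = 56; shift >= 0; shift -= 8) {
            int b = (int) ((value >>> shift) & 0xFF);
            int count = b & 0x3F;              // filler bytes have count 0 and are skipped
            if (remaining < count) {
                return WHITE_CHARS[b >>> 6];   // index lies inside this run
            }
            remaining -= count;
        }
        throw new IndexOutOfBoundsException("index " + index);
    }
}
```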
      • codePoints

        public IntIterator codePoints()
        Description copied from class: UnicodeString
        Get an iterator over the code points present in the string.
        Specified by:
        codePoints in class UnicodeString
        Returns:
        an iterator that delivers the individual code points
      • equals

        public boolean equals(java.lang.Object obj)
        Indicates whether some other object is "equal to" this one.
        Overrides:
        equals in class UnicodeString
      • hashCode

        public int hashCode()
        Description copied from class: UnicodeString
        Compute a hash code. All implementations of UnicodeString use compatible hash codes, and the hashing algorithm is therefore identical to that of java.lang.String. This means that for strings containing Astral characters (those outside the Basic Multilingual Plane), the hash code must be computed by decomposing each Astral character into a surrogate pair.
        Overrides:
        hashCode in class UnicodeString
        Returns:
        the hash code
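        All four whitespace characters lie in the Basic Multilingual Plane, so no surrogate decomposition is needed here, and a String-compatible hash can be computed directly from the runs by applying the String.hashCode recurrence h = 31 * h + c once per character. A hypothetical sketch, assuming the code order x9, xA, xD, x20:

```java
// Hypothetical sketch: java.lang.String-compatible hash computed from the packed runs.
final class CompressedHash {
    // Assumed code order: x9, xA, xD, x20
    private static final char[] WHITE_CHARS = {'\t', '\n', '\r', ' '};

    static int hash(long value) {
        int h = 0;
        for (int shift = 56; shift >= 0; shift -= 8) {
            int b = (int) ((value >>> shift) & 0xFF);
            char c = WHITE_CHARS[b >>> 6];
            // Same recurrence as String.hashCode(), applied once per character;
            // filler bytes have a count of 0, so they contribute nothing
            for (int k = b & 0x3F; k > 0; k--) {
                h = 31 * h + c;
            }
        }
        return h;
    }
}
```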
      • write

        public void write(UnicodeWriter writer)
                   throws java.io.IOException
        Write the value to a Writer
        Specified by:
        write in class WhitespaceString
        Parameters:
        writer - the writer to write to
        Throws:
        java.io.IOException - if an error occurs downstream
      • writeEscape

        public void writeEscape(boolean[] specialChars,
                                UnicodeWriter writer)
                         throws java.io.IOException
        Write the value to a Writer with escaping of special characters
        Specified by:
        writeEscape in class WhitespaceString
        Parameters:
        specialChars - identifies which characters are considered special
        writer - the writer to write to
        Throws:
        java.io.IOException - if an error occurs downstream
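        A sketch of escaped output, with its assumptions stated plainly: java.lang.Appendable stands in for UnicodeWriter, specialChars is assumed to be indexed by character code, and a special character is assumed to be written as a hexadecimal character reference such as &#xA;. The exact escape form used by the real method is not stated on this page, and the class name is hypothetical.

```java
import java.io.IOException;

// Hypothetical sketch of escaped output; Appendable stands in for UnicodeWriter.
final class CompressedEscaper {
    // Assumed code order: x9, xA, xD, x20
    private static final char[] WHITE_CHARS = {'\t', '\n', '\r', ' '};

    static void writeEscape(long value, boolean[] specialChars, Appendable out) throws IOException {
        for (int shift = 56; shift >= 0; shift -= 8) {
            int b = (int) ((value >>> shift) & 0xFF);
            char c = WHITE_CHARS[b >>> 6];
            int count = b & 0x3F;              // filler bytes have count 0
            // Assumption: specialChars is indexed by character code
            boolean escape = c < specialChars.length && specialChars[c];
            // Assumption: special characters become hex character references, e.g. &#xA;
            String unit = escape
                    ? "&#x" + Integer.toHexString(c).toUpperCase() + ";"
                    : String.valueOf(c);
            for (int k = 0; k < count; k++) {
                out.append(unit);
            }
        }
    }
}
```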