Class CompressedWhitespace

All Implemented Interfaces:
Comparable<UnicodeString>, AtomicMatchKey

public class CompressedWhitespace extends WhitespaceString
This class provides a compressed representation of a sequence of whitespace characters. The representation is a sequence of bytes: in each byte, the top two bits indicate which whitespace character is used (x9, xA, xD, or x20) and the bottom six bits give the number of consecutive occurrences of that character (so at most 63 per byte). A zero byte is a filler. The sequence is not compressed if it would occupy more than 8 bytes, because that is the space available in the TinyTree arrays.
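The byte layout described above can be illustrated with a minimal, self-contained sketch. This is not the Saxon implementation: the class name WhitespacePackingSketch, the methods pack and unpack, the assignment of the two-bit codes 0 to 3 to x9, xA, xD and x20 in that order, and the choice to store the first byte in the most significant position of the long are all assumptions made for illustration.

```java
import java.util.OptionalLong;

// Hypothetical illustration of the packing scheme; not Saxon code.
final class WhitespacePackingSketch {

    // Assumed code assignment: index 0..3 maps to x9, xA, xD, x20.
    private static final char[] WS_CHARS = {'\t', '\n', '\r', ' '};

    private static int codeOf(char c) {
        for (int i = 0; i < WS_CHARS.length; i++) {
            if (WS_CHARS[i] == c) {
                return i;
            }
        }
        return -1; // not one of the four whitespace characters
    }

    // Packs the input into a long, one byte per run of identical characters.
    // Returns empty if the input contains a non-whitespace character or
    // would need more than 8 bytes.
    static OptionalLong pack(CharSequence in) {
        long result = 0;
        int bytesUsed = 0;
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            int code = codeOf(c);
            if (code < 0 || bytesUsed == 8) {
                return OptionalLong.empty();
            }
            // A six-bit count can describe a run of at most 63 characters.
            int run = 1;
            while (i + run < in.length() && in.charAt(i + run) == c && run < 63) {
                run++;
            }
            long b = ((long) code << 6) | run;
            // Assumed byte order: first run in the most significant byte.
            result |= b << (8 * (7 - bytesUsed));
            bytesUsed++;
            i += run;
        }
        return OptionalLong.of(result); // unused trailing bytes stay zero (filler)
    }

    // Expands a packed long back into the whitespace string.
    static String unpack(long value) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 56; shift >= 0; shift -= 8) {
            int b = (int) ((value >>> shift) & 0xFF);
            if (b == 0) {
                break; // zero byte is a filler: nothing further to expand
            }
            char c = WS_CHARS[b >>> 6];
            int count = b & 0x3F;
            for (int k = 0; k < count; k++) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Under these assumptions, a newline followed by four spaces needs two bytes (x41 and xC4), while a sequence with nine or more runs does not fit and is left uncompressed. A run count is never zero, so a real data byte can never be mistaken for the zero filler.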
  • Constructor Details

    • CompressedWhitespace

      public CompressedWhitespace(long compressedValue)
  • Method Details

    • compressWS

      public static UnicodeString compressWS(char[] in, int start, int len)
    • uncompress

      public UnicodeString uncompress()
      Uncompress the whitespace to a (normal) UnicodeString
      Specified by:
      uncompress in class WhitespaceString
      Returns:
      the uncompressed value
    • uncompress

      public static UnicodeString uncompress(long value)
    • getCompressedValue

      public long getCompressedValue()
    • length

      public long length()
      Description copied from class: UnicodeString
      Get the length of the string
      Specified by:
      length in class UnicodeString
      Returns:
      the number of code points in the string
    • length32

      public int length32()
      Description copied from class: UnicodeString
      Get the length of the string, provided it is less than 2^31 characters
      Overrides:
      length32 in class UnicodeString
      Returns:
      the length of the string if it fits within a Java int
    • length

      public static int length(long value)
    • codePointAt

      public int codePointAt(long index)
      Get the code point at a given position in the string
      Specified by:
      codePointAt in class UnicodeString
      Parameters:
      index - the given position (0-based)
      Returns:
      the code point at the given position
      Throws:
      IndexOutOfBoundsException - if the index is out of range
    • codePoints

      public IntIterator codePoints()
      Description copied from class: UnicodeString
      Get an iterator over the code points present in the string.
      Specified by:
      codePoints in class UnicodeString
      Returns:
      an iterator that delivers the individual code points
    • equals

      public boolean equals(Object obj)
      Indicates whether some other object is "equal to" this one.
      Overrides:
      equals in class UnicodeString
    • hashCode

      public int hashCode()
      Description copied from class: UnicodeString
      Compute a hash code. All implementations of UnicodeString use compatible hash codes, so the hashing algorithm is identical to that for java.lang.String. This means that for strings containing astral characters (code points above xFFFF), the hash code is computed by decomposing each astral character into a surrogate pair.
      Overrides:
      hashCode in class UnicodeString
      Returns:
      the hash code
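The compatibility requirement described for hashCode can be sketched as follows. This is an illustration only, not the Saxon source: the class CodePointHashSketch and its hash method are hypothetical names. It applies the documented java.lang.String formula (multiply by 31 and add each UTF-16 code unit) to an array of code points, splitting any astral code point into its surrogate pair first.

```java
// Hypothetical illustration; not Saxon code.
final class CodePointHashSketch {

    // Computes a hash over code points that matches String.hashCode()
    // for the equivalent UTF-16 string.
    static int hash(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (Character.isSupplementaryCodePoint(cp)) {
                // Astral character: hash the two surrogates separately,
                // exactly as String.hashCode() would see them.
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            } else {
                h = 31 * h + cp;
            }
        }
        return h;
    }
}
```

For a whitespace-only string every code point is in the Basic Multilingual Plane, so the surrogate branch is never taken and the result is simply the String hash of the uncompressed characters.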
    • write

      public void write(UnicodeWriter writer) throws IOException
      Write the value to a Writer
      Specified by:
      write in class WhitespaceString
      Parameters:
      writer - the writer to write to
      Throws:
      IOException - if an error occurs downstream
    • writeEscape

      public void writeEscape(boolean[] specialChars, UnicodeWriter writer) throws IOException
      Write the value to a Writer with escaping of special characters
      Specified by:
      writeEscape in class WhitespaceString
      Parameters:
      specialChars - identifies which characters are considered special
      writer - the writer to write to
      Throws:
      IOException - if an error occurs downstream