Class UTF8CharacterSet

  • All Implemented Interfaces:
    CharacterSet

    public final class UTF8CharacterSet
    extends java.lang.Object
    implements CharacterSet
    This class defines properties of the UTF-8 character set.
    • Method Summary

      Modifier and Type Method Description
      static int decodeUTF8(byte[] in, int used)
      Decode a single UTF-8 encoded character
      static byte[] encode(IntIterator codePoints)
      Static method to generate the UTF-8 representation of a sequence of Unicode codepoints
      java.lang.String getCanonicalName()
      Get the preferred Java name of the character set.
      static UTF8CharacterSet getInstance()
      Get the single shared instance of this class
      static int getUTF8Encoding(char in, char in2, byte[] out)
      Static method to generate the UTF-8 representation of a Unicode character
      boolean inCharset(int c)
      Determine if a character is present in the character set
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getInstance

        public static UTF8CharacterSet getInstance()
        Get the single shared instance of this class
        Returns:
        the single shared instance of this class
      • inCharset

        public boolean inCharset(int c)
        Description copied from interface: CharacterSet
        Determine if a character is present in the character set
        Specified by:
        inCharset in interface CharacterSet
        Parameters:
        c - the codepoint being tested
        Returns:
        true if the codepoint is supported
      • getCanonicalName

        public java.lang.String getCanonicalName()
        Description copied from interface: CharacterSet
        Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
        Specified by:
        getCanonicalName in interface CharacterSet
        Returns:
        the preferred Java name
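        The difference between a canonical name and a "historic name" can be seen with the JDK alone. The sketch below uses only java.nio.charset; the class and helper names are hypothetical, and it makes no claim about which of the two spellings this method itself returns.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the documented API.
final class CharsetNameSketch {

    // Resolve any accepted name or alias to the java.nio canonical name.
    static String canonicalName(String anyName) {
        return Charset.forName(anyName).name();
    }

    // Check whether a spelling is registered as an alias of UTF-8.
    static boolean isUtf8Alias(String name) {
        return StandardCharsets.UTF_8.aliases().contains(name);
    }

    public static void main(String[] args) {
        // "UTF8" is the historical java.io/java.lang spelling; it resolves
        // to the same charset as the java.nio canonical name "UTF-8".
        System.out.println(canonicalName("UTF8"));
        System.out.println(isUtf8Alias("UTF8"));
    }
}
```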
      • getUTF8Encoding

        public static int getUTF8Encoding(char in,
                                          char in2,
                                          byte[] out)
        Static method to generate the UTF-8 representation of a Unicode character
        Parameters:
        in - the Unicode character, or the high half of a surrogate pair
        in2 - the low half of a surrogate pair (ignored unless the first argument is in the range for a surrogate pair)
        out - an array of at least 4 bytes to hold the UTF-8 representation.
        Returns:
        the number of bytes in the UTF-8 representation
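        To illustrate the contract described above (one char or a surrogate pair in, 1 to 4 bytes out, byte count returned), here is a minimal self-contained sketch. It is not this class's implementation: the class Utf8EncodeSketch and the method utf8Encoding are hypothetical stand-ins that mirror the documented signature, and the handling of unpaired surrogates is an assumption.

```java
// Hypothetical stand-in that mirrors the documented signature.
final class Utf8EncodeSketch {

    static int utf8Encoding(char in, char in2, byte[] out) {
        int cp = in;
        if (Character.isHighSurrogate(in)) {
            // Combine the pair into one supplementary codepoint;
            // in2 is ignored for any other value of in.
            cp = Character.toCodePoint(in, in2);
        }
        // Assumption: an unpaired low surrogate is not rejected here and
        // falls through to the 3-byte branch.
        if (cp < 0x80) {
            out[0] = (byte) cp;                              // 0xxxxxxx
            return 1;
        } else if (cp < 0x800) {
            out[0] = (byte) (0xC0 | (cp >> 6));              // 110xxxxx
            out[1] = (byte) (0x80 | (cp & 0x3F));            // 10xxxxxx
            return 2;
        } else if (cp < 0x10000) {
            out[0] = (byte) (0xE0 | (cp >> 12));             // 1110xxxx
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        } else {
            out[0] = (byte) (0xF0 | (cp >> 18));             // 11110xxx
            out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[3] = (byte) (0x80 | (cp & 0x3F));
            return 4;
        }
    }

    public static void main(String[] args) {
        byte[] out = new byte[4];                            // at least 4 bytes, as documented
        int n = utf8Encoding('\u20AC', '\0', out);           // U+20AC EURO SIGN
        System.out.printf("%d bytes: %02X %02X %02X%n", n, out[0], out[1], out[2]);
    }
}
```

        The 4-byte output buffer matches the documented minimum, since no Unicode codepoint needs more than four bytes in UTF-8.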
      • encode

        public static byte[] encode(IntIterator codePoints)
        Static method to generate the UTF-8 representation of a sequence of Unicode codepoints
        Parameters:
        codePoints - the sequence of Unicode codepoints: must not include surrogates
        Returns:
        the UTF-8 encoding of the characters
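        Encoding a whole sequence works on codepoints rather than chars, which is why surrogates are excluded from the input. The sketch below is an illustration only, not this class's code: it substitutes the JDK's java.util.PrimitiveIterator.OfInt for the IntIterator parameter, the class name Utf8SequenceSketch is hypothetical, and throwing IllegalArgumentException for surrogates or out-of-range values is an assumption (the documentation above only says they must not be supplied).

```java
import java.io.ByteArrayOutputStream;
import java.util.PrimitiveIterator;

// Hypothetical stand-in; PrimitiveIterator.OfInt replaces IntIterator.
final class Utf8SequenceSketch {

    static byte[] encode(PrimitiveIterator.OfInt codePoints) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (codePoints.hasNext()) {
            int cp = codePoints.nextInt();
            // Assumption: invalid input is rejected rather than encoded.
            if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                throw new IllegalArgumentException(
                        "Not an encodable codepoint: 0x" + Integer.toHexString(cp));
            }
            if (cp < 0x80) {
                out.write(cp);
            } else if (cp < 0x800) {
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // String.codePoints() yields whole codepoints, so a surrogate pair
        // in the string arrives here as a single value.
        byte[] bytes = encode("a\u20AC".codePoints().iterator());
        System.out.println(bytes.length + " bytes");
    }
}
```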
      • decodeUTF8

        public static int decodeUTF8(byte[] in,
                                     int used)
                              throws java.lang.IllegalArgumentException
        Decode a single UTF-8 encoded character
        Parameters:
        in - array of bytes representing a single UTF-8 encoded character
        used - number of bytes in the array that are actually used
        Returns:
        the Unicode codepoint of this character
        Throws:
        java.lang.IllegalArgumentException - if the byte sequence is not a valid UTF-8 representation
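        The decoding contract (a byte array holding one character, a count of used bytes, a codepoint returned, IllegalArgumentException on invalid input) can be sketched as follows. This is an illustrative stand-in, not this class's implementation: the class Utf8DecodeSketch and method decode are hypothetical, and the exact set of checks (length mismatch, bad continuation bytes, overlong forms, surrogates, values above U+10FFFF) is an assumption about what "not a valid UTF-8 representation" covers.

```java
// Hypothetical stand-in that mirrors the documented signature.
final class Utf8DecodeSketch {

    // Smallest codepoint that legitimately needs 1, 2, 3 or 4 bytes.
    private static final int[] MIN_FOR_LENGTH = {0x0, 0x80, 0x800, 0x10000};

    static int decode(byte[] in, int used) {
        if (used < 1 || used > 4 || used > in.length) {
            throw new IllegalArgumentException("Invalid byte count: " + used);
        }
        int b0 = in[0] & 0xFF;
        int expected;
        int cp;
        if (b0 < 0x80) {                    // 0xxxxxxx
            expected = 1; cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {   // 110xxxxx
            expected = 2; cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {   // 1110xxxx
            expected = 3; cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {   // 11110xxx
            expected = 4; cp = b0 & 0x07;
        } else {
            throw new IllegalArgumentException("Invalid UTF-8 lead byte");
        }
        if (expected != used) {
            throw new IllegalArgumentException("Lead byte implies " + expected
                    + " bytes, but " + used + " were supplied");
        }
        for (int i = 1; i < used; i++) {
            int b = in[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {       // continuation bytes are 10xxxxxx
                throw new IllegalArgumentException("Invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Assumption: overlong forms, surrogates and out-of-range values are invalid.
        if (cp < MIN_FOR_LENGTH[used - 1] || cp > 0x10FFFF
                || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not a valid UTF-8 representation");
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC, 0};
        // Only the first 3 bytes of the array are in use.
        System.out.println(Integer.toHexString(decode(euro, 3)));
    }
}
```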