Class UTF8CharacterSet

java.lang.Object
net.sf.saxon.serialize.charcode.UTF8CharacterSet
All Implemented Interfaces:
CharacterSet

public final class UTF8CharacterSet extends Object implements CharacterSet
This class defines properties of the UTF-8 character set.
  • Method Details

    • getInstance

      public static UTF8CharacterSet getInstance()
      Get the singleton instance of this class
      Returns:
      the singleton instance of this class
    • inCharset

      public boolean inCharset(int c)
      Description copied from interface: CharacterSet
      Determine if a character is present in the character set
      Specified by:
      inCharset in interface CharacterSet
      Parameters:
      c - the codepoint being tested
      Returns:
      true if the codepoint is supported
    • getCanonicalName

      public String getCanonicalName()
      Description copied from interface: CharacterSet
      Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".
      Specified by:
      getCanonicalName in interface CharacterSet
      Returns:
      the preferred Java name
    • getUTF8Encoding

      public static int getUTF8Encoding(char in, char in2, byte[] out)
      Static method to generate the UTF-8 representation of a Unicode character
      Parameters:
      in - the Unicode character, or the high half of a surrogate pair
      in2 - the low half of a surrogate pair (ignored unless the first argument is in the range for a surrogate pair)
      out - an array of at least 4 bytes to hold the UTF-8 representation.
      Returns:
      the number of bytes in the UTF-8 representation
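
The bit layout behind this kind of method can be illustrated with a minimal sketch. It is not Saxon's implementation: the class `Utf8EncodeSketch` and the method `encodeChar` are hypothetical names that only mirror the documented parameter order, and the sketch assumes that "in the range for a surrogate pair" means `in` is a high surrogate.

```java
import java.util.Arrays;

final class Utf8EncodeSketch {

    // Hypothetical stand-in for the documented method; not Saxon's code.
    // Writes the UTF-8 bytes for one character into 'out' and returns the count.
    static int encodeChar(char in, char in2, byte[] out) {
        int cp = in;
        // Assumption: 'in2' is consulted only when 'in' is a high surrogate.
        if (Character.isHighSurrogate(in)) {
            cp = Character.toCodePoint(in, in2);
        }
        if (cp < 0x80) {                       // 0xxxxxxx
            out[0] = (byte) cp;
            return 1;
        } else if (cp < 0x800) {               // 110xxxxx 10xxxxxx
            out[0] = (byte) (0xC0 | (cp >> 6));
            out[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        } else if (cp < 0x10000) {             // 1110xxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xE0 | (cp >> 12));
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        } else {                               // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xF0 | (cp >> 18));
            out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
            out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[3] = (byte) (0x80 | (cp & 0x3F));
            return 4;
        }
    }

    public static void main(String[] args) {
        byte[] out = new byte[4];
        int n = encodeChar('\u20AC', '\0', out); // euro sign, a three-byte character
        System.out.println(n + " bytes: " + Arrays.toString(Arrays.copyOf(out, n)));
    }
}
```

Passing a caller-supplied buffer of at least 4 bytes, as the documented signature does, lets a serializer reuse one array across many characters instead of allocating per call.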
    • encode

      public static byte[] encode(IntIterator codePoints)
      Static method to generate the UTF-8 representation of a sequence of Unicode codepoints
      Parameters:
      codePoints - the sequence of Unicode codepoints: must not include surrogates
      Returns:
      the UTF-8 encoding of the characters
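
Encoding a whole sequence of codepoints might look roughly like the sketch below. Saxon's `IntIterator` is replaced here by the JDK's `PrimitiveIterator.OfInt` so that the example is self-contained. The names `Utf8SequenceSketch` and `encodeAll` are hypothetical, and throwing `IllegalArgumentException` on a surrogate is an assumption of this sketch: the documentation above says only that surrogates must not be supplied.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.PrimitiveIterator;

final class Utf8SequenceSketch {

    // Hypothetical stand-in for the documented method; not Saxon's code.
    static byte[] encodeAll(PrimitiveIterator.OfInt codePoints) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        while (codePoints.hasNext()) {
            int cp = codePoints.nextInt();
            // Assumption: reject surrogates and out-of-range values outright.
            if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
                throw new IllegalArgumentException("Not a Unicode scalar value: " + cp);
            }
            // write(int) stores only the low 8 bits of its argument.
            if (cp < 0x80) {
                buf.write(cp);
            } else if (cp < 0x800) {
                buf.write(0xC0 | (cp >> 6));
                buf.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                buf.write(0xE0 | (cp >> 12));
                buf.write(0x80 | ((cp >> 6) & 0x3F));
                buf.write(0x80 | (cp & 0x3F));
            } else {
                buf.write(0xF0 | (cp >> 18));
                buf.write(0x80 | ((cp >> 12) & 0x3F));
                buf.write(0x80 | ((cp >> 6) & 0x3F));
                buf.write(0x80 | (cp & 0x3F));
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // String.codePoints() merges surrogate pairs, so the iterator yields whole codepoints.
        byte[] bytes = encodeAll("h\u00E9llo".codePoints().iterator());
        System.out.println(Arrays.toString(bytes));
    }
}
```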
    • decodeUTF8

      public static int decodeUTF8(byte[] in, int used) throws IllegalArgumentException
      Decode a single UTF-8 encoded character
      Parameters:
      in - array of bytes representing a single UTF-8 encoded character
      used - number of bytes in the array that are actually used
      Returns:
      the Unicode codepoint of this character
      Throws:
      IllegalArgumentException - if the byte sequence is not a valid UTF-8 representation
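
For illustration, decoding a single character with the same parameter shape could be sketched as follows. This is not Saxon's implementation: `Utf8DecodeSketch` and `decodeChar` are hypothetical names, and the exact validity checks (length mismatch, bad continuation bytes, overlong forms, surrogates, values above U+10FFFF) are assumptions about what "not a valid UTF-8 representation" might cover.

```java
final class Utf8DecodeSketch {

    // Hypothetical stand-in for the documented method; not Saxon's code.
    static int decodeChar(byte[] in, int used) {
        if (used < 1 || used > 4 || used > in.length) {
            throw new IllegalArgumentException("Bad length: " + used);
        }
        int b0 = in[0] & 0xFF;
        int expected;   // sequence length announced by the lead byte
        int cp;         // payload bits accumulated so far
        if (b0 < 0x80) {
            expected = 1; cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {
            expected = 2; cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {
            expected = 3; cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {
            expected = 4; cp = b0 & 0x07;
        } else {
            throw new IllegalArgumentException("Invalid lead byte");
        }
        if (expected != used) {
            throw new IllegalArgumentException("Lead byte implies " + expected + " bytes, got " + used);
        }
        for (int i = 1; i < used; i++) {
            int b = in[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {
                throw new IllegalArgumentException("Invalid continuation byte at index " + i);
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Smallest codepoint that legitimately needs this many bytes.
        int min = expected == 1 ? 0 : expected == 2 ? 0x80 : expected == 3 ? 0x800 : 0x10000;
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Overlong, surrogate, or out-of-range value");
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println(Integer.toHexString(decodeChar(euro, euro.length)));
    }
}
```

The separate `used` argument lets a caller keep one reusable 4-byte buffer and state how many of its bytes belong to the current character.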