net.sf.saxon.charcode
Class UTF8CharacterSet

java.lang.Object
  extended by net.sf.saxon.charcode.UTF8CharacterSet
All Implemented Interfaces:
CharacterSet

public final class UTF8CharacterSet
extends Object
implements CharacterSet

This class defines properties of the UTF-8 character set.


Method Summary
static int decodeUTF8(byte[] in, int used)
          Decode a UTF-8 character
 String getCanonicalName()
          Get the preferred Java name of the character set.
static UTF8CharacterSet getInstance()
          Get the singleton instance of this class
static int getUTF8Encoding(char in, char in2, byte[] out)
          Static method to generate the UTF-8 representation of a Unicode character
 boolean inCharset(int c)
          Determine if a character is present in the character set
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getInstance

public static UTF8CharacterSet getInstance()
Get the singleton instance of this class

Returns:
the singleton instance of this class

inCharset

public boolean inCharset(int c)
Description copied from interface: CharacterSet
Determine if a character is present in the character set

Specified by:
inCharset in interface CharacterSet

getCanonicalName

public String getCanonicalName()
Description copied from interface: CharacterSet
Get the preferred Java name of the character set. Note that Java in many cases also supports a "historic name".

Specified by:
getCanonicalName in interface CharacterSet

getUTF8Encoding

public static int getUTF8Encoding(char in,
                                  char in2,
                                  byte[] out)
Static method to generate the UTF-8 representation of a Unicode character

Parameters:
in - the Unicode character, or the high half of a surrogate pair
in2 - the low half of a surrogate pair (ignored unless the first argument is in the range for a surrogate pair)
out - an array of at least 4 bytes to hold the UTF-8 representation.
Returns:
the number of bytes in the UTF-8 representation
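
The bit layout behind such an encoder can be sketched in plain Java. The class Utf8EncodeSketch and its encode method below are hypothetical stand-ins that mirror the documented signature. They are not Saxon's implementation, and rejecting an unpaired surrogate is an assumption of this sketch rather than documented behaviour.

```java
final class Utf8EncodeSketch {

    // Hypothetical stand-in mirroring the documented signature; NOT Saxon's code.
    static int encode(char in, char in2, byte[] out) {
        int cp = in;
        if (Character.isSurrogate(in)) {
            // Assumption: a malformed pair is rejected. The documented method
            // may handle this case differently.
            if (!Character.isSurrogatePair(in, in2)) {
                throw new IllegalArgumentException("Unpaired surrogate");
            }
            cp = Character.toCodePoint(in, in2);
        }
        if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
            out[0] = (byte) cp;
            return 1;
        }
        if (cp < 0x800) {                      // 2 bytes: 110xxxxx 10xxxxxx
            out[0] = (byte) (0xC0 | (cp >> 6));
            out[1] = (byte) (0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {                    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out[0] = (byte) (0xE0 | (cp >> 12));
            out[1] = (byte) (0x80 | ((cp >> 6) & 0x3F));
            out[2] = (byte) (0x80 | (cp & 0x3F));
            return 3;
        }
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out[0] = (byte) (0xF0 | (cp >> 18));
        out[1] = (byte) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (byte) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (byte) (0x80 | (cp & 0x3F));
        return 4;
    }

    public static void main(String[] args) {
        byte[] out = new byte[4];
        // U+1F600 is represented in Java as the surrogate pair D83D DE00.
        int n = encode('\uD83D', '\uDE00', out);
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < n; i++) {
            hex.append(String.format("%02X ", out[i] & 0xFF));
        }
        System.out.println(n + " bytes: " + hex.toString().trim());
    }
}
```

For a character outside the surrogate range, the second argument is ignored, so any value (for example '\0') can be passed. The 4-byte output array matches the minimum size stated for the out parameter.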

decodeUTF8

public static int decodeUTF8(byte[] in,
                             int used)
                      throws IllegalArgumentException
Decode a UTF-8 character

Parameters:
in - array of bytes representing a single UTF-8 encoded character
used - number of bytes in the array that are actually used
Returns:
the Unicode code point of this character
Throws:
IllegalArgumentException - if the byte sequence is not a valid UTF-8 representation
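
The decoding contract described above can be illustrated with a self-contained sketch. Utf8DecodeSketch and its decode method are hypothetical names mirroring the documented signature, not Saxon's implementation. The strict checks for overlong forms, surrogates and values above U+10FFFF are assumptions of this sketch, and the documented method may be more lenient.

```java
final class Utf8DecodeSketch {

    // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes (index = length).
    private static final int[] MIN_FOR_LENGTH = {0, 0x0, 0x80, 0x800, 0x10000};

    // Hypothetical stand-in mirroring the documented signature; NOT Saxon's code.
    static int decode(byte[] in, int used) {
        if (in == null || used < 1 || used > 4 || used > in.length) {
            throw new IllegalArgumentException("Expected 1 to 4 bytes");
        }
        int b0 = in[0] & 0xFF;
        int expected;
        int cp;
        if (b0 < 0x80) {                   // 0xxxxxxx
            expected = 1;
            cp = b0;
        } else if ((b0 & 0xE0) == 0xC0) {  // 110xxxxx
            expected = 2;
            cp = b0 & 0x1F;
        } else if ((b0 & 0xF0) == 0xE0) {  // 1110xxxx
            expected = 3;
            cp = b0 & 0x0F;
        } else if ((b0 & 0xF8) == 0xF0) {  // 11110xxx
            expected = 4;
            cp = b0 & 0x07;
        } else {
            throw new IllegalArgumentException("Invalid lead byte");
        }
        if (expected != used) {
            throw new IllegalArgumentException("Length does not match lead byte");
        }
        for (int i = 1; i < used; i++) {
            int b = in[i] & 0xFF;
            if ((b & 0xC0) != 0x80) {      // continuation bytes are 10xxxxxx
                throw new IllegalArgumentException("Invalid continuation byte");
            }
            cp = (cp << 6) | (b & 0x3F);
        }
        // Assumption: reject overlong forms, surrogates and out-of-range values.
        if (cp < MIN_FOR_LENGTH[used] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("Not a valid UTF-8 encoded code point");
        }
        return cp;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        System.out.println("U+" + Integer.toHexString(decode(euro, euro.length)).toUpperCase());
    }
}
```

Passing the count separately, as the used parameter does, lets a caller reuse one fixed 4-byte buffer for characters of any encoded length.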


Copyright (c) Saxonica Limited. All rights reserved.