Class UnicodeBuilder

java.lang.Object
java.io.Writer
net.sf.saxon.str.UnicodeBuilder
All Implemented Interfaces:
Closeable, Flushable, Appendable, AutoCloseable, UnicodeWriter, UniStringConsumer

public final class UnicodeBuilder extends Writer implements UniStringConsumer, UnicodeWriter
Builder class to construct a UnicodeString by appending text incrementally
  • Constructor Details

    • UnicodeBuilder

      public UnicodeBuilder()
      Create a Unicode builder with an initial allocation of 16 codepoints
    • UnicodeBuilder

      public UnicodeBuilder(int allocate)
      Create a Unicode builder with an initial space allocation
      Parameters:
      allocate - the initial space allocation, in codepoints (32-bit integers)
  • Method Details

    • append

      public UnicodeBuilder append(char ch)
      Append a character, which must not be a surrogate. (This method is needed for C#, which does not support implicit conversion of char to int.)
      Specified by:
      append in interface Appendable
      Overrides:
      append in class Writer
      Parameters:
      ch - the character
      Returns:
      this builder, with the new character added
    • append

      public UnicodeBuilder append(int codePoint)
      Append a single Unicode character to the content
      Parameters:
      codePoint - the Unicode codepoint. The caller is responsible for ensuring that this is not a surrogate. (In practice some callers, such as the JSON parser, do append unpaired surrogates to the builder and sort it out later.)
      Returns:
      this builder, with the new character added
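      Because the check is left to the caller, a guard along the following lines can be applied before calling append(int). This is a minimal sketch using only java.lang.Character; the class SurrogateGuard and its method names are hypothetical and are not part of Saxon.

```java
// Hypothetical caller-side helper; not part of the Saxon API.
final class SurrogateGuard {

    /** Returns true if the code point lies in the surrogate range U+D800..U+DFFF. */
    static boolean isSurrogate(int codePoint) {
        return codePoint >= Character.MIN_SURROGATE && codePoint <= Character.MAX_SURROGATE;
    }

    /**
     * Returns the code point unchanged if it is a Unicode scalar value,
     * otherwise throws IllegalArgumentException.
     */
    static int requireScalarValue(int codePoint) {
        if (!Character.isValidCodePoint(codePoint) || isSurrogate(codePoint)) {
            throw new IllegalArgumentException(
                    "Not a Unicode scalar value: 0x" + Integer.toHexString(codePoint));
        }
        return codePoint;
    }
}
```

      A caller could then write builder.append(SurrogateGuard.requireScalarValue(cp)), where builder is assumed to be a UnicodeBuilder.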
    • append

      public UnicodeBuilder append(IntIterator codePoints)
      Append multiple Unicode characters to the content
      Parameters:
      codePoints - an iterator delivering the codepoints to be added
      Returns:
      this builder, with the new characters added
    • appendLatin

      public UnicodeBuilder appendLatin(String str)
      Append a Java string to the content. The caller is responsible for ensuring that this consists entirely of characters in the Latin-1 character set
      Parameters:
      str - the string to be appended
      Returns:
      this builder, with the new string added
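      Since appendLatin relies on the caller's guarantee, the precondition can be tested before the call. The following is a sketch in plain Java; Latin1Check is a hypothetical helper, and "Latin-1" is taken here to mean the range U+0000 to U+00FF.

```java
// Hypothetical caller-side helper; not part of the Saxon API.
final class Latin1Check {

    /** Returns true if every char of the string is in the range U+0000..U+00FF. */
    static boolean isLatin1(String str) {
        for (int i = 0; i < str.length(); i++) {
            if (str.charAt(i) > 0xFF) {
                return false; // outside Latin-1, including any surrogate
            }
        }
        return true;
    }
}
```

      When the check fails, the general append(CharSequence) method is the safer choice.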
    • append

      public UnicodeBuilder append(CharSequence str)
      Append a Java CharSequence to the content. This may contain arbitrary characters, including well-formed surrogate pairs.
      Specified by:
      append in interface Appendable
      Overrides:
      append in class Writer
      Parameters:
      str - the string to be appended
      Returns:
      this builder, with the new string added
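      To illustrate what a well-formed surrogate pair means for a codepoint-based builder, the sketch below decodes UTF-16 text into codepoints using the standard CharSequence.codePoints() method. It shows the general UTF-16 decoding rule, not Saxon's internal implementation; CodePointDecoding is a hypothetical name.

```java
// Illustration of UTF-16 decoding with the Java standard library only.
final class CodePointDecoding {

    /**
     * Decodes UTF-16 text into codepoints. Each well-formed surrogate pair
     * becomes a single value above U+FFFF; all other chars map one-to-one.
     */
    static int[] toCodePoints(CharSequence text) {
        return text.codePoints().toArray();
    }
}
```

      For example, the three-char sequence "a\uD83D\uDE00" decodes to the two codepoints 0x61 and 0x1F600.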
    • append

      public UnicodeBuilder append(UnicodeString str)
      Append a UnicodeString object to the content.
      Parameters:
      str - the string to be appended. The length is currently restricted to 2^31 codepoints.
      Returns:
      this builder, with the new string added
    • length

      public long length()
      Get the number of codepoints currently in the builder
      Returns:
      the size in codepoints
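      A length in codepoints can differ from the length of the equivalent Java String, which counts UTF-16 chars. The sketch below shows the difference using String.codePointCount; CodePointLength is a hypothetical helper, and note that it returns int whereas the builder's length() returns long.

```java
// Illustration with the Java standard library only.
final class CodePointLength {

    /** Counts codepoints, treating each well-formed surrogate pair as one. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }
}
```

      For "a\uD83D\uDE00", String.length() is 3 while the codepoint count is 2.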
    • isEmpty

      public boolean isEmpty()
      Ask whether the content of the builder is empty
      Returns:
      true if the size is zero
    • toUnicodeString

      public UnicodeString toUnicodeString()
      Construct a UnicodeString whose value is formed from the contents of this builder
      Returns:
      the constructed UnicodeString
    • toStringItem

      public StringValue toStringItem(AtomicType type)
      Construct a StringValue whose value is formed from the contents of this builder
      Parameters:
      type - the required type, for example BuiltInAtomicType.STRING or BuiltInAtomicType.UNTYPED_ATOMIC. The caller warrants that the value is a valid instance of this type. No validation or whitespace normalization is carried out
      Returns:
      the constructed StringValue
    • toString

      public String toString()
      Return a string containing the character content of this builder
      Overrides:
      toString in class Object
      Returns:
      the character content of this builder as a Java String
    • clear

      public void clear()
      Reset the contents of this builder to be empty
    • expand1to2

      public static byte[] expand1to2(byte[] in, int start, int used, int allocate)
      Expand a byte array from 1-byte-per-character to 2-bytes-per-character
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      used - the end offset in bytes
      allocate - the number of code points to allow for in the output byte array
      Returns:
      the new byte array
    • expandBytesToChars

      public static char[] expandBytesToChars(byte[] in, int start, int end)
    • expand1to3

      public static byte[] expand1to3(byte[] in, int start, int used, int allocate)
      Expand a byte array from 1-byte-per-character to 3-bytes-per-character
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      used - the end offset in bytes
      allocate - the number of code points to allow for in the output byte array
      Returns:
      the new byte array
    • expand2to3

      public static byte[] expand2to3(byte[] in, int start, int used, int allocate)
      Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      used - the end offset in bytes
      allocate - the number of code points to allow for in the output byte array
      Returns:
      the new byte array
    • expand

      public static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
      Expand the width of the characters in a byte array
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      end - the end offset in bytes
      oldWidth - the width of the characters (number of bytes per character) in the input array
      newWidth - the width of the characters (number of bytes per character) in the output array. If newWidth is less than or equal to oldWidth then the input array is copied; the width is never reduced
      allocate - the number of code points to allow for in the output byte array; if zero (or insufficient) the output array will have no spare space for expansion
      Returns:
      the new byte array
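      The parameter descriptions above can be read as the following width-expansion sketch. It is not Saxon's code and rests on several assumptions: each character is stored most-significant byte first, so widening pads with leading zero bytes; the output is sized for max(character count, allocate) characters; and the non-widening case copies just the start..end range. The names WidthExpansionSketch and widen are hypothetical.

```java
import java.util.Arrays;

// Sketch under stated assumptions; byte order and sizing are not confirmed by the documentation.
final class WidthExpansionSketch {

    static byte[] widen(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate) {
        if (newWidth <= oldWidth) {
            // The width is never reduced: return a copy of the input range.
            return Arrays.copyOfRange(in, start, end);
        }
        int count = (end - start) / oldWidth;          // characters in the input range
        int capacity = Math.max(count, allocate);      // no spare space if allocate is too small
        byte[] out = new byte[capacity * newWidth];
        int pad = newWidth - oldWidth;                 // leading zero bytes per character
        for (int c = 0; c < count; c++) {
            System.arraycopy(in, start + c * oldWidth, out, c * newWidth + pad, oldWidth);
        }
        return out;
    }
}
```

      Under these assumptions, widening the two bytes 0x41 0x42 from width 1 to width 2 with allocate 0 gives the four bytes 0x00 0x41 0x00 0x42.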
    • accept

      public UnicodeBuilder accept(UnicodeString chars)
      Process a supplied string
      Specified by:
      accept in interface UniStringConsumer
      Parameters:
      chars - the characters to be processed
      Returns:
      this UniStringConsumer (to allow method chaining)
    • write

      public void write(UnicodeString chars)
      Description copied from interface: UnicodeWriter
      Process a supplied string
      Specified by:
      write in interface UnicodeWriter
      Parameters:
      chars - the characters to be processed
    • writeAscii

      public void writeAscii(byte[] content) throws IOException
      Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
      Specified by:
      writeAscii in interface UnicodeWriter
      Parameters:
      content - byte array holding ASCII characters only
      Throws:
      IOException - if processing fails for any reason
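      One way to produce a byte array that satisfies the ASCII-only requirement is sketched below. The conversion is done by hand because String.getBytes(StandardCharsets.US_ASCII) silently replaces unmappable characters instead of rejecting them. AsciiBytes and its methods are hypothetical helpers, not part of Saxon.

```java
// Hypothetical caller-side helper; not part of the Saxon API.
final class AsciiBytes {

    /** Returns true if every byte is in the range 0..127. */
    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) {
                return false; // values 0x80..0xFF are negative as Java bytes
            }
        }
        return true;
    }

    /** Converts a string to ASCII bytes, rejecting any non-ASCII character. */
    static byte[] toAsciiBytes(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch > 0x7F) {
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            }
            out[i] = (byte) ch;
        }
        return out;
    }
}
```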
    • writeCodePoint

      public void writeCodePoint(int codepoint) throws IOException
      Process a single character.
      Specified by:
      writeCodePoint in interface UnicodeWriter
      Parameters:
      codepoint - the Unicode character to be processed. Must not be a surrogate
      Throws:
      IOException - if processing fails for any reason
    • writeAscii

      public void writeAscii(int codepoint)
      Process a single ASCII character.
      Specified by:
      writeAscii in interface UnicodeWriter
      Parameters:
      codepoint - the Unicode character to be processed. Must be in the range 0-127; this is not necessarily checked
      Throws:
      IOException - if processing fails for any reason
    • write

      public void write(String chars) throws IOException
      Process a supplied string
      Specified by:
      write in interface UnicodeWriter
      Overrides:
      write in class Writer
      Parameters:
      chars - the characters to be processed
      Throws:
      IOException - if processing fails for any reason
    • write

      public void write(char[] cbuf, int off, int len) throws IOException
      Specified by:
      write in class Writer
      Throws:
      IOException
    • flush

      public void flush() throws IOException
      Description copied from interface: UnicodeWriter
      Flush the contents of any buffers. The default implementation does nothing.
      Specified by:
      flush in interface Flushable
      Specified by:
      flush in interface UnicodeWriter
      Specified by:
      flush in class Writer
      Throws:
      IOException - if processing fails for any reason
    • close

      public void close()
      Complete the writing of characters to the result. The default implementation does nothing.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Specified by:
      close in interface UnicodeWriter
      Specified by:
      close in interface UniStringConsumer
      Specified by:
      close in class Writer
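
Because the class extends java.io.Writer, code written against Writer should be able to target a UnicodeBuilder without knowing its concrete type. The sketch below shows such code; StringWriter stands in as the sink so that the example compiles without Saxon on the classpath, and WriterTargetSketch is a hypothetical name. Passing a UnicodeBuilder instead and then calling toUnicodeString() is the assumed equivalent, not something verified here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Sketch: StringWriter is a stand-in for any Writer, such as a UnicodeBuilder.
final class WriterTargetSketch {

    /** Writes a greeting to any Writer using only the java.io.Writer contract. */
    static void writeGreeting(Writer out, String name) throws IOException {
        out.write("Hello, ");
        out.append(name).append('!');
        out.flush();
    }

    /** Runs writeGreeting against an in-memory sink and returns the result. */
    static String demo() {
        StringWriter sink = new StringWriter();
        try {
            writeGreeting(sink, "world");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```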