Class UnicodeBuilder

java.lang.Object
net.sf.saxon.str.UnicodeBuilder
All Implemented Interfaces:
UnicodeWriter, UniStringConsumer

public final class UnicodeBuilder extends Object implements UniStringConsumer, UnicodeWriter
Builder class to construct a UnicodeString by appending text incrementally
  • Constructor Details

    • UnicodeBuilder

      public UnicodeBuilder()
      Create a Unicode builder with an initial allocation of 256 codepoints
    • UnicodeBuilder

      public UnicodeBuilder(int allocate)
      Create a Unicode builder with an initial space allocation
      Parameters:
      allocate - the initial space allocation, in codepoints (32-bit integers)
  • Method Details

    • append

      public UnicodeBuilder append(char ch)
      Append a character, which must not be a surrogate. (Method needed for C#, because implicit conversion of char to int isn't supported)
      Parameters:
      ch - the character
      Returns:
      this builder, with the new character added
    • append

      public UnicodeBuilder append(int codePoint)
      Append a single Unicode character to the content
      Parameters:
      codePoint - the Unicode codepoint. The caller is responsible for ensuring that this is not a surrogate
      Returns:
      this builder, with the new character added
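      Because the caller is responsible for excluding surrogates, a pre-check before calling append(int) can be useful. The following is a minimal sketch using only java.lang.Character; the class and method names (CodePointCheck, isAppendable) are hypothetical and are not part of Saxon.

```java
// Hypothetical helper, not part of Saxon: checks whether a value is a
// Unicode scalar value, i.e. in range 0..0x10FFFF and not a surrogate.
final class CodePointCheck {

    static boolean isAppendable(int cp) {
        // Reject values outside the Unicode code space
        if (!Character.isValidCodePoint(cp)) {
            return false;
        }
        // Reject the surrogate block U+D800..U+DFFF
        return cp < Character.MIN_SURROGATE || cp > Character.MAX_SURROGATE;
    }
}
```

      A caller might guard each append(int) call with such a check when the code points come from an untrusted source.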
    • append

      public UnicodeBuilder append(IntIterator codePoints)
      Append multiple Unicode characters to the content
      Parameters:
      codePoints - an iterator delivering the codepoints to be added.
      Returns:
      this builder, with the new characters added
    • appendLatin

      public UnicodeBuilder appendLatin(String str)
      Append a Java string to the content. The caller is responsible for ensuring that this consists entirely of characters in the Latin-1 character set
      Parameters:
      str - the string to be appended
      Returns:
      this builder, with the new string added
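      Since appendLatin leaves the Latin-1 guarantee to the caller, a caller that is unsure of its input could test it first. This is a sketch under the assumption that "Latin-1" means every char is in the range U+0000 to U+00FF; the names Latin1Check and isLatin1 are hypothetical.

```java
// Hypothetical helper, not part of Saxon: tests whether every char of a
// Java string lies in the Latin-1 range U+0000..U+00FF.
final class Latin1Check {

    static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false; // outside Latin-1 (includes all surrogates)
            }
        }
        return true;
    }
}
```

      When the check fails, the general-purpose append(CharSequence) is the documented alternative for arbitrary characters.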
    • appendAll

      public UnicodeBuilder appendAll(SequenceIterator iter)
      Append the string values of all the items in a sequence, with no separator
      Parameters:
      iter - the sequence of items
      Returns:
      this builder, with the new items added
    • append

      public UnicodeBuilder append(CharSequence str)
      Append a Java CharSequence to the content. This may contain arbitrary characters including well-formed surrogate pairs
      Parameters:
      str - the string to be appended
      Returns:
      this builder, with the new string added
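      To illustrate what a well-formed surrogate pair means in terms of codepoints, the sketch below uses the standard CharSequence.codePoints() method to decode a CharSequence. It only illustrates the char-to-codepoint relationship and makes no claim about how the builder decodes its input internally; the class name SurrogateDemo is hypothetical.

```java
// Hypothetical illustration, not part of Saxon: decodes a CharSequence
// into Unicode code points, combining each well-formed surrogate pair
// into a single code point.
final class SurrogateDemo {

    static int[] toCodePoints(CharSequence s) {
        // codePoints() is a default method of java.lang.CharSequence (Java 8+)
        return s.codePoints().toArray();
    }
}
```

      For example, the three-char string "a\uD834\uDD1E" decodes to the two codepoints 0x61 and 0x1D11E, so a length measured in codepoints can be smaller than String.length().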
    • append

      public UnicodeBuilder append(UnicodeString str)
      Append a UnicodeString object to the content.
      Parameters:
      str - the string to be appended. The length is currently restricted to 2^31.
      Returns:
      this builder, with the new string added
    • length

      public long length()
      Get the number of codepoints currently in the builder
      Returns:
      the size in codepoints
    • isEmpty

      public boolean isEmpty()
      Ask whether the content of the builder is empty
      Returns:
      true if the size is zero
    • toUnicodeString

      public UnicodeString toUnicodeString()
      Construct a UnicodeString whose value is formed from the contents of this builder
      Returns:
      the constructed UnicodeString
    • toStringItem

      public StringValue toStringItem(AtomicType type)
      Construct a StringValue whose value is formed from the contents of this builder
      Parameters:
      type - the required type, for example BuiltInAtomicType.STRING or BuiltInAtomicType.UNTYPED_ATOMIC. The caller warrants that the value is a valid instance of this type. No validation or whitespace normalization is carried out
      Returns:
      the constructed StringValue
    • toString

      public String toString()
      Return a string containing the character content of this builder
      Overrides:
      toString in class Object
      Returns:
      the character content of this builder as a Java String
    • clear

      public void clear()
      Reset the contents of this builder to be empty
    • expand1to2

      public static byte[] expand1to2(byte[] in, int start, int used, int allocate)
      Expand a byte array from 1-byte-per-character to 2-bytes-per-character
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      used - the end offset in bytes
      allocate - the number of code points to allow for in the output byte array
      Returns:
      the new byte array
    • expandBytesToChars

      public static char[] expandBytesToChars(byte[] in, int start, int end)
    • expand1to3

      public static byte[] expand1to3(byte[] in, int start, int used, int allocate)
      Expand a byte array from 1-byte-per-character to 3-bytes-per-character
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      used - the end offset in bytes
      allocate - the number of code points to allow for in the output byte array
      Returns:
      the new byte array
    • expand2to3

      public static byte[] expand2to3(byte[] in, int start, int used, int allocate)
      Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      used - the end offset in bytes
      allocate - the number of code points to allow for in the output byte array
      Returns:
      the new byte array
    • expand

      public static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
      Expand the width of the characters in a byte array
      Parameters:
      in - the input byte array
      start - the start offset in bytes
      end - the end offset in bytes
      oldWidth - the width of the characters (number of bytes per character) in the input array
      newWidth - the width of the characters (number of bytes per character) in the output array. If newWidth is less than or equal to oldWidth then the input array is copied; the width is never reduced
      allocate - the number of code points to allow for in the output byte array; if zero (or insufficient) the output array will have no spare space for expansion
      Returns:
      the new byte array
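      The parameter descriptions above can be read as the following sketch. It is not Saxon's implementation. It assumes that each character occupies a fixed-width cell with the most significant byte first, that the output starts at offset zero, and that "copied" means copying the range from start to end; none of these details is stated in this documentation. The names WidthExpansionSketch and expandSketch are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not Saxon's code. Assumes big-endian fixed-width
// cells and an output array that starts at offset zero.
final class WidthExpansionSketch {

    static byte[] expandSketch(byte[] in, int start, int end,
                               int oldWidth, int newWidth, int allocate) {
        if (newWidth <= oldWidth) {
            // The width is never reduced: just copy the used range (assumption)
            return Arrays.copyOfRange(in, start, end);
        }
        int count = (end - start) / oldWidth;      // characters in the input range
        int capacity = Math.max(count, allocate);  // zero or small allocate: no spare space
        byte[] out = new byte[capacity * newWidth];
        int pad = newWidth - oldWidth;             // leading zero bytes per character
        for (int i = 0; i < count; i++) {
            // Copy the old cell into the low-order bytes of the wider cell
            System.arraycopy(in, start + i * oldWidth, out, i * newWidth + pad, oldWidth);
        }
        return out;
    }
}
```

      Under these assumptions, expanding the two bytes 0x41, 0x42 from width 1 to width 2 with allocate set to zero gives the four bytes 0x00, 0x41, 0x00, 0x42.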
    • accept

      public UnicodeBuilder accept(UnicodeString chars)
      Process a supplied string
      Specified by:
      accept in interface UniStringConsumer
      Parameters:
      chars - the characters to be processed
      Returns:
      this builder (to allow method chaining)
    • write

      public void write(UnicodeString chars)
      Description copied from interface: UnicodeWriter
      Process a supplied string
      Specified by:
      write in interface UnicodeWriter
      Parameters:
      chars - the characters to be processed
    • writeAscii

      public void writeAscii(byte[] content) throws IOException
      Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
      Specified by:
      writeAscii in interface UnicodeWriter
      Parameters:
      content - byte array holding ASCII characters only
      Throws:
      IOException - if processing fails for any reason
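      The "ASCII characters only" precondition can be verified by the caller before the byte array is passed in. The sketch below uses only the JDK; the names AsciiSketch, isAscii and decode are hypothetical, and nothing here describes how writeAscii handles invalid input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Saxon: checks and decodes a byte
// array that is expected to hold 7-bit ASCII characters only.
final class AsciiSketch {

    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) {
                return false; // high bit set, so the value is above 0x7F
            }
        }
        return true;
    }

    static String decode(byte[] content) {
        // For pure ASCII input each byte maps to exactly one char
        return new String(content, StandardCharsets.US_ASCII);
    }
}
```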
    • write

      public void write(String chars) throws IOException
      Process a supplied string
      Specified by:
      write in interface UnicodeWriter
      Parameters:
      chars - the characters to be processed
      Throws:
      IOException - if processing fails for any reason
    • trimToSize

      public void trimToSize()
    • close

      public void close()
      Complete the writing of characters to the result. The default implementation does nothing.
      Specified by:
      close in interface UnicodeWriter
      Specified by:
      close in interface UniStringConsumer