Class UnicodeBuilder

    • Constructor Summary

      Constructors 
      Constructor Description
      UnicodeBuilder()
      Create a Unicode builder with an initial allocation of 256 codepoints
      UnicodeBuilder​(int allocate)
      Create a Unicode builder with an initial space allocation
    • Method Summary

      Modifier and Type Method Description
      UnicodeBuilder accept​(UnicodeString chars)
      Process a supplied string
      UnicodeBuilder append​(char ch)
      Append a character, which must not be a surrogate.
      UnicodeBuilder append​(int codePoint)
      Append a single Unicode character to the content
      UnicodeBuilder append​(java.lang.CharSequence str)
      Append a Java CharSequence to the content.
      UnicodeBuilder append​(UnicodeString str)
      Append a UnicodeString object to the content.
      UnicodeBuilder append​(IntIterator codePoints)
      Append multiple Unicode characters to the content
      UnicodeBuilder appendAll​(SequenceIterator iter)
      Append the string values of all the items in a sequence, with no separator
      UnicodeBuilder appendLatin​(java.lang.String str)
      Append a Java string to the content.
      void clear()
      Reset the contents of this builder to be empty
      void close()
      Complete the writing of characters to the result.
      static byte[] expand​(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
      Expand the width of the characters in a byte array
      static byte[] expand1to2​(byte[] in, int start, int used, int allocate)
      Expand a byte array from 1-byte-per-character to 2-bytes-per-character
      static byte[] expand1to3​(byte[] in, int start, int used, int allocate)
      Expand a byte array from 1-byte-per-character to 3-bytes-per-character
      static byte[] expand2to3​(byte[] in, int start, int used, int allocate)
      Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
      static char[] expandBytesToChars​(byte[] in, int start, int end)  
      boolean isEmpty()
      Ask whether the content of the builder is empty
      long length()
      Get the number of codepoints currently in the builder
      java.lang.String toString()
      Return a string containing the character content of this builder
      StringValue toStringItem​(AtomicType type)
      Construct a StringValue whose value is formed from the contents of this builder
      UnicodeString toUnicodeString()
      Construct a UnicodeString whose value is formed from the contents of this builder
      void trimToSize()  
      void write​(java.lang.String chars)
      Process a supplied string
      void write​(UnicodeString chars)
      Process a supplied string
      void writeAscii​(byte[] content)
      Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • UnicodeBuilder

        public UnicodeBuilder()
        Create a Unicode builder with an initial allocation of 256 codepoints
      • UnicodeBuilder

        public UnicodeBuilder​(int allocate)
        Create a Unicode builder with an initial space allocation
        Parameters:
        allocate - the initial space allocation, in codepoints (32-bit integers)
    • Method Detail

      • append

        public UnicodeBuilder append​(char ch)
        Append a character, which must not be a surrogate. (This method is needed for C#, which does not support implicit conversion of char to int.)
        Parameters:
        ch - the character
        Returns:
        this builder, with the new character added
      • append

        public UnicodeBuilder append​(int codePoint)
        Append a single Unicode character to the content
        Parameters:
        codePoint - the Unicode codepoint. The caller is responsible for ensuring that this is not a surrogate.
        Returns:
        this builder, with the new character added
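
Because append(int) leaves surrogate checking to the caller, a guard can be applied before the call. The following is a minimal sketch using only java.lang.Character; the class CodePointCheck and the method isAppendable are hypothetical names introduced for illustration and are not part of this API.

```java
// Hypothetical helper, not part of UnicodeBuilder: validates a codepoint
// before it is handed to append(int).
final class CodePointCheck {

    private CodePointCheck() {
    }

    // True if the value is a Unicode scalar value: within 0..0x10FFFF
    // and outside the surrogate range 0xD800..0xDFFF.
    static boolean isAppendable(int codePoint) {
        return Character.isValidCodePoint(codePoint)
                && (codePoint < Character.MIN_SURROGATE
                    || codePoint > Character.MAX_SURROGATE);
    }
}
```

A caller might test isAppendable(cp) and reject or replace the value when it returns false, rather than passing an unpaired surrogate to the builder.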
      • append

        public UnicodeBuilder append​(IntIterator codePoints)
        Append multiple Unicode characters to the content
        Parameters:
        codePoints - an iterator delivering the codepoints to be added.
        Returns:
        this builder, with the new characters added
      • appendLatin

        public UnicodeBuilder appendLatin​(java.lang.String str)
        Append a Java string to the content. The caller is responsible for ensuring that the string consists entirely of characters in the Latin-1 character set.
        Parameters:
        str - the string to be appended
        Returns:
        this builder, with the new string added
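
Since appendLatin relies on the caller to guarantee Latin-1 content, a pre-check along the following lines may be useful. This is a sketch under the assumption that "Latin-1" means every UTF-16 code unit lies in U+0000 to U+00FF; Latin1Check and isLatin1 are hypothetical names, not part of this API.

```java
// Hypothetical helper, not part of UnicodeBuilder: checks the precondition
// that appendLatin places on its argument.
final class Latin1Check {

    private Latin1Check() {
    }

    // True if every UTF-16 code unit is in the range U+0000..U+00FF.
    // Surrogates (0xD800..0xDFFF) are above 0xFF, so they are rejected too.
    static boolean isLatin1(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }
}
```

When the check fails, the general append(java.lang.CharSequence) method is the one documented as accepting arbitrary characters.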
      • appendAll

        public UnicodeBuilder appendAll​(SequenceIterator iter)
        Append the string values of all the items in a sequence, with no separator
        Parameters:
        iter - the sequence of items
        Returns:
        this builder, with the new items added
      • append

        public UnicodeBuilder append​(java.lang.CharSequence str)
        Append a Java CharSequence to the content. This may contain arbitrary characters, including well-formed surrogate pairs.
        Parameters:
        str - the string to be appended
        Returns:
        this builder, with the new string added
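
To illustrate what handling well-formed surrogate pairs involves, the sketch below decodes UTF-16 text into codepoints with the standard java.lang.Character methods. It shows the general technique only and is not taken from the builder's implementation; CodePointDecoder and toCodePoints are hypothetical names.

```java
// Hypothetical helper, not part of UnicodeBuilder: shows how UTF-16 text
// maps onto a sequence of codepoints.
final class CodePointDecoder {

    private CodePointDecoder() {
    }

    // Decodes UTF-16 text into an array of codepoints, combining each
    // well-formed surrogate pair into one supplementary codepoint.
    static int[] toCodePoints(CharSequence str) {
        int[] out = new int[Character.codePointCount(str, 0, str.length())];
        int n = 0;
        for (int i = 0; i < str.length(); ) {
            int cp = Character.codePointAt(str, i);
            out[n++] = cp;
            // Advance by 2 chars for a supplementary codepoint, otherwise by 1.
            i += Character.charCount(cp);
        }
        return out;
    }
}
```

For the input "a\uD83D\uDE00b" the Java string length is 4, but the decoded array holds 3 codepoints (0x61, 0x1F600, 0x62), which is the unit in which length() on this builder is documented to count.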
      • append

        public UnicodeBuilder append​(UnicodeString str)
        Append a UnicodeString object to the content.
        Parameters:
        str - the string to be appended. The length is currently restricted to 2^31.
        Returns:
        this builder, with the new string added
      • length

        public long length()
        Get the number of codepoints currently in the builder
        Returns:
        the size in codepoints
      • isEmpty

        public boolean isEmpty()
        Ask whether the content of the builder is empty
        Returns:
        true if the size is zero
      • toUnicodeString

        public UnicodeString toUnicodeString()
        Construct a UnicodeString whose value is formed from the contents of this builder
        Returns:
        the constructed UnicodeString
      • toStringItem

        public StringValue toStringItem​(AtomicType type)
        Construct a StringValue whose value is formed from the contents of this builder
        Parameters:
        type - the required type, for example BuiltInAtomicType.STRING or BuiltInAtomicType.UNTYPED_ATOMIC. The caller warrants that the value is a valid instance of this type. No validation or whitespace normalization is carried out
        Returns:
        the constructed StringValue
      • toString

        public java.lang.String toString()
        Return a string containing the character content of this builder
        Overrides:
        toString in class java.lang.Object
        Returns:
        the character content of this builder as a Java String
      • clear

        public void clear()
        Reset the contents of this builder to be empty
      • expand1to2

        public static byte[] expand1to2​(byte[] in,
                                        int start,
                                        int used,
                                        int allocate)
        Expand a byte array from 1-byte-per-character to 2-bytes-per-character
        Parameters:
        in - the input byte array
        start - the start offset in bytes
        used - the end offset in bytes
        allocate - the number of code points to allow for in the output byte array
        Returns:
        the new byte array
      • expandBytesToChars

        public static char[] expandBytesToChars​(byte[] in,
                                                int start,
                                                int end)
      • expand1to3

        public static byte[] expand1to3​(byte[] in,
                                        int start,
                                        int used,
                                        int allocate)
        Expand a byte array from 1-byte-per-character to 3-bytes-per-character
        Parameters:
        in - the input byte array
        start - the start offset in bytes
        used - the end offset in bytes
        allocate - the number of code points to allow for in the output byte array
        Returns:
        the new byte array
      • expand2to3

        public static byte[] expand2to3​(byte[] in,
                                        int start,
                                        int used,
                                        int allocate)
        Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
        Parameters:
        in - the input byte array
        start - the start offset in bytes
        used - the end offset in bytes
        allocate - the number of code points to allow for in the output byte array
        Returns:
        the new byte array
      • expand

        public static byte[] expand​(byte[] in,
                                    int start,
                                    int end,
                                    int oldWidth,
                                    int newWidth,
                                    int allocate)
        Expand the width of the characters in a byte array
        Parameters:
        in - the input byte array
        start - the start offset in bytes
        end - the end offset in bytes
        oldWidth - the width of the characters (number of bytes per character) in the input array
        newWidth - the width of the characters (number of bytes per character) in the output array. If newWidth is less than or equal to oldWidth, the input array is copied; the width is never reduced
        allocate - the number of code points to allow for in the output byte array; if zero (or insufficient) the output array will have no spare space for expansion
        Returns:
        the new byte array
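
The parameter rules above can be sketched as a standalone method. This is an illustration, not the library's code, and it rests on two assumptions that the documentation does not state: each character is stored most-significant byte first (so widening pads with leading zero bytes), and in the no-widening case only the bytes between start and end are copied. WidthExpander and widen are hypothetical names.

```java
// Hypothetical illustration of widening fixed-width characters in a byte array.
final class WidthExpander {

    private WidthExpander() {
    }

    // Assumption: characters are stored big-endian. The actual byte order
    // used by the library is not given in the documentation.
    static byte[] widen(byte[] in, int start, int end,
                        int oldWidth, int newWidth, int allocate) {
        if (newWidth <= oldWidth) {
            // The width is never reduced: copy the selected bytes unchanged.
            byte[] copy = new byte[end - start];
            System.arraycopy(in, start, copy, 0, end - start);
            return copy;
        }
        int count = (end - start) / oldWidth;      // characters present in the input
        int capacity = Math.max(allocate, count);  // no spare space if allocate is too small
        byte[] out = new byte[capacity * newWidth];
        int pad = newWidth - oldWidth;             // leading zero bytes per character
        for (int c = 0; c < count; c++) {
            // New arrays are zero-filled, so only the original bytes are copied.
            System.arraycopy(in, start + c * oldWidth, out, c * newWidth + pad, oldWidth);
        }
        return out;
    }
}
```

Under these assumptions, widening the two 1-byte characters {0x41, 0x42} to 2 bytes with allocate set to zero gives {0x00, 0x41, 0x00, 0x42}, an array with no spare space for further characters.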
      • accept

        public UnicodeBuilder accept​(UnicodeString chars)
        Process a supplied string
        Specified by:
        accept in interface UniStringConsumer
        Parameters:
        chars - the characters to be processed
        Returns:
        this builder (to allow method chaining)
      • write

        public void write​(UnicodeString chars)
        Description copied from interface: UnicodeWriter
        Process a supplied string
        Specified by:
        write in interface UnicodeWriter
        Parameters:
        chars - the characters to be processed
      • writeAscii

        public void writeAscii​(byte[] content)
                        throws java.io.IOException
        Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
        Specified by:
        writeAscii in interface UnicodeWriter
        Parameters:
        content - byte array holding ASCII characters only
        Throws:
        java.io.IOException - if processing fails for any reason
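
The "ASCII characters only" precondition on the byte array can be checked or established before calling writeAscii. The sketch below shows one way to do so with plain Java; AsciiBytes, isAscii and encode are hypothetical names and are not part of this API.

```java
// Hypothetical helper, not part of UnicodeBuilder: prepares and checks
// byte arrays intended for writeAscii.
final class AsciiBytes {

    private AsciiBytes() {
    }

    // True if every byte is in 0x00..0x7F. Java bytes are signed, so any
    // value with the high bit set appears negative.
    static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    // Encodes a string as one byte per character, rejecting non-ASCII input.
    static byte[] encode(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch > 0x7F) {
                throw new IllegalArgumentException("Non-ASCII character at index " + i);
            }
            out[i] = (byte) ch;
        }
        return out;
    }
}
```

Rejecting bad input up front keeps the failure at the call site, since the documentation does not say how writeAscii itself treats bytes outside the ASCII range.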
      • write

        public void write​(java.lang.String chars)
                   throws java.io.IOException
        Process a supplied string
        Specified by:
        write in interface UnicodeWriter
        Parameters:
        chars - the characters to be processed
        Throws:
        java.io.IOException - if processing fails for any reason
      • trimToSize

        public void trimToSize()
      • close

        public void close()
        Complete the writing of characters to the result. The default implementation does nothing.
        Specified by:
        close in interface UnicodeWriter
        Specified by:
        close in interface UniStringConsumer