Package net.sf.saxon.str
Class UnicodeBuilder
java.lang.Object
    net.sf.saxon.str.UnicodeBuilder
All Implemented Interfaces:
UnicodeWriter, UniStringConsumer
public final class UnicodeBuilder extends java.lang.Object implements UniStringConsumer, UnicodeWriter
Builder class to construct a UnicodeString by appending text incrementally
-
-
Constructor Summary
UnicodeBuilder()
    Create a Unicode builder with an initial allocation of 256 codepoints
UnicodeBuilder(int allocate)
    Create a Unicode builder with an initial space allocation
-
Method Summary
UnicodeBuilder accept(UnicodeString chars)
    Process a supplied string
UnicodeBuilder append(char ch)
    Append a character, which must not be a surrogate.
UnicodeBuilder append(int codePoint)
    Append a single unicode character to the content
UnicodeBuilder append(java.lang.CharSequence str)
    Append a Java CharSequence to the content.
UnicodeBuilder append(UnicodeString str)
    Append a UnicodeString object to the content.
UnicodeBuilder append(IntIterator codePoints)
    Append multiple unicode characters to the content
UnicodeBuilder appendAll(SequenceIterator iter)
    Append the string values of all the items in a sequence, with no separator
UnicodeBuilder appendLatin(java.lang.String str)
    Append a Java string to the content.
void clear()
    Reset the contents of this builder to be empty
void close()
    Complete the writing of characters to the result.
static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
    Expand the width of the characters in a byte array
static byte[] expand1to2(byte[] in, int start, int used, int allocate)
    Expand a byte array from 1-byte-per-character to 2-bytes-per-character
static byte[] expand1to3(byte[] in, int start, int used, int allocate)
    Expand a byte array from 1-byte-per-character to 3-bytes-per-character
static byte[] expand2to3(byte[] in, int start, int used, int allocate)
    Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
static char[] expandBytesToChars(byte[] in, int start, int end)
boolean isEmpty()
    Ask whether the content of the builder is empty
long length()
    Get the number of codepoints currently in the builder
java.lang.String toString()
    Return a string containing the character content of this builder
StringValue toStringItem(AtomicType type)
    Construct a StringValue whose value is formed from the contents of this builder
UnicodeString toUnicodeString()
    Construct a UnicodeString whose value is formed from the contents of this builder
void trimToSize()
void write(java.lang.String chars)
    Process a supplied string
void write(UnicodeString chars)
    Process a supplied string
void writeAscii(byte[] content)
    Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface net.sf.saxon.str.UnicodeWriter
flush, writeCodePoint, writeRepeatedAscii
-
Methods inherited from interface net.sf.saxon.str.UniStringConsumer
open
-
Constructor Detail
-
UnicodeBuilder
public UnicodeBuilder()
Create a Unicode builder with an initial allocation of 256 codepoints
-
UnicodeBuilder
public UnicodeBuilder(int allocate)
Create a Unicode builder with an initial space allocation
Parameters:
allocate - the initial space allocation, in codepoints (32-bit integers)
-
-
Method Detail
-
append
public UnicodeBuilder append(char ch)
Append a character, which must not be a surrogate. (Method needed for C#, because implicit conversion of char to int isn't supported)
Parameters:
ch - the character
Returns:
this builder, with the new character added
-
append
public UnicodeBuilder append(int codePoint)
Append a single unicode character to the content
Parameters:
codePoint - the unicode codepoint. The caller is responsible for ensuring that this is not a surrogate
Returns:
this builder, with the new character added
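Because the caller is responsible for ruling out surrogates, a guard such as the following could be applied before calling append(int). This is a minimal sketch using only java.lang.Character; the class name CodePointGuard and the method isAppendable are hypothetical and are not part of Saxon.

```java
public class CodePointGuard {

    /**
     * Returns true if cp is in the Unicode range 0..0x10FFFF and is not
     * a surrogate (0xD800..0xDFFF). Hypothetical helper, not a Saxon API.
     */
    public static boolean isAppendable(int cp) {
        return Character.isValidCodePoint(cp)
                && (cp < Character.MIN_SURROGATE || cp > Character.MAX_SURROGATE);
    }

    public static void main(String[] args) {
        System.out.println(isAppendable(0x1F600)); // a supplementary-plane codepoint
        System.out.println(isAppendable(0xD83D));  // a high surrogate
    }
}
```

A caller holding a Java String would normally iterate over its codepoints rather than its chars, so that a surrogate pair arrives as one value.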
-
append
public UnicodeBuilder append(IntIterator codePoints)
Append multiple unicode characters to the content
Parameters:
codePoints - an iterator delivering the codepoints to be added.
Returns:
this builder, with the new characters added
-
appendLatin
public UnicodeBuilder appendLatin(java.lang.String str)
Append a Java string to the content. The caller is responsible for ensuring that this consists entirely of characters in the Latin-1 character set
Parameters:
str - the string to be appended
Returns:
this builder, with the new string added
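The Latin-1 precondition can be checked by the caller before choosing appendLatin. The sketch below assumes that "Latin-1" means every char has a value of 0xFF or less; Latin1Check and isLatin1 are hypothetical names, not Saxon APIs.

```java
public class Latin1Check {

    /**
     * Returns true if every char in s is in the range 0x00..0xFF.
     * Surrogates lie above 0xFF, so any non-BMP content is rejected too.
     */
    public static boolean isLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLatin1("caf\u00e9"));  // e-acute is U+00E9
        System.out.println(isLatin1("\u20ac 10"));  // euro sign is U+20AC
    }
}
```

When the check fails, the general append(java.lang.CharSequence) method is the appropriate alternative.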
-
appendAll
public UnicodeBuilder appendAll(SequenceIterator iter)
Append the string values of all the items in a sequence, with no separator
Parameters:
iter - the sequence of items
Returns:
this builder, with the new items added
-
append
public UnicodeBuilder append(java.lang.CharSequence str)
Append a Java CharSequence to the content. This may contain arbitrary characters, including well-formed surrogate pairs
Parameters:
str - the string to be appended
Returns:
this builder, with the new string added
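To illustrate what a well-formed surrogate pair means for a codepoint-based builder, the following sketch decodes a CharSequence into codepoints using the standard CharSequence.codePoints() method. It does not show Saxon's internal implementation; SurrogateDecoding and toCodePoints are hypothetical names.

```java
public class SurrogateDecoding {

    /**
     * Decodes a CharSequence into Unicode codepoints. Each well-formed
     * surrogate pair becomes a single codepoint; an unpaired surrogate
     * is passed through unchanged by CharSequence.codePoints().
     */
    public static int[] toCodePoints(CharSequence s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00b"; // 'a', U+1F600 as a surrogate pair, 'b'
        int[] cps = toCodePoints(text);
        System.out.println(text.length() + " chars, " + cps.length + " codepoints");
    }
}
```

The input here occupies four Java chars but only three codepoints, which is why a count expressed in codepoints can be smaller than String.length() for the same text.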
-
append
public UnicodeBuilder append(UnicodeString str)
Append a UnicodeString object to the content.
Parameters:
str - the string to be appended. The length is currently restricted to 2^31.
Returns:
this builder, with the new string added
-
length
public long length()
Get the number of codepoints currently in the builder
Returns:
the size in codepoints
-
isEmpty
public boolean isEmpty()
Ask whether the content of the builder is empty
Returns:
true if the size is zero
-
toUnicodeString
public UnicodeString toUnicodeString()
Construct a UnicodeString whose value is formed from the contents of this builder
Returns:
the constructed UnicodeString
-
toStringItem
public StringValue toStringItem(AtomicType type)
Construct a StringValue whose value is formed from the contents of this builder
Parameters:
type - the required type, for example BuiltInAtomicType.STRING or BuiltInAtomicType.UNTYPED_ATOMIC. The caller warrants that the value is a valid instance of this type. No validation or whitespace normalization is carried out
Returns:
the constructed StringValue
-
toString
public java.lang.String toString()
Return a string containing the character content of this builder
Overrides:
toString in class java.lang.Object
Returns:
the character content of this builder as a Java String
-
clear
public void clear()
Reset the contents of this builder to be empty
-
expand1to2
public static byte[] expand1to2(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 2-bytes-per-character
Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
Returns:
the new byte array
-
expandBytesToChars
public static char[] expandBytesToChars(byte[] in, int start, int end)
-
expand1to3
public static byte[] expand1to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 3-bytes-per-character
Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
Returns:
the new byte array
-
expand2to3
public static byte[] expand2to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
Returns:
the new byte array
-
expand
public static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
Expand the width of the characters in a byte array
Parameters:
in - the input byte array
start - the start offset in bytes
end - the end offset in bytes
oldWidth - the width of the characters (number of bytes per character) in the input array
newWidth - the width of the characters (number of bytes per character) in the output array. If newWidth is less than or equal to oldWidth then the input array is copied; the width is never reduced
allocate - the number of code points to allow for in the output byte array; if zero (or insufficient) the output array will have no spare space for expansion
Returns:
the new byte array
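The widening described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not Saxon's actual code: it assumes each character is stored most-significant byte first (so the padding bytes go in front), and the class name ExpandSketch is hypothetical.

```java
public class ExpandSketch {

    /**
     * Copies the characters in in[start..end) into a new array using
     * max(oldWidth, newWidth) bytes per character, so the width is never
     * reduced. Assumption: big-endian layout within each character.
     */
    public static byte[] expand(byte[] in, int start, int end,
                                int oldWidth, int newWidth, int allocate) {
        int width = Math.max(oldWidth, newWidth);
        int count = (end - start) / oldWidth;      // number of characters present
        int capacity = Math.max(allocate, count);  // no spare space if allocate is too small
        byte[] out = new byte[capacity * width];
        int pad = width - oldWidth;                // leading zero bytes per character
        for (int i = 0; i < count; i++) {
            System.arraycopy(in, start + i * oldWidth, out, i * width + pad, oldWidth);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] twoByte = {0x01, 0x02, 0x03, 0x04}; // two 2-byte characters
        byte[] threeByte = expand(twoByte, 0, 4, 2, 3, 0);
        System.out.println(java.util.Arrays.toString(threeByte));
    }
}
```

Under these assumptions the specialised methods expand1to2, expand1to3 and expand2to3 correspond to fixed choices of oldWidth and newWidth.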
-
accept
public UnicodeBuilder accept(UnicodeString chars)
Process a supplied string
Specified by:
accept in interface UniStringConsumer
Parameters:
chars - the characters to be processed
Returns:
this consumer (to allow method chaining)
-
write
public void write(UnicodeString chars)
Description copied from interface: UnicodeWriter
Process a supplied string
Specified by:
write in interface UnicodeWriter
Parameters:
chars - the characters to be processed
-
writeAscii
public void writeAscii(byte[] content) throws java.io.IOException
Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
Specified by:
writeAscii in interface UnicodeWriter
Parameters:
content - byte array holding ASCII characters only
Throws:
java.io.IOException - if processing fails for any reason
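Since writeAscii expects ASCII-only content, a caller that is unsure of its input could verify the byte array first. The following is a minimal sketch; AsciiBytes and isAscii are hypothetical names and not part of Saxon.

```java
import java.nio.charset.StandardCharsets;

public class AsciiBytes {

    /**
     * Returns true if every byte is in the range 0x00..0x7F.
     * Java bytes are signed, so values 0x80..0xFF appear as negative numbers.
     */
    public static boolean isAscii(byte[] content) {
        for (byte b : content) {
            if (b < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] ok = "<item/>".getBytes(StandardCharsets.US_ASCII);
        byte[] notOk = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isAscii(ok));
        System.out.println(isAscii(notOk));
    }
}
```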
-
write
public void write(java.lang.String chars) throws java.io.IOException
Process a supplied string
Specified by:
write in interface UnicodeWriter
Parameters:
chars - the characters to be processed
Throws:
java.io.IOException - if processing fails for any reason
-
trimToSize
public void trimToSize()
-
close
public void close()
Complete the writing of characters to the result. The default implementation does nothing.
Specified by:
close in interface UnicodeWriter
Specified by:
close in interface UniStringConsumer
-
-