Package net.sf.saxon.str
Class UnicodeBuilder
java.lang.Object
java.io.Writer
net.sf.saxon.str.UnicodeBuilder
- All Implemented Interfaces:
Closeable, Flushable, Appendable, AutoCloseable, UnicodeWriter, UniStringConsumer
Builder class to construct a UnicodeString by appending text incrementally
-
Constructor Summary
Constructor - Description
UnicodeBuilder() - Create a Unicode builder with an initial allocation of 16 codepoints
UnicodeBuilder(int allocate) - Create a Unicode builder with an initial space allocation
-
Method Summary
Modifier and Type, Method - Description
accept(UnicodeString chars) - Process a supplied string
UnicodeBuilder append(char ch) - Append a character, which must not be a surrogate.
UnicodeBuilder append(int codePoint) - Append a single Unicode character to the content
UnicodeBuilder append(CharSequence str) - Append a Java CharSequence to the content.
UnicodeBuilder append(UnicodeString str) - Append a UnicodeString object to the content.
UnicodeBuilder append(IntIterator codePoints) - Append multiple Unicode characters to the content
UnicodeBuilder appendLatin(String str) - Append a Java string to the content.
void clear() - Reset the contents of this builder to be empty
void close() - Complete the writing of characters to the result.
static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate) - Expand the width of the characters in a byte array
static byte[] expand1to2(byte[] in, int start, int used, int allocate) - Expand a byte array from 1-byte-per-character to 2-bytes-per-character
static byte[] expand1to3(byte[] in, int start, int used, int allocate) - Expand a byte array from 1-byte-per-character to 3-bytes-per-character
static byte[] expand2to3(byte[] in, int start, int used, int allocate) - Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
static char[] expandBytesToChars(byte[] in, int start, int end)
void flush() - Flush the contents of any buffers.
boolean isEmpty() - Ask whether the content of the builder is empty
long length() - Get the number of codepoints currently in the builder
String toString() - Return a string containing the character content of this builder
StringValue toStringItem(AtomicType type) - Construct a StringValue whose value is formed from the contents of this builder
UnicodeString toUnicodeString() - Construct a UnicodeString whose value is formed from the contents of this builder
void write(char[] cbuf, int off, int len)
void write(String chars) - Process a supplied string
void write(UnicodeString chars) - Process a supplied string
void writeAscii(byte[] content) - Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
void writeAscii(int codepoint) - Process a single ASCII character.
void writeCodePoint(int codepoint) - Process a single character.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface net.sf.saxon.str.UnicodeWriter
writeRepeatedAscii
Methods inherited from interface net.sf.saxon.str.UniStringConsumer
open
-
Constructor Details
-
UnicodeBuilder
public UnicodeBuilder()
Create a Unicode builder with an initial allocation of 16 codepoints
-
UnicodeBuilder
public UnicodeBuilder(int allocate)
Create a Unicode builder with an initial space allocation
- Parameters:
allocate - the initial space allocation, in codepoints (32-bit integers)
-
-
Method Details
-
append
public UnicodeBuilder append(char ch)
Append a character, which must not be a surrogate. (This method is needed for C#, which does not support implicit conversion of char to int.)
- Specified by:
append in interface Appendable
- Overrides:
append in class Writer
- Parameters:
ch - the character
- Returns:
this builder, with the new character added
-
append
public UnicodeBuilder append(int codePoint)
Append a single Unicode character to the content
- Parameters:
codePoint - the Unicode codepoint. The caller is responsible for ensuring that this is not a surrogate. (In practice, some callers, such as the JSON parser, do append unpaired surrogates to the builder and sort it out later.)
- Returns:
this builder, with the new character added
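Because the builder does not necessarily reject surrogates, a caller may want to check codepoints before appending them. The following is a minimal sketch of such a check using only the Java standard library. The class and method names (SurrogateCheck, isSurrogateCodePoint, firstSurrogateIndex) are hypothetical and are not part of Saxon.

```java
class SurrogateCheck {

    /** True if the value lies in the UTF-16 surrogate range U+D800..U+DFFF. */
    static boolean isSurrogateCodePoint(int codePoint) {
        return codePoint >= Character.MIN_SURROGATE
                && codePoint <= Character.MAX_SURROGATE;
    }

    /** Index of the first surrogate value in the array, or -1 if there is none. */
    static int firstSurrogateIndex(int[] codePoints) {
        for (int i = 0; i < codePoints.length; i++) {
            if (isSurrogateCodePoint(codePoints[i])) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] clean = {0x41, 0x1F600};        // 'A' and a supplementary codepoint
        int[] dirty = {0x41, 0xD83D, 0x42};   // contains an unpaired high surrogate
        System.out.println(firstSurrogateIndex(clean));
        System.out.println(firstSurrogateIndex(dirty));
    }
}
```

A check like this would run before calling append(int codePoint); whether to reject, replace, or pass through a surrogate is left to the caller.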
-
append
public UnicodeBuilder append(IntIterator codePoints)
Append multiple Unicode characters to the content
- Parameters:
codePoints - an iterator delivering the codepoints to be added.
- Returns:
this builder, with the new characters added
-
appendLatin
public UnicodeBuilder appendLatin(String str)
Append a Java string to the content. The caller is responsible for ensuring that this consists entirely of characters in the Latin-1 character set.
- Parameters:
str - the string to be appended
- Returns:
this builder, with the new string added
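Since appendLatin places the burden of the Latin-1 check on the caller, a guard such as the one sketched below could be used when the origin of a string is uncertain. This is an illustration using only the Java standard library; Latin1Check and isLatin1 are hypothetical names, not Saxon API.

```java
class Latin1Check {

    /** True if every UTF-16 code unit is in the Latin-1 range U+0000..U+00FF. */
    static boolean isLatin1(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;   // also catches surrogates, which are all above U+00FF
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLatin1("caf\u00e9"));   // e-acute is U+00E9
        System.out.println(isLatin1("\u20ac5"));     // the euro sign is U+20AC
    }
}
```

When the check fails, the general append(CharSequence str) method is the documented alternative, since it accepts arbitrary characters.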
-
append
public UnicodeBuilder append(CharSequence str)
Append a Java CharSequence to the content. This may contain arbitrary characters, including well-formed surrogate pairs.
- Specified by:
append in interface Appendable
- Overrides:
append in class Writer
- Parameters:
str - the string to be appended
- Returns:
this builder, with the new string added
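To illustrate what a well-formed surrogate pair means in terms of codepoints, the sketch below decodes a CharSequence with the standard CharSequence.codePoints() method. It shows only the standard library's view of the text and makes no claim about how the builder stores it internally; CodePointDemo and toCodePoints are hypothetical names.

```java
class CodePointDemo {

    /** Decode a CharSequence into codepoints, combining well-formed surrogate pairs. */
    static int[] toCodePoints(CharSequence s) {
        // Unpaired surrogates are passed through unchanged by codePoints()
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";   // 'a' followed by U+1F600 written as a surrogate pair
        int[] cps = toCodePoints(s);
        System.out.println(s.length() + " chars, " + cps.length + " codepoints");
    }
}
```

The distinction matters because the builder's length() method is documented as counting codepoints, not Java char values.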
-
append
public UnicodeBuilder append(UnicodeString str)
Append a UnicodeString object to the content.
- Parameters:
str - the string to be appended. The length is currently restricted to 2^31.
- Returns:
this builder, with the new string added
-
length
public long length()
Get the number of codepoints currently in the builder
- Returns:
the size in codepoints
-
isEmpty
public boolean isEmpty()
Ask whether the content of the builder is empty
- Returns:
true if the size is zero
-
toUnicodeString
public UnicodeString toUnicodeString()
Construct a UnicodeString whose value is formed from the contents of this builder
- Returns:
the constructed UnicodeString
-
toStringItem
public StringValue toStringItem(AtomicType type)
Construct a StringValue whose value is formed from the contents of this builder
- Parameters:
type - the required type, for example BuiltInAtomicType.STRING or BuiltInAtomicType.UNTYPED_ATOMIC. The caller warrants that the value is a valid instance of this type. No validation or whitespace normalization is carried out.
- Returns:
the constructed StringValue
-
toString
public String toString()
Return a string containing the character content of this builder
-
clear
public void clear()
Reset the contents of this builder to be empty
-
expand1to2
public static byte[] expand1to2(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 2-bytes-per-character
- Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
- Returns:
the new byte array
-
expandBytesToChars
public static char[] expandBytesToChars(byte[] in, int start, int end)
-
expand1to3
public static byte[] expand1to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 1-byte-per-character to 3-bytes-per-character
- Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
- Returns:
the new byte array
-
expand2to3
public static byte[] expand2to3(byte[] in, int start, int used, int allocate)
Expand a byte array from 2-bytes-per-character to 3-bytes-per-character
- Parameters:
in - the input byte array
start - the start offset in bytes
used - the end offset in bytes
allocate - the number of code points to allow for in the output byte array
- Returns:
the new byte array
-
expand
public static byte[] expand(byte[] in, int start, int end, int oldWidth, int newWidth, int allocate)
Expand the width of the characters in a byte array
- Parameters:
in - the input byte array
start - the start offset in bytes
end - the end offset in bytes
oldWidth - the width of the characters (number of bytes per character) in the input array
newWidth - the width of the characters (number of bytes per character) in the output array. If newWidth <= oldWidth then the input array is copied; the width is never reduced.
allocate - the number of code points to allow for in the output byte array; if zero (or insufficient) the output array will have no spare space for expansion
- Returns:
the new byte array
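The parameter descriptions above can be read as the following widening procedure. This is a hypothetical re-implementation for illustration only, not Saxon's code. It assumes that each character is stored most-significant byte first, and that in the copy case only the occupied region between start and end is copied; neither assumption is confirmed by the documentation. WidthExpander and widen are invented names.

```java
import java.util.Arrays;

class WidthExpander {

    /**
     * Widen fixed-width characters in a byte array (illustrative sketch).
     * Assumption: big-endian layout, so padding bytes go in front of each character.
     */
    static byte[] widen(byte[] in, int start, int end,
                        int oldWidth, int newWidth, int allocate) {
        if (newWidth <= oldWidth) {
            // The width is never reduced: copy the occupied region unchanged (assumption)
            return Arrays.copyOfRange(in, start, end);
        }
        int count = (end - start) / oldWidth;       // characters present in the input
        int capacity = Math.max(count, allocate);   // zero or insufficient allocate: no spare space
        byte[] out = new byte[capacity * newWidth];
        int pad = newWidth - oldWidth;              // leading zero bytes per character
        for (int i = 0; i < count; i++) {
            System.arraycopy(in, start + i * oldWidth, out, i * newWidth + pad, oldWidth);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] latin = {0x41, 0x42};                  // "AB" at one byte per character
        byte[] wide = widen(latin, 0, 2, 1, 3, 4);    // three bytes each, room for four characters
        System.out.println(Arrays.toString(wide));
    }
}
```

Under these assumptions, expand1to2, expand1to3 and expand2to3 would correspond to calls with the width pairs (1, 2), (1, 3) and (2, 3).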
-
accept
accept(UnicodeString chars)
Process a supplied string
- Specified by:
accept in interface UniStringConsumer
- Parameters:
chars - the characters to be processed
- Returns:
this CharSequenceConsumer (to allow method chaining)
-
write
public void write(UnicodeString chars)
Description copied from interface: UnicodeWriter
Process a supplied string
- Specified by:
write in interface UnicodeWriter
- Parameters:
chars - the characters to be processed
-
writeAscii
public void writeAscii(byte[] content)
Write a supplied string known to consist entirely of ASCII characters, supplied as a byte array
- Specified by:
writeAscii in interface UnicodeWriter
- Parameters:
content - byte array holding ASCII characters only
- Throws:
IOException - if processing fails for any reason
-
writeCodePoint
public void writeCodePoint(int codepoint)
Process a single character.
- Specified by:
writeCodePoint in interface UnicodeWriter
- Parameters:
codepoint - the Unicode character to be processed. Must not be a surrogate
- Throws:
IOException - if processing fails for any reason
-
writeAscii
public void writeAscii(int codepoint)
Process a single ASCII character.
- Specified by:
writeAscii in interface UnicodeWriter
- Parameters:
codepoint - the Unicode character to be processed. Must be in the range 0-127; this is not necessarily checked
- Throws:
IOException - if processing fails for any reason
-
write
public void write(String chars)
Process a supplied string
- Specified by:
write in interface UnicodeWriter
- Overrides:
write in class Writer
- Parameters:
chars - the characters to be processed
- Throws:
IOException - if processing fails for any reason
-
write
public void write(char[] cbuf, int off, int len)
- Specified by:
write in class Writer
- Throws:
IOException
-
flush
public void flush()
Description copied from interface: UnicodeWriter
Flush the contents of any buffers. The default implementation does nothing.
- Specified by:
flush in interface Flushable
- Specified by:
flush in interface UnicodeWriter
- Specified by:
flush in class Writer
- Throws:
IOException - if processing fails for any reason
-
close
public void close()
Complete the writing of characters to the result. The default implementation does nothing.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in interface UnicodeWriter
- Specified by:
close in interface UniStringConsumer
- Specified by:
close in class Writer
-