Package net.sf.saxon.str
This package contains classes used to handle Unicode strings: notably implementations of the UnicodeString interface, which represents a string as a sequence of directly addressable Unicode codepoints (without relying on surrogate pairs).
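
The difference between codepoint addressing and the UTF-16 addressing of java.lang.String can be seen with a small sketch that uses only the JDK. It does not use any class from this package; the class name CodepointAddressingSketch and the method toCodepoints are hypothetical names introduced purely for illustration.

```java
// Illustrative sketch only: plain JDK, no Saxon classes involved.
class CodepointAddressingSketch {

    /** Expands a Java string into one int per Unicode codepoint. */
    static int[] toCodepoints(String s) {
        // String.codePoints() combines each well-formed surrogate pair
        // into a single supplementary codepoint.
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as the surrogate pair D83D DE00), 'b'
        String s = "a\uD83D\uDE00b";
        int[] cps = toCodepoints(s);

        System.out.println(s.length());    // number of UTF-16 code units
        System.out.println(cps.length);    // number of codepoints
        System.out.println(Integer.toHexString(s.charAt(1))); // a lone high surrogate
        System.out.println(Integer.toHexString(cps[1]));      // the full codepoint
    }
}
```

In the int[] form, index 1 addresses the whole codepoint U+1F600, whereas charAt(1) on the original string returns only the first half of a surrogate pair.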
Class descriptions:

This abstract implementation of UniStringConsumer exists largely for C#, as a place to capture the default methods defined in the interface, and avoid them proliferating into multiple subclasses.
An implementation of UnicodeString that wraps a Java string which is known to contain no surrogates.
Iterator over a string to produce a sequence of single-character strings.
This class provides a compressed representation of a sequence of whitespace characters.
A zero-length Unicode string.
This class provides a compressed representation of a string used to represent indentation: specifically, an integer number of newlines followed by an integer number of spaces.
The segments (other than the last) have a fixed size of 65536 codepoints, which may use one byte per codepoint, two bytes per codepoint, or three bytes per codepoint, depending on the largest codepoint present in the segment.
A Unicode string consisting entirely of 16-bit BMP characters, implemented as a range of an underlying byte array.
A Unicode string consisting of 24-bit characters, implemented as a range of an underlying byte array holding three bytes per codepoint.
A Unicode string consisting entirely of 8-bit characters, implemented as a range of an underlying byte array.
Contains constants representing some frequently used strings, either as a UnicodeString or in some cases as a byte array.
An implementation of the UnicodeString interface that wraps an ordinary Java string.
Class to perform lowercase conversion.
Class to perform uppercase conversion.
Twine16 is a Unicode string consisting entirely of codepoints in the range 0-65535 (that is, the basic multilingual plane), excluding surrogates.
Twine24 is a Unicode string that accommodates any codepoint value up to 24 bits.
Twine8 is a Unicode string whose codepoints are all in the range 0-255 (that is, Latin-1).
Interface that accepts a sequence of Unicode codepoints.
Builder class to construct a UnicodeString by appending text incrementally.
A UnicodeString containing a single codepoint.
A UnicodeString is a sequence of Unicode codepoints that supports codepoint addressing.
Interface that accepts strings in the form of UnicodeString objects, which are written to some destination.
Implementation of UnicodeWriter that converts Unicode strings to ordinary Java strings and sends them to a supplied Writer.
Interface that accepts a string in the form of a sequence of CharSequences, which are conceptually concatenated (though in some implementations, the final string may never be materialized in memory).
This abstract class represents a couple of different implementations of strings containing whitespace only.
A ZenoString is an implementation of UnicodeString that comprises a list of segments representing substrings of the total string.
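
Several of the descriptions above refer to storing one, two, or three bytes per codepoint, depending on the largest codepoint present. The following is a minimal sketch of that idea using only the JDK. It is not Saxon's implementation: the class WidthSelectionSketch, its method names, and the big-endian byte layout are all assumptions made for illustration, and the source does not state how the package actually lays out its byte arrays.

```java
// Illustrative sketch only: hypothetical names, assumed big-endian layout.
class WidthSelectionSketch {

    /** Returns 1, 2 or 3: the bytes needed for the largest codepoint in s. */
    static int bytesPerCodepoint(String s) {
        int max = s.codePoints().max().orElse(0); // empty string: treat as 0
        if (max <= 0xFF) {
            return 1;   // Latin-1 range, 0-255
        }
        if (max <= 0xFFFF) {
            return 2;   // basic multilingual plane, 0-65535
        }
        return 3;       // supplementary codepoints fit in 24 bits
    }

    /** Packs the codepoints of s into a byte array at a fixed width. */
    static byte[] pack(String s) {
        int width = bytesPerCodepoint(s);
        int[] cps = s.codePoints().toArray();
        byte[] out = new byte[cps.length * width];
        for (int i = 0; i < cps.length; i++) {
            for (int b = 0; b < width; b++) {
                int shift = 8 * (width - 1 - b);       // most significant byte first
                out[i * width + b] = (byte) (cps[i] >>> shift);
            }
        }
        return out;
    }

    /** Reads the codepoint at the given index directly, without scanning. */
    static int codepointAt(byte[] data, int width, int index) {
        int cp = 0;
        for (int b = 0; b < width; b++) {
            cp = (cp << 8) | (data[index * width + b] & 0xFF);
        }
        return cp;
    }
}
```

Because every codepoint occupies the same number of bytes, the codepoint at position i starts at byte offset i * width, which is what makes direct addressing possible. The cost is that a single supplementary character forces three bytes for every codepoint in the same array; dividing a long string into segments, as one of the descriptions above mentions, limits that cost to the segment concerned.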