Package net.sf.saxon.str


This package contains classes used to handle Unicode strings: notably implementations of the UnicodeString interface, which represents a string as a sequence of directly addressable Unicode codepoints (without relying on surrogate pairs).
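To illustrate why codepoint addressing matters, the following minimal sketch uses only standard JDK methods on java.lang.String; it does not use any Saxon API. The class name CodepointAddressingSketch and the helper codePointAtIndex are hypothetical names introduced for this example. It shows that a Java string indexes 16-bit chars, so a character outside the basic multilingual plane occupies two positions (a surrogate pair), whereas a codepoint-addressed view counts it once.

```java
final class CodepointAddressingSketch {

    // Hypothetical helper: returns the codepoint at a codepoint index
    // (not a char index) of an ordinary Java string.
    static int codePointAtIndex(String s, int codepointIndex) {
        // Translate the codepoint index into a char index, skipping
        // over any surrogate pairs that precede it.
        int charIndex = s.offsetByCodePoints(0, codepointIndex);
        return s.codePointAt(charIndex);
    }

    // Hypothetical helper: length of the string measured in codepoints.
    static int lengthInCodepoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());                                 // 4 chars
        System.out.println(lengthInCodepoints(s));                      // 3 codepoints
        System.out.println(Integer.toHexString(codePointAtIndex(s, 1))); // 1f600
    }
}
```

Note that offsetByCodePoints has to scan the string, so lookup by codepoint index on a plain Java string is not constant-time; a representation that stores codepoints at a fixed width avoids that scan.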

  • Class
    Description
    This abstract implementation of UniStringConsumer exists largely for C#, as a place to capture the default methods defined in the interface and to avoid them proliferating into multiple subclasses.
    An implementation of UnicodeString that wraps a Java string which is known to contain no surrogates.
    Iterator over a string to produce a sequence of single-character strings
    This class provides a compressed representation of a sequence of whitespace characters.
    A zero-length Unicode string
    This class provides a compressed representation of a string used to represent indentation: specifically, an integer number of newlines followed by an integer number of spaces.
    The segments (other than the last) have a fixed size of 65536 codepoints, which may use one byte per codepoint, two bytes per codepoint, or three bytes per codepoint, depending on the largest codepoint present in the segment.
    A Unicode string consisting entirely of 16-bit BMP characters, implemented as a range of an underlying byte array
    A Unicode string consisting of 24-bit characters, implemented as a range of an underlying byte array holding three bytes per codepoint
    A Unicode string consisting entirely of 8-bit characters, implemented as a range of an underlying byte array
    Contains constants representing some frequently used strings, either as a UnicodeString or in some cases as a byte array.
     
    An implementation of the UnicodeString interface that wraps an ordinary Java string.
    Class to perform lowercase conversion.
    Class to perform uppercase conversion.
    Twine16 is a Unicode string consisting entirely of codepoints in the range 0-65535 (that is, the basic multilingual plane), excluding surrogates.
    Twine24 is a Unicode string that accommodates any codepoint value up to 24 bits.
    Twine8 is a Unicode string whose codepoints are all in the range 0-255 (that is, Latin-1).
    Interface that accepts a sequence of Unicode codepoints.
    Builder class to construct a UnicodeString by appending text incrementally
    A UnicodeString containing a single codepoint
    A UnicodeString is a sequence of Unicode codepoints that supports codepoint addressing.
    Interface that accepts strings in the form of UnicodeString objects, which are written to some destination.
    Implementation of UnicodeWriter that converts Unicode strings to ordinary Java strings and sends them to a supplied Writer
    Interface that accepts a string in the form of a sequence of CharSequences, which are conceptually concatenated (though in some implementations, the final string may never be materialized in memory)
    This abstract class represents a couple of different implementations of strings containing whitespace only.
    A ZenoString is an implementation of UnicodeString that comprises a list of segments representing substrings of the total string.
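Several of the descriptions above refer to storing codepoints at one, two, or three bytes each, depending on the largest codepoint present. The sketch below shows one way such a fixed-width layout could work. It is an illustration under stated assumptions, not Saxon's implementation: the class FixedWidthSketch and its methods are hypothetical names, the big-endian byte order is an assumption, and only standard JDK calls are used.

```java
final class FixedWidthSketch {

    // Chooses the narrowest width (in bytes) that can hold every
    // codepoint in the string: 1 for 0-255, 2 for 0-65535, else 3.
    static int bytesPerCodepoint(String s) {
        int max = s.codePoints().max().orElse(0);
        if (max <= 0xFF) {
            return 1;
        }
        if (max <= 0xFFFF) {
            return 2;
        }
        return 3;
    }

    // Packs the codepoints of the string into a byte array at the chosen
    // fixed width. Byte order (most significant byte first) is an
    // assumption made for this sketch.
    static byte[] pack(String s) {
        int width = bytesPerCodepoint(s);
        int[] cps = s.codePoints().toArray();
        byte[] out = new byte[cps.length * width];
        for (int i = 0; i < cps.length; i++) {
            for (int b = 0; b < width; b++) {
                int shift = 8 * (width - 1 - b);
                out[i * width + b] = (byte) (cps[i] >>> shift);
            }
        }
        return out;
    }

    // Reads the codepoint at a codepoint index directly, without scanning:
    // the byte offset is simply index * width.
    static int codePointAt(byte[] data, int width, int index) {
        int cp = 0;
        for (int b = 0; b < width; b++) {
            cp = (cp << 8) | (data[index * width + b] & 0xFF);
        }
        return cp;
    }
}
```

For example, packing "abc" under these assumptions needs 3 bytes, packing a string that contains U+20AC needs two bytes per codepoint, and a string that contains U+1F600 needs three. In each case the codepoint at position i starts at byte offset i times the width, which is what makes direct addressing possible.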