java.lang.Object
- net.sf.saxon.str.UnicodeString

All Implemented Interfaces:

java.lang.Comparable<UnicodeString>, AtomicMatchKey

Direct Known Subclasses:

BMPString, EmptyUnicodeString, Slice16, Slice24, Slice8, StringView, Twine16, Twine24, Twine8, UnicodeChar, WhitespaceString, ZenoString
```
public abstract class UnicodeString
extends java.lang.Object
implements AtomicMatchKey, java.lang.Comparable<UnicodeString>
```
A UnicodeString is a sequence of Unicode codepoints that supports codepoint addressing.
The interface is future-proofed to support code points in the range 0 to 2^31, and string lengths of up to 2^63 characters. Implementations may (and do) impose lower limits.
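
To make codepoint addressing concrete, the sketch below uses only `java.lang.String` (not Saxon's own classes) to show how a 0-based codepoint position differs from a UTF-16 offset once a string contains a character outside the BMP. The class and method names are hypothetical, chosen purely for illustration.

```java
// Illustrative only: contrasts UTF-16 offsets with codepoint positions
// using java.lang.String. This is not the UnicodeString implementation.
final class CodepointAddressingDemo {

    /** Returns the code point at a 0-based codepoint position (not a UTF-16 offset). */
    static int codePointAtPosition(String s, int position) {
        // Translate the codepoint position into a UTF-16 offset first.
        int utf16Offset = s.offsetByCodePoints(0, position);
        return s.codePointAt(utf16Offset);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'
        System.out.println(s.length());                        // UTF-16 chars: 4
        System.out.println(s.codePointCount(0, s.length()));   // code points: 3
        System.out.println(Integer.toHexString(codePointAtPosition(s, 1))); // 1f600
    }
}
```

With codepoint addressing, position 2 refers to `b`, whereas UTF-16 offset 2 falls in the middle of the surrogate pair.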

Constructor Summary

Constructor	Description
`UnicodeString()`

Method Summary

Modifier and Type	Method	Description
`AtomicValue`	`asAtomic()`	Get an atomic value that encapsulates this match key.
`protected void`	`checkSubstringBounds(long start, long end)`
`abstract int`	`codePointAt(long index)`	Get the code point at a given position in the string
`abstract IntIterator`	`codePoints()`	Get an iterator over the code points present in the string.
`int`	`compareTo(UnicodeString other)`	Compare this string to another using codepoint comparison
`UnicodeString`	`concat(UnicodeString other)`	Concatenate with another string, returning a new string
`UnicodeString`	`economize()`
`boolean`	`equals(java.lang.Object obj)`
`long`	`estimatedLength()`	Get the estimated length of the string, suitable for space allocation.
`abstract int`	`getWidth()`	Get the number of bits needed to hold all the characters in this string
`int`	`hashCode()`	Compute a hashCode.
`boolean`	`hasSubstring(UnicodeString other, long offset)`	Ask whether this string has another string as its content starting at a given offset
`long`	`indexOf(int codePoint)`	Get the position of the first occurrence of the specified codepoint, starting the search at the beginning
`abstract long`	`indexOf(int codePoint, long from)`	Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string
`long`	`indexOf(UnicodeString other, long from)`	Get the first position, at or beyond `from`, where another string appears as a substring of this string, comparing codepoints.
`long`	`indexWhere(java.util.function.IntPredicate predicate, long from)`	Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string
`boolean`	`isEmpty()`	Ask whether the string is empty
`abstract long`	`length()`	Get the length of the string
`int`	`length32()`	Get the length of the string, provided it is less than 2^31 characters
`UnicodeString`	`prefix(long end)`	Get a substring of this string, starting at position 0, with a given end position
`static int`	`requireInt(long value)`	Utility method for use where strings longer than 2^31 characters cannot yet be handled.
`UnicodeString`	`substring(long start)`	Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string
`abstract UnicodeString`	`substring(long start, long end)`	Get a substring of this string, with a given start and end position
`UnicodeString`	`tidy()`	Ensure that the implementation is capable of counting codepoints in the string.
`void`	`verifyCharacters()`	Diagnostic method: verify that all the characters in the string are valid XML codepoints

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - UnicodeString
```
public UnicodeString()
```
- Method Detail
  - tidy
```
public UnicodeString tidy()
```
    Ensure that the implementation is capable of counting codepoints in the string. This is normally a null operation, but it may cause internal reorganisation.
    
    Returns:
    
    this UnicodeString, or another that represents the same sequence of characters.
  - economize
```
public UnicodeString economize()
```
  - length
```
public abstract long length()
```
    Get the length of the string
    
    Returns:
    
    the number of code points in the string
  - length32
```
public int length32()
```
    Get the length of the string, provided it is less than 2^31 characters
    
    Returns:
    
    the length of the string if it fits within a Java int
    
    Throws:
    
    java.lang.UnsupportedOperationException - if the length of the string does not fit within a Java int, that is, if it exceeds Integer.MAX_VALUE (2^31 - 1)
  - estimatedLength
```
public long estimatedLength()
```
    Get the estimated length of the string, suitable for space allocation.
    
    Returns:
    
    for a UnicodeString, the actual length of the string in codepoints
  - isEmpty
```
public boolean isEmpty()
```
    Ask whether the string is empty
    
    Returns:
    
    true if the length of the string is zero
  - getWidth
```
public abstract int getWidth()
```
    Get the number of bits needed to hold all the characters in this string
    
    Returns:
    
    7 for ASCII characters (this value may not be used in practice), 8 for Latin-1, 16 for BMP characters, 24 for general Unicode.
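    
    As a rough illustration of how the 8/16/24 values relate to the characters present, the sketch below classifies an array of code points by its largest member. The class `WidthSketch` and method `widthOf` are hypothetical; the sketch ignores the 7-bit case and assumes an empty sequence reports 8, neither of which is confirmed by the documentation above.

```java
// Hypothetical helper, not part of Saxon: derives a bit width from the
// largest code point in a sequence.
final class WidthSketch {

    static int widthOf(int[] codePoints) {
        int max = 0;
        for (int cp : codePoints) {
            max = Math.max(max, cp);
        }
        if (max <= 0xFF) {
            return 8;   // every character fits in Latin-1 (assumed for empty input too)
        }
        if (max <= 0xFFFF) {
            return 16;  // every character is in the Basic Multilingual Plane
        }
        return 24;      // at least one character lies beyond the BMP
    }

    public static void main(String[] args) {
        System.out.println(widthOf("caf\u00E9".codePoints().toArray()));      // 8
        System.out.println(widthOf("\u4E2D\u6587".codePoints().toArray()));   // 16
        System.out.println(widthOf("\uD83D\uDE00".codePoints().toArray()));   // 24
    }
}
```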
  - indexOf
```
public long indexOf(int codePoint)
```
    Get the position of the first occurrence of the specified codepoint, starting the search at the beginning
    
    Parameters:
    
    codePoint - the sought codePoint
    
    Returns:
    
    the position (0-based) of the first occurrence found, or -1 if not found. Positions count code points, not UTF-16 chars.
    
    Throws:
    
    java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
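    
    The distinction between codepoint positions and UTF-16 positions matters as soon as a surrogate pair precedes the match. The following sketch shows the difference using `java.lang.String` only; `CodepointIndexOf` and `indexOfCodePoint` are hypothetical names, and this is not how UnicodeString itself is implemented.

```java
// Illustrative only: converts the UTF-16 index returned by String.indexOf
// into a codepoint position.
final class CodepointIndexOf {

    static long indexOfCodePoint(String s, int codePoint) {
        int utf16Index = s.indexOf(codePoint); // counts UTF-16 chars
        if (utf16Index < 0) {
            return -1;
        }
        // Count the code points that precede the match.
        return s.codePointCount(0, utf16Index);
    }

    public static void main(String[] args) {
        String s = "\uD83D\uDE00x"; // U+1F600 followed by 'x'
        System.out.println(s.indexOf('x'));            // 2 (UTF-16 chars)
        System.out.println(indexOfCodePoint(s, 'x'));  // 1 (code points)
    }
}
```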
  - indexOf
```
public abstract long indexOf(int codePoint,
                             long from)
```
    Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string
    
    Parameters:
    
    codePoint - the sought codePoint
    
    from - the position from which the search should start (0-based)
    
    Returns:
    
    the position (0-based) of the first occurrence found, or -1 if not found
    
    Throws:
    
    java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
  - indexWhere
```
public long indexWhere(java.util.function.IntPredicate predicate,
                       long from)
```
    Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string
    
    Parameters:
    
    predicate - condition that the codepoint must satisfy
    
    from - the position from which the search should start (0-based)
    
    Returns:
    
    the position (0-based) of the first codepoint to match the predicate, or -1 if not found
    
    Throws:
    
    java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
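    
    A predicate-based search of this kind can be sketched over a plain `int[]` of code points. The class `IndexWhereSketch` is hypothetical, and treating a negative `from` as 0 is an assumption of the sketch rather than documented behaviour.

```java
import java.util.function.IntPredicate;

// Hypothetical stand-in that searches an int[] of code points; the real
// method operates on the UnicodeString's own storage.
final class IndexWhereSketch {

    static long indexWhere(int[] codePoints, IntPredicate predicate, long from) {
        // Assumption: a negative start position is clamped to 0.
        for (long i = Math.max(from, 0); i < codePoints.length; i++) {
            if (predicate.test(codePoints[(int) i])) {
                return i;
            }
        }
        return -1; // no code point at or after 'from' satisfies the predicate
    }

    public static void main(String[] args) {
        int[] cps = "ab cd".codePoints().toArray();
        System.out.println(indexWhere(cps, Character::isWhitespace, 0)); // 2
        System.out.println(indexWhere(cps, Character::isWhitespace, 3)); // -1
    }
}
```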
  - indexOf
```
public long indexOf(UnicodeString other,
                    long from)
```
    Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
    
    Parameters:
    
    other - the other (sought) string
    
    from - the position (0-based) where searching is to start (counting in codepoints)
    
    Returns:
    
    the first position where the substring is found, or -1 if it is not found
  - hasSubstring
```
public boolean hasSubstring(UnicodeString other,
                            long offset)
```
    Ask whether this string has another string as its content starting at a given offset
    
    Parameters:
    
    other - the other string
    
    offset - the starting position in this string (counting in codepoints)
    
    Returns:
    
    true if the other string appears as a substring of this string starting at the given position.
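    
    The relationship between hasSubstring and the string-valued indexOf can be sketched as a naive search over `int[]` arrays of code points: indexOf tries each candidate position and asks hasSubstring whether the other string starts there. The class name is hypothetical, the algorithm is a simple quadratic scan chosen for clarity, and returning false for out-of-range offsets is an assumption. None of this is claimed to match Saxon's implementation.

```java
// Hypothetical sketch: naive substring matching over code point arrays.
final class SubstringSearchSketch {

    static boolean hasSubstring(int[] text, int[] other, long offset) {
        // Assumption: an offset that leaves too little room simply yields false.
        if (offset < 0 || offset + other.length > text.length) {
            return false;
        }
        for (int i = 0; i < other.length; i++) {
            if (text[(int) offset + i] != other[i]) {
                return false;
            }
        }
        return true;
    }

    static long indexOf(int[] text, int[] other, long from) {
        // Try every start position at or beyond 'from' that leaves enough room.
        for (long pos = Math.max(from, 0); pos + other.length <= text.length; pos++) {
            if (hasSubstring(text, other, pos)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] text = "abcabc".codePoints().toArray();
        int[] bc = "bc".codePoints().toArray();
        System.out.println(indexOf(text, bc, 0)); // 1
        System.out.println(indexOf(text, bc, 2)); // 4
    }
}
```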
  - codePoints
```
public abstract IntIterator codePoints()
```
    Get an iterator over the code points present in the string.
    
    Returns:
    
    an iterator that delivers the individual code points
  - codePointAt
```
public abstract int codePointAt(long index)
```
    Get the code point at a given position in the string
    
    Parameters:
    
    index - the given position (0-based)
    
    Returns:
    
    the code point at the given position
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the index is out of range
  - substring
```
public UnicodeString substring(long start)
```
    Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string
    
    Parameters:
    
    start - the start position (0-based): that is, the position of the first code point to be included
    
    Returns:
    
    the requested substring
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the start position is out of range
  - substring
```
public abstract UnicodeString substring(long start,
                                        long end)
```
    Get a substring of this string, with a given start and end position
    
    Parameters:
    
    start - the start position (0-based): that is, the position of the first code point to be included
    
    end - the end position (0-based): specifically, the position of the first code point not to be included
    
    Returns:
    
    the requested substring
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the start/end positions are out of range (the conditions are the same as for String.substring())
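    
    For reference, `String.substring(begin, end)` rejects a negative start, an end beyond the length, and a start greater than the end. The sketch below applies the same three conditions to an `int[]` of code points. `SubstringBoundsSketch` and its methods are hypothetical and are not claimed to mirror `checkSubstringBounds`.

```java
import java.util.Arrays;

// Hypothetical sketch: half-open [start, end) substring with the same range
// conditions as java.lang.String.substring.
final class SubstringBoundsSketch {

    static void checkBounds(long start, long end, long length) {
        if (start < 0) {
            throw new IndexOutOfBoundsException("start < 0: " + start);
        }
        if (end > length) {
            throw new IndexOutOfBoundsException("end > length: " + end);
        }
        if (start > end) {
            throw new IndexOutOfBoundsException("start > end: " + start + " > " + end);
        }
    }

    static int[] substring(int[] codePoints, long start, long end) {
        checkBounds(start, end, codePoints.length);
        // 'end' is the position of the first code point NOT included.
        return Arrays.copyOfRange(codePoints, (int) start, (int) end);
    }

    public static void main(String[] args) {
        int[] cps = "hello".codePoints().toArray();
        int[] sub = substring(cps, 1, 3);
        System.out.println(new String(sub, 0, sub.length)); // el
    }
}
```

    Note that a start position equal to the length is accepted (yielding an empty result) as long as the end position is equal to it.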
  - prefix
```
public UnicodeString prefix(long end)
```
    Get a substring of this string, starting at position 0, with a given end position
    
    Parameters:
    
    end - the end position (0-based): specifically, the position of the first code point not to be included
    
    Returns:
    
    the requested substring
    
    Throws:
    
    java.lang.IndexOutOfBoundsException - if the end position is out of range
  - concat
```
public UnicodeString concat(UnicodeString other)
```
    Concatenate with another string, returning a new string
    
    Parameters:
    
    other - the string to be appended
    
    Returns:
    
    the result of concatenating this string followed by the other
  - checkSubstringBounds
```
protected void checkSubstringBounds(long start,
                                    long end)
```
  - verifyCharacters
```
public void verifyCharacters()
```
    Diagnostic method: verify that all the characters in the string are valid XML codepoints
    
    Throws:
    
    java.lang.IllegalStateException - if the contents are invalid
  - equals
```
public boolean equals(java.lang.Object obj)
```
    Overrides:
    
    equals in class java.lang.Object
  - hashCode
```
public int hashCode()
```
    Compute a hashCode. All implementations of UnicodeString use compatible hash codes and the hashing algorithm is therefore identical to that for java.lang.String. This means that for strings containing Astral characters, the hash code needs to be computed by decomposing an Astral character into a surrogate pair.
    
    Overrides:
    
    hashCode in class java.lang.Object
    
    Returns:
    
    the hash code
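    
    The surrogate-pair decomposition described above can be sketched as follows. The loop applies the `java.lang.String` hashing recurrence (`h = 31 * h + c`) to each UTF-16 code unit, splitting any code point outside the BMP into its high and low surrogates first. `CompatibleHashSketch` and `hashCodePoints` are hypothetical names; this is an illustration of the idea, not Saxon's code.

```java
// Hypothetical sketch: a hash over code points that agrees with
// java.lang.String.hashCode() for the same character sequence.
final class CompatibleHashSketch {

    static int hashCodePoints(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (Character.isBmpCodePoint(cp)) {
                h = 31 * h + cp;
            } else {
                // Astral character: hash its two UTF-16 code units in order.
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            }
        }
        return h;
    }

    public static void main(String[] args) {
        int[] cps = {'a', 0x1F600, 'b'};
        String s = new String(cps, 0, cps.length);
        System.out.println(hashCodePoints(cps) == s.hashCode()); // true
    }
}
```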
  - compareTo
```
public int compareTo(UnicodeString other)
```
    Compare this string to another using codepoint comparison
    
    Specified by:
    
    compareTo in interface java.lang.Comparable<UnicodeString>
    
    Parameters:
    
    other - the other string
    
    Returns:
    
    -1 if this string comes first, 0 if they are equal, +1 if the other string comes first
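    
    Codepoint comparison can give a different ordering from `String.compareTo`, which compares UTF-16 code units: a BMP character above the surrogate range (such as U+FFFD) sorts after any astral character in UTF-16 order, but before it in codepoint order. The sketch below demonstrates this over `int[]` arrays; the class and method names are hypothetical.

```java
// Hypothetical sketch: lexicographic comparison by code point, returning
// -1, 0 or +1.
final class CodepointCompareSketch {

    static int compareCodePoints(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        // One sequence is a prefix of the other: the shorter comes first.
        if (a.length == b.length) {
            return 0;
        }
        return a.length < b.length ? -1 : 1;
    }

    public static void main(String[] args) {
        int[] x = {0xFFFD};   // BMP character above the surrogate range
        int[] y = {0x10000};  // first astral character
        System.out.println(compareCodePoints(x, y));                    // -1
        System.out.println("\uFFFD".compareTo("\uD800\uDC00") > 0);     // true
    }
}
```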
  - asAtomic
```
public AtomicValue asAtomic()
```
    Get an atomic value that encapsulates this match key. Needed to support the collation-key() function.
    
    Specified by:
    
    asAtomic in interface AtomicMatchKey
    
    Returns:
    
    an atomic value that encapsulates this match key
  - requireInt
```
public static int requireInt(long value)
```
    Utility method for use where strings longer than 2^31 characters cannot yet be handled.
    
    Parameters:
    
    value - the actual value of a character position within a string, or the length of a string
    
    Returns:
    
    the value as an integer if it is within range
    
    Throws:
    
    java.lang.UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
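    
    A minimal sketch of such a narrowing check is shown below, assuming the value is a non-negative position or length as the parameter description suggests. `RequireIntSketch` is a hypothetical stand-in, and the exception message is invented for illustration.

```java
// Hypothetical stand-in for a long-to-int narrowing guard.
final class RequireIntSketch {

    static int requireInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                    "Value " + value + " exceeds Integer.MAX_VALUE");
        }
        // Assumption: callers pass non-negative positions or lengths,
        // so no lower-bound check is made here.
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(requireInt(42L)); // 42
        try {
            requireInt(1L << 31);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected");  // rejected
        }
    }
}
```

    A guard like this lets code that indexes Java arrays fail loudly, rather than silently truncating, when handed a position it cannot represent.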
