public abstract class UnicodeString extends java.lang.Object implements AtomicMatchKey, java.lang.Comparable<UnicodeString>
The interface is future-proofed to support code points in the range 0 to 2^31, and string lengths of up to 2^63 characters. Implementations may (and do) impose lower limits.
Constructor and Description |
---|
UnicodeString() |
Modifier and Type | Method and Description |
---|---|
AtomicValue | asAtomic(): Get an atomic value that encapsulates this match key. |
protected void | checkSubstringBounds(long start, long end) |
abstract int | codePointAt(long index): Get the code point at a given position in the string. |
abstract IntIterator | codePoints(): Get an iterator over the code points present in the string. |
int | compareTo(UnicodeString other): Compare this string to another using codepoint comparison. |
UnicodeString | concat(UnicodeString other): Concatenate with another string, returning a new string. |
UnicodeString | economize() |
boolean | equals(java.lang.Object obj) |
long | estimatedLength(): Get the estimated length of the string, suitable for space allocation. |
abstract int | getWidth(): Get the number of bits needed to hold all the characters in this string. |
int | hashCode(): Compute a hashCode. |
boolean | hasSubstring(UnicodeString other, long offset): Ask whether this string has another string as its content starting at a given offset. |
long | indexOf(int codePoint): Get the position of the first occurrence of the specified codepoint, starting the search at the beginning. |
abstract long | indexOf(int codePoint, long from): Get the position of the first occurrence of the specified codepoint, starting the search at a given position in the string. |
long | indexOf(UnicodeString other, long from): Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints. |
long | indexWhere(java.util.function.IntPredicate predicate, long from): Get the position of the first occurrence of a codepoint that matches a supplied predicate, starting the search at a given position in the string. |
boolean | isEmpty(): Ask whether the string is empty. |
abstract long | length(): Get the length of the string. |
int | length32(): Get the length of the string, provided it is less than 2^31 characters. |
UnicodeString | prefix(long end): Get a substring of this string, starting at position 0, with a given end position. |
static int | requireInt(long value): Utility method for use where strings longer than 2^31 characters cannot yet be handled. |
UnicodeString | substring(long start): Get a substring of this codepoint sequence, with a given start position, finishing at the end of the string. |
abstract UnicodeString | substring(long start, long end): Get a substring of this string, with a given start and end position. |
UnicodeString | tidy(): Ensure that the implementation is capable of counting codepoints in the string. |
void | verifyCharacters(): Diagnostic method: verify that all the characters in the string are valid XML codepoints. |
public UnicodeString tidy()
Returns:
this UnicodeString, or another that represents the same sequence of characters.
public UnicodeString economize()
public abstract long length()
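Lengths and positions in this API are counted in codepoints. The sketch below illustrates the difference from the UTF-16 length of a java.lang.String; it uses only the JDK, and the class and method names are hypothetical (they are not part of the Saxon API).

```java
// Illustrative only: counts codepoints in a java.lang.String.
// CodePointLengthDemo and codePointLength are hypothetical names.
final class CodePointLengthDemo {

    /** Number of Unicode codepoints in s (not UTF-16 code units). */
    static long codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (stored as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";
        System.out.println(s.length());         // prints 4 (UTF-16 code units)
        System.out.println(codePointLength(s)); // prints 3 (codepoints)
    }
}
```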
public int length32()
Returns:
the length of the string, as an int
Throws:
java.lang.UnsupportedOperationException - if the string is longer than 2^31 characters
public long estimatedLength()
Returns:
for a UnicodeString, the actual length of the string in codepoints
public boolean isEmpty()
public abstract int getWidth()
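One way to read "the number of bits needed to hold all the characters" is as a classification by the largest codepoint present. The sketch below is a guess at that idea, not the library's implementation: the width classes 7, 8, 16 and 24 are an assumption, and WidthSketch and widthOf are hypothetical names.

```java
// Illustrative only: classifies a codepoint sequence by the largest codepoint it holds.
// The width classes (7, 8, 16, 24) are an assumption, not taken from this page.
final class WidthSketch {

    static int widthOf(int[] codePoints) {
        int max = 0;
        for (int cp : codePoints) {
            max = Math.max(max, cp);
        }
        if (max < 0x80) return 7;      // fits in ASCII
        if (max < 0x100) return 8;     // fits in Latin-1
        if (max < 0x10000) return 16;  // fits in the Basic Multilingual Plane
        return 24;                     // needs the supplementary planes
    }

    public static void main(String[] args) {
        System.out.println(widthOf("abc".codePoints().toArray()));
        System.out.println(widthOf("\u00E9".codePoints().toArray()));
        System.out.println(widthOf("\u20AC".codePoints().toArray()));
        System.out.println(widthOf("\uD83D\uDE00".codePoints().toArray()));
    }
}
```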
public long indexOf(int codePoint)
Parameters:
codePoint - the sought codePoint
Throws:
java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
public abstract long indexOf(int codePoint, long from)
Parameters:
codePoint - the sought codePoint
from - the position from which the search should start (0-based)
Throws:
java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
public long indexWhere(java.util.function.IntPredicate predicate, long from)
Parameters:
predicate - condition that the codepoint must satisfy
from - the position from which the search should start (0-based)
Throws:
java.lang.UnsupportedOperationException - if the UnicodeString has not been prepared for codePoint access
public long indexOf(UnicodeString other, long from)
Get the first position, at or beyond from, where another string appears as a substring of this string, comparing codepoints.
Parameters:
other - the other (sought) string
from - the position (0-based) where searching is to start (counting in codepoints)
public boolean hasSubstring(UnicodeString other, long offset)
Parameters:
other - the other string
offset - the starting position in this string (counting in codepoints)
public abstract IntIterator codePoints()
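The relationship between hasSubstring and the string-valued indexOf can be sketched over plain int[] codepoint arrays. This is a minimal illustration, not the library's code: the class name is hypothetical, int offsets stand in for the long offsets of the real signatures, and returning -1 when nothing is found is an assumption (this page does not state the not-found result).

```java
// Illustrative only: codepoint-based substring test and search over int[] arrays.
// CodePointSearch is a hypothetical name; the -1 "not found" result is an assumption.
final class CodePointSearch {

    /** True if 'other' occurs in 'text' starting exactly at 'offset' (counted in codepoints). */
    static boolean hasSubstring(int[] text, int[] other, int offset) {
        if (offset < 0 || (long) offset + other.length > text.length) {
            return false;
        }
        for (int i = 0; i < other.length; i++) {
            if (text[offset + i] != other[i]) {
                return false;
            }
        }
        return true;
    }

    /** First position at or beyond 'from' where 'other' occurs, or -1 if there is none. */
    static int indexOf(int[] text, int[] other, int from) {
        for (int pos = Math.max(from, 0); (long) pos + other.length <= text.length; pos++) {
            if (hasSubstring(text, other, pos)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] text = "x\uD83D\uDE00yz".codePoints().toArray(); // x, U+1F600, y, z
        int[] sought = "yz".codePoints().toArray();
        // Position 2 in codepoints; the same match is at char index 3 in the UTF-16 String.
        System.out.println(indexOf(text, sought, 0));
    }
}
```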
public abstract int codePointAt(long index)
Parameters:
index - the given position (0-based)
Throws:
java.lang.IndexOutOfBoundsException - if the index is out of range
public UnicodeString substring(long start)
Parameters:
start - the start position (0-based): that is, the position of the first code point to be included
Throws:
java.lang.IndexOutOfBoundsException - if the start position is out of range
public abstract UnicodeString substring(long start, long end)
Parameters:
start - the start position (0-based): that is, the position of the first code point to be included
end - the end position (0-based): specifically, the position of the first code point not to be included
Throws:
java.lang.IndexOutOfBoundsException - if the start/end positions are out of range (the conditions are the same as for String.substring())
public UnicodeString prefix(long end)
Parameters:
end - the end position (0-based): specifically, the position of the first code point not to be included
Throws:
java.lang.IndexOutOfBoundsException - if the end position is out of range
public UnicodeString concat(UnicodeString other)
Parameters:
other - the string to be appended
protected void checkSubstringBounds(long start, long end)
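The substring methods state that their range conditions are the same as for String.substring(). A bounds check of that kind, together with codepoint-addressed extraction from a java.lang.String, might look like the sketch below. The names are hypothetical, and the extra length parameter is an assumption made so the example is self-contained (the real checkSubstringBounds takes only start and end).

```java
// Illustrative only: String.substring-style bounds checking, applied to codepoint positions.
// BoundsSketch and its methods are hypothetical; the 'length' parameter is an assumption.
final class BoundsSketch {

    /** Rejects the same ranges as String.substring: start < 0, end > length, or start > end. */
    static void checkSubstringBounds(long start, long end, long length) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(
                    "start=" + start + ", end=" + end + ", length=" + length);
        }
    }

    /** Substring addressed by codepoint positions rather than UTF-16 char positions. */
    static String substringByCodePoints(String s, int start, int end) {
        checkSubstringBounds(start, end, s.codePointCount(0, s.length()));
        int from = s.offsetByCodePoints(0, start);       // char index of the first codepoint kept
        int to = s.offsetByCodePoints(from, end - start); // char index just past the last one kept
        return s.substring(from, to);
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00bc"; // a, U+1F600, b, c
        // Codepoints 1 and 2: the emoji followed by 'b'.
        System.out.println(substringByCodePoints(s, 1, 3).equals("\uD83D\uDE00b")); // prints true
    }
}
```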
public void verifyCharacters()
Throws:
java.lang.IllegalStateException - if the contents are invalid
public boolean equals(java.lang.Object obj)
Overrides:
equals in class java.lang.Object
public int hashCode()
All implementations of UnicodeString use compatible hash codes, and the hashing algorithm is therefore identical to that for java.lang.String. This means that for strings containing Astral characters, the hash code needs to be computed by decomposing an Astral character into a surrogate pair.
Overrides:
hashCode in class java.lang.Object
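To make the surrogate-pair decomposition concrete, here is a minimal sketch that hashes a sequence of codepoints so that the result agrees with java.lang.String.hashCode(). It uses only JDK calls; CodePointHash and hash are hypothetical names, and this is one way of achieving the stated compatibility rather than the library's actual code.

```java
// Illustrative only: hashes codepoints so the result matches java.lang.String.hashCode().
// CodePointHash and hash are hypothetical names.
final class CodePointHash {

    static int hash(int[] codePoints) {
        int h = 0;
        for (int cp : codePoints) {
            if (Character.isSupplementaryCodePoint(cp)) {
                // Astral character: feed the two UTF-16 surrogates, as String.hashCode() would see them.
                h = 31 * h + Character.highSurrogate(cp);
                h = 31 * h + Character.lowSurrogate(cp);
            } else {
                h = 31 * h + cp;
            }
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // contains U+1F600, an astral character
        System.out.println(hash(s.codePoints().toArray()) == s.hashCode()); // prints true
    }
}
```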
public int compareTo(UnicodeString other)
Specified by:
compareTo in interface java.lang.Comparable<UnicodeString>
Parameters:
other - the other string
public AtomicValue asAtomic()
Specified by:
asAtomic in interface AtomicMatchKey
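Regarding compareTo: codepoint comparison can order strings differently from java.lang.String.compareTo, which compares UTF-16 code units. The sketch below shows the difference using JDK strings only; CodePointCompare and compareCodePoints are hypothetical names and are not the library's implementation.

```java
// Illustrative only: compares two Strings codepoint by codepoint.
// CodePointCompare and compareCodePoints are hypothetical names.
final class CodePointCompare {

    static int compareCodePoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // One string is a prefix of the other (or they are equal): the shorter sorts first.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    public static void main(String[] args) {
        String bmp = "\uFFFF";          // U+FFFF, the last BMP codepoint
        String astral = "\uD800\uDC00"; // U+10000, the first astral codepoint
        System.out.println(compareCodePoints(bmp, astral) < 0); // prints true: U+FFFF < U+10000
        System.out.println(bmp.compareTo(astral) > 0);          // prints true: 0xFFFF > 0xD800 in UTF-16
    }
}
```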
public static int requireInt(long value)
Parameters:
value - the actual value of a character position within a string, or the length of a string
Throws:
java.lang.UnsupportedOperationException - if the supplied value exceeds Integer.MAX_VALUE
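A narrowing helper with the contract described for requireInt could be sketched as follows. This is an assumption-based illustration, not the library's source: RequireIntSketch is a hypothetical class, and the length32 helper shown here only suggests how such a method might delegate to it.

```java
// Illustrative only: narrows a long position or length to int, failing for values above Integer.MAX_VALUE.
// RequireIntSketch is a hypothetical class; the delegation in length32 is an assumption.
final class RequireIntSketch {

    static int requireInt(long value) {
        if (value > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException(
                    "Value exceeds Integer.MAX_VALUE: " + value);
        }
        return (int) value;
    }

    /** A 32-bit length accessor might simply delegate to requireInt. */
    static int length32(long length) {
        return requireInt(length);
    }

    public static void main(String[] args) {
        System.out.println(requireInt(42L)); // prints 42
        try {
            requireInt(1L << 31); // one more than Integer.MAX_VALUE
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected");  // prints rejected
        }
    }
}
```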
Copyright (c) 2004-2022 Saxonica Limited. All rights reserved.