Class REMatcher
An REMatcher matches input strings against a compiled regular expression, held as an REProgram, which is constructed using the RECompiler.
Although the regular expression engine was originally based on Apache Jakarta, the run-time evaluator has been completely rewritten for Saxon.
The design essentially follows the interpreter pattern. The compiled
regular expression is represented as a tree of Operation objects,
each of which has an evaluation method Operation.iterateMatches(REMatcher, int).
This takes as input the current position in the input string, and returns an iterator
over the possible positions at which a match using this operation can end.
The REMatcher is stateful
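The interpreter-style design described above can be illustrated with a small, self-contained sketch. Everything in it (`Op`, `Literal`, `Choice`, `Sequence`, `matchesWhole`) is a hypothetical stand-in written for this illustration, not Saxon's Operation or REMatcher classes, and it works on plain `String` values and builds its results eagerly rather than lazily.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the interpreter pattern; NOT the Saxon classes.
final class InterpreterSketch {

    // Each node of the compiled tree reports every position at which a
    // match starting at 'pos' can end.
    interface Op {
        Iterator<Integer> iterateMatches(String input, int pos);
    }

    // Matches a fixed piece of text: at most one possible end position.
    static final class Literal implements Op {
        private final String text;
        Literal(String text) { this.text = text; }
        public Iterator<Integer> iterateMatches(String input, int pos) {
            List<Integer> ends = new ArrayList<>();
            if (input.startsWith(text, pos)) ends.add(pos + text.length());
            return ends.iterator();
        }
    }

    // Alternation: end positions of the left branch, then of the right branch.
    static final class Choice implements Op {
        private final Op left, right;
        Choice(Op left, Op right) { this.left = left; this.right = right; }
        public Iterator<Integer> iterateMatches(String input, int pos) {
            List<Integer> ends = new ArrayList<>();
            left.iterateMatches(input, pos).forEachRemaining(ends::add);
            right.iterateMatches(input, pos).forEachRemaining(ends::add);
            return ends.iterator();
        }
    }

    // Concatenation: for each place the first part can end, try the second part.
    static final class Sequence implements Op {
        private final Op first, second;
        Sequence(Op first, Op second) { this.first = first; this.second = second; }
        public Iterator<Integer> iterateMatches(String input, int pos) {
            List<Integer> ends = new ArrayList<>();
            Iterator<Integer> mid = first.iterateMatches(input, pos);
            while (mid.hasNext()) {
                second.iterateMatches(input, mid.next()).forEachRemaining(ends::add);
            }
            return ends.iterator();
        }
    }

    // True if some match starting at 0 ends exactly at the end of the input.
    static boolean matchesWhole(Op op, String input) {
        Iterator<Integer> it = op.iterateMatches(input, 0);
        while (it.hasNext()) {
            if (it.next() == input.length()) return true;
        }
        return false;
    }

    // Tree for the expression (a|ab)c
    static Op demo() {
        return new Sequence(new Choice(new Literal("a"), new Literal("ab")), new Literal("c"));
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole(demo(), "abc")); // true: needs the second branch
        System.out.println(matchesWhole(demo(), "ab"));  // false: no trailing "c"
    }
}
```

Returning all candidate end positions, rather than a single one, is what lets an enclosing operation fall back to a different alternative when the first choice leads to a dead end.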
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected final void | capture(int which, int start, int end, boolean inLookahead) | Sets a captured group. |
| protected void | clearCapturedGroupsBeyond(int pos) | Clears any captured groups whose start position is at or beyond some specified position. |
| | getParen(int which) | Gets the contents of a parenthesized subexpression after a successful match. |
| int | getParenCount() | Returns the number of parenthesized subexpressions available after a successful match. |
| final int | getParenEnd(int which) | Returns the end index of a given paren level. |
| final int | getParenStart(int which) | Returns the start index of a given paren level. |
| | getProgram() | Returns the current regular expression program in use by this matcher object. |
| boolean | isAnchoredMatch(UnicodeString search) | Tests whether the regex matches a string in its entirety, anchored at both ends. |
| boolean | match(search) | Matches the current regular expression program against a String. |
| boolean | match(UnicodeString search, int i) | Matches the current regular expression program against a string, starting at a given index. |
| protected boolean | matchAt(int i, boolean anchored) | Matches the current regular expression program against the current input string, starting at index i of the input string. |
| | replace(UnicodeString in, UnicodeString replacement) | Substitutes a string for this regular expression in another string. |
| UnicodeString | replaceWith(UnicodeString in, BiFunction<UnicodeString, UnicodeString[], UnicodeString> replacer) | Substitutes a string for this regular expression in another string. |
| void | resetState | |
| | split(s) | Splits a string into a list of strings on regular expression boundaries. |
-
Constructor Details
-
REMatcher
Construct a matcher for a pre-compiled regular expression from program (bytecode) data.
Parameters:
program - Compiled regular expression program
-
-
Method Details
-
getProgram
Returns the current regular expression program in use by this matcher object.
Returns:
Regular expression program
-
getParenCount
public int getParenCount()
Returns the number of parenthesized subexpressions available after a successful match.
Returns:
Number of available parenthesized subexpressions
-
getParen
Gets the contents of a parenthesized subexpression after a successful match.
Parameters:
which - Nesting level of subexpression
Returns:
String
-
getParenStart
public final int getParenStart(int which)
Returns the start index of a given paren level.
Parameters:
which - Nesting level of subexpression
Returns:
String index
-
getParenEnd
public final int getParenEnd(int which)
Returns the end index of a given paren level.
Parameters:
which - Nesting level of subexpression
Returns:
String index
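To illustrate what "contents", "start index" and "end index" of a paren level mean, here is a sketch using the JDK's `java.util.regex` classes as a stand-in. It does not call REMatcher; the helper `describe` is hypothetical, and whether REMatcher uses the same conventions as the JDK (group 0 for the whole match, exclusive end index) is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogue built on java.util.regex; NOT the Saxon REMatcher API.
final class ParenDemo {

    // Formats one group as text@start-end (the end index is exclusive in the JDK).
    static String describe(Matcher m, int which) {
        return m.group(which) + "@" + m.start(which) + "-" + m.end(which);
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher("range 10-25 ok");
        if (m.find()) {
            // Group 0 is the whole match; groups 1..n are the parenthesized parts.
            for (int i = 0; i <= m.groupCount(); i++) {
                System.out.println(i + ": " + describe(m, i));
            }
            // prints:
            // 0: 10-25@6-11
            // 1: 10@6-8
            // 2: 25@9-11
        }
    }
}
```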
-
capture
protected final void capture(int which, int start, int end, boolean inLookahead)
Sets a captured group.
Parameters:
which - Which paren level
start - start index in input string
end - end index in input string
inLookahead - true if the match is within a lookahead assertion
-
clearCapturedGroupsBeyond
protected void clearCapturedGroupsBeyond(int pos)
Clears any captured groups whose start position is at or beyond some specified position.
Parameters:
pos - the specified position
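The bookkeeping implied by capture and clearCapturedGroupsBeyond can be sketched as a pair of index arrays. The class `CaptureTable` and its methods are hypothetical and say nothing about how REMatcher stores its groups; the sketch also assumes, without confirmation from the source, that clearing is needed so that groups recorded on an abandoned path do not survive backtracking to an earlier position.

```java
import java.util.Arrays;

// Hypothetical model of capture bookkeeping; NOT Saxon's implementation.
final class CaptureTable {
    private final int[] start;
    private final int[] end;

    CaptureTable(int groups) {
        start = new int[groups];
        end = new int[groups];
        Arrays.fill(start, -1); // -1 marks a group that has not been captured
        Arrays.fill(end, -1);
    }

    // Records the span of one group.
    void capture(int which, int s, int e) {
        start[which] = s;
        end[which] = e;
    }

    // Forgets every group whose start position is at or beyond 'pos'.
    void clearBeyond(int pos) {
        for (int i = 0; i < start.length; i++) {
            if (start[i] >= pos) {
                start[i] = -1;
                end[i] = -1;
            }
        }
    }

    int startOf(int which) { return start[which]; }
    int endOf(int which) { return end[which]; }

    public static void main(String[] args) {
        CaptureTable t = new CaptureTable(3);
        t.capture(1, 0, 2);
        t.capture(2, 4, 6);
        t.clearBeyond(3); // group 2 starts at 4, so it is cleared; group 1 is kept
        System.out.println(t.startOf(1) + " " + t.startOf(2)); // prints "0 -1"
    }
}
```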
-
matchAt
protected boolean matchAt(int i, boolean anchored)
Matches the current regular expression program against the current input string, starting at index i of the input string. This method is only meant for internal use.
Parameters:
i - The input string index to start matching at
anchored - true if the regex must match all characters up to the end of the string
Returns:
True if the input matched the expression
-
isAnchoredMatch
Tests whether the regex matches a string in its entirety, anchored at both ends.
Parameters:
search - the string to be matched
Returns:
true if the regex matches the whole string
-
match
Matches the current regular expression program against a string, starting at a given index.
Parameters:
search - String to match against
i - Index to start searching at
Returns:
True if string matched
-
match
Matches the current regular expression program against a String.
Parameters:
search - String to match against
Returns:
True if string matched
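The difference between an anchored match (isAnchoredMatch) and a search from a starting index (match) can be shown with the JDK's `java.util.regex` as an analogue. The helpers `anchored` and `anywhere` are hypothetical names, and the sketch assumes that match looks for the pattern anywhere at or after the start index, as the parameter description "Index to start searching at" suggests.

```java
import java.util.regex.Pattern;

// Analogue built on java.util.regex; NOT the Saxon REMatcher API.
final class AnchoringDemo {

    // The whole string must match (anchored at both ends).
    static boolean anchored(String regex, String s) {
        return Pattern.compile(regex).matcher(s).matches();
    }

    // A match may start anywhere at or after index 'from'.
    static boolean anywhere(String regex, String s, int from) {
        return Pattern.compile(regex).matcher(s).find(from);
    }

    public static void main(String[] args) {
        System.out.println(anchored("[0-9]+", "abc123"));    // false: "abc" is not digits
        System.out.println(anywhere("[0-9]+", "abc123", 0)); // true: finds "123"
        System.out.println(anywhere("[0-9]+", "123abc", 3)); // false: no digits from index 3
    }
}
```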
-
split
Splits a string into a list of strings on regular expression boundaries. This function works the same way as the Perl function of the same name. Given a regular expression of "[ab]+" and a string to split of "xyzzyababbayyzabbbab123", the result would be the list of strings "[xyzzy, yyz, 123]".
Please note that the first string in the result may be an empty string. This happens when the very first character of the input string is matched by the pattern.
Parameters:
s - String to split on this regular expression
Returns:
A list of strings
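The split example above can be reproduced with the JDK's `java.util.regex.Pattern.split`, used here only as a stand-in with similar semantics. It is not a call to REMatcher.split, and the wrapper method `split` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Analogue built on java.util.regex; NOT the Saxon REMatcher API.
final class SplitDemo {

    // Splits 'input' at every match of 'regex'.
    static List<String> split(String regex, String input) {
        return Arrays.asList(Pattern.compile(regex).split(input));
    }

    public static void main(String[] args) {
        System.out.println(split("[ab]+", "xyzzyababbayyzabbbab123")); // [xyzzy, yyz, 123]
        // A match at the very start of the input yields a leading empty string.
        System.out.println(split("[ab]+", "abxyz")); // [, xyz]
    }
}
```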
-
replace
Substitutes a string for this regular expression in another string. This method works like the Perl function of the same name. Given a regular expression of "a*b", an input string (in) of "aaaabfooaaabgarplyaaabwackyb" and a replacement string of "-", the string returned by replace would be "-foo-garply-wacky-".
It is also possible to reference the contents of a parenthesized expression with $0, $1, ... $9. Given a regular expression of "http://[\\.\\w\\-\\?/~_@&=%]+", an input string (in) of "visit us: http://www.apache.org!" and a replacement string of "<a href=\"$0\">$0</a>", the string returned by replace would be "visit us: <a href=\"http://www.apache.org\">http://www.apache.org</a>!".
Note: $0 represents the whole match.
Parameters:
in - String to substitute within
replacement - String to substitute for matches of this regular expression
Returns:
The input string with zero or more occurrences of the current regular expression replaced with the replacement string (if this regular expression doesn't match at any position, the original string is returned unchanged).
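Both replacement examples above can be checked against the JDK's `java.util.regex`, which also supports `$0` ... `$9` references in the replacement string. This is an analogue for illustration, not a call to REMatcher.replace, and the wrapper `replace` is a hypothetical helper.

```java
import java.util.regex.Pattern;

// Analogue built on java.util.regex; NOT the Saxon REMatcher API.
final class ReplaceDemo {

    // Replaces every match of 'regex' in 'in'; $n in 'replacement' refers to group n.
    static String replace(String regex, String in, String replacement) {
        return Pattern.compile(regex).matcher(in).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(replace("a*b", "aaaabfooaaabgarplyaaabwackyb", "-"));
        // prints: -foo-garply-wacky-

        System.out.println(replace("http://[\\.\\w\\-\\?/~_@&=%]+",
                "visit us: http://www.apache.org!",
                "<a href=\"$0\">$0</a>"));
        // prints: visit us: <a href="http://www.apache.org">http://www.apache.org</a>!
    }
}
```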
-
replaceWith
public UnicodeString replaceWith(UnicodeString in, BiFunction<UnicodeString, UnicodeString[], UnicodeString> replacer)
Substitutes a string for this regular expression in another string, where each replacement is computed by the supplied function rather than taken from a fixed replacement string. Given a regular expression of "a*b", an input string (in) of "aaaabfooaaabgarplyaaabwackyb" and a replacer that always returns "-", the string returned by replaceWith would be "-foo-garply-wacky-".
Parameters:
in - String to substitute within
replacer - Function to process each matching substring and return a replacement
Returns:
The input string with zero or more occurrences of the current regular expression replaced with the value returned by replacer (if this regular expression doesn't match at any position, the original string is returned unchanged).
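A function-driven replacement of this kind can be sketched on top of `java.util.regex`. The helper below mirrors the shape of the `BiFunction` parameter but uses plain `String` values; it is a hypothetical analogue, not REMatcher.replaceWith, and passing the captured groups (with the whole match at index 0) as the second argument is an assumption about what the array holds.

```java
import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Analogue built on java.util.regex; NOT the Saxon REMatcher API.
final class ReplaceWithDemo {

    // Replaces each match with whatever 'replacer' returns for it.
    // Assumption: the second argument carries the groups, group 0 first.
    static String replaceWith(Pattern p, String in,
                              BiFunction<String, String[], String> replacer) {
        Matcher m = p.matcher(in);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(in, last, m.start()); // text between matches is copied as-is
            String[] groups = new String[m.groupCount() + 1];
            for (int i = 0; i < groups.length; i++) {
                groups[i] = m.group(i);
            }
            out.append(replacer.apply(m.group(), groups));
            last = m.end();
        }
        out.append(in, last, in.length()); // tail after the final match
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("a*b");
        String in = "aaaabfooaaabgarplyaaabwackyb";
        System.out.println(replaceWith(p, in, (match, groups) -> "-"));
        // prints: -foo-garply-wacky-
        System.out.println(replaceWith(p, in, (match, groups) -> "[" + match.length() + "]"));
        // prints: [5]foo[4]garply[4]wacky[1]
    }
}
```

The second call shows the point of taking a function: the replacement can depend on the matched text, which a fixed replacement string cannot express.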
-
captureState
-
resetState
-