java.lang.Object

net.sf.saxon.expr.parser.Tokenizer

public final class Tokenizer extends Object

Tokenizer for expressions and inputs.

This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.

Field Summary

Fields

Modifier and Type

Field

Description

boolean

allowSaxonExtensions

Flag to allow Saxon extensions

static final int

BARE_NAME_STATE

State in which a name is NOT to be merged with what comes next, for example "("

int

currentToken

The number identifying the most recently read token

int

currentTokenStartOffset

The position in the input expression where the current token starts

String

currentTokenValue

The string value of the most recently read token

static final int

DEFAULT_STATE

Initial default state of the Tokenizer

boolean

disallowUnionKeyword

Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns

static final char

FULL_WIDTH_GT

static final char

FULL_WIDTH_LT

String

input

The string being parsed

int

inputOffset

The current position within the input string

boolean

isXQuery

Flag to indicate that this is XQuery as distinct from XPath

int

languageLevel

XPath language level: e.g. 2.0, 3.0, or 3.1

static final char

NUL

static final int

OPERATOR_STATE

State in which the next thing to be read is an operator

static final int

SEQUENCE_TYPE_STATE

State in which the next thing to be read is a SequenceType

Constructor Summary

Constructors

Constructor

Description

Tokenizer()

Method Summary

Modifier and Type

Method

Description

void

copyTo(Tokenizer u)

Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)

int

getColumnNumber()

Get the column number of the current token

int

getColumnNumber(int offset)

Return the column number corresponding to a given offset in the expression

int

getLineNumber()

Get the line number of the current token

int

getLineNumber(int offset)

Return the line number corresponding to a given offset in the expression

int

getState()

Get the current tokenizer state

void

incrementLineNumber(int offset)

Increment the line number, making a record of where in the input string the newline character occurred.

void

lookAhead()

Look ahead by one token.

void

next()

Get the next token from the input expression.

char

nextChar()

Read next character directly.

char

peekChar()

Look ahead to see what the next character will be, without changing the current state

void

setState(int state)

Set the tokenizer into a special state

boolean

thereMightBeAnArrowAhead()

Return true if there is a thin arrow ("->") somewhere beyond the current position.

void

tokenize(String input, int start, int end)

Prepare a string for tokenization.

void

treatCurrentAsOperator()

Force the current token to be treated as an operator if possible

void

unreadChar()

Step back one character.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- FULL_WIDTH_LT
  
  public static final char FULL_WIDTH_LT
  See Also:
  
  Constant Field Values
- FULL_WIDTH_GT
  
  public static final char FULL_WIDTH_GT
  See Also:
  
  Constant Field Values
- NUL
  
  public static final char NUL
  See Also:
  
  Constant Field Values
- DEFAULT_STATE
  
  public static final int DEFAULT_STATE
  
  Initial default state of the Tokenizer
  See Also:
  
  Constant Field Values
- BARE_NAME_STATE
  
  public static final int BARE_NAME_STATE
  
  State in which a name is NOT to be merged with what comes next, for example "("
  See Also:
  
  Constant Field Values
- SEQUENCE_TYPE_STATE
  
  public static final int SEQUENCE_TYPE_STATE
  
  State in which the next thing to be read is a SequenceType
  See Also:
  
  Constant Field Values
- OPERATOR_STATE
  
  public static final int OPERATOR_STATE
  
  State in which the next thing to be read is an operator
  See Also:
  
  Constant Field Values
- currentToken
  
  public int currentToken
  
  The number identifying the most recently read token
- currentTokenValue
  
  public String currentTokenValue
  
  The string value of the most recently read token
- currentTokenStartOffset
  
  public int currentTokenStartOffset
  
  The position in the input expression where the current token starts
- input
  
  public String input
  
  The string being parsed
- inputOffset
  
  public int inputOffset
  
  The current position within the input string
- disallowUnionKeyword
  
  public boolean disallowUnionKeyword
  
  Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns
- isXQuery
  
  public boolean isXQuery
  
  Flag to indicate that this is XQuery as distinct from XPath
- languageLevel
  
  public int languageLevel
  
  XPath language level: e.g. 2.0, 3.0, or 3.1
- allowSaxonExtensions
  
  public boolean allowSaxonExtensions
  
  Flag to allow Saxon extensions
Constructor Details
- Tokenizer
  
  public Tokenizer()
Method Details
- getState
  
  public int getState()
  
  Get the current tokenizer state
  
  Returns:
  
  the current state
- setState
  
  public void setState(int state)
  
  Set the tokenizer into a special state
  
  Parameters:
  
  state - the new state
- tokenize
  
  public void tokenize(String input, int start, int end) throws XPathException
  
  Prepare a string for tokenization. The actual tokens are obtained by calls on next()
  
  Parameters:
  
  input - the string to be tokenized
  
  start - start point within the string
  
  end - end point within the string (last character not read): -1 means end of string
  
  Throws:
  
  XPathException - if a lexical error occurs, e.g. unmatched string quotes
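
The start and end parameters can be illustrated without the tokenizer itself. The following is a minimal sketch, not Saxon code: `TokenizeRangeSketch` and `slice` are hypothetical names, and the sketch assumes only what the parameter descriptions state, namely that `end` is exclusive and that -1 stands for the end of the string.

```java
final class TokenizeRangeSketch {
    /** Returns the part of the input that would be scanned for a given start and end. */
    static String slice(String input, int start, int end) {
        // -1 means "up to the end of the string"
        int limit = (end == -1) ? input.length() : end;
        // end is exclusive: the character at index 'limit' is not read
        return input.substring(start, limit);
    }

    public static void main(String[] args) {
        // Scan only the expression inside the braces
        System.out.println(slice("{ $a + 1 }", 2, 8));
        // Scan the whole string
        System.out.println(slice("$a + 1", 0, -1));
    }
}
```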
- next
  
  public void next() throws XPathException
  
  Get the next token from the input expression. The type of token is returned in the currentToken variable, and the string value of the token in currentTokenValue.
  
  Throws:
  
  XPathException - if a lexical error is detected
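
Because next() returns void, callers read the result from fields after each call. The toy class below sketches that calling pattern only. It is not the Saxon tokenizer: `FieldTokenizerSketch`, its token codes and its lexical rules are hypothetical, and it assumes that the caller invokes next() once to obtain the first token.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldTokenizerSketch {
    // Hypothetical token codes, chosen for this sketch only
    static final int EOF = 0, NAME = 1, NUMBER = 2, SYMBOL = 3;

    int currentToken = EOF;
    String currentTokenValue = null;
    int currentTokenStartOffset = 0;

    private String input = "";
    private int inputOffset = 0;

    /** Prepare a string; tokens are then obtained by calls on next(). */
    void tokenize(String input) {
        this.input = input;
        this.inputOffset = 0;
    }

    /** Read one token and publish it through the public fields. */
    void next() {
        while (inputOffset < input.length() && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= input.length()) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            while (inputOffset < input.length() && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
        } else if (Character.isDigit(c)) {
            while (inputOffset < input.length() && Character.isDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NUMBER;
        } else {
            inputOffset++; // any other character is a one-character symbol
            currentToken = SYMBOL;
        }
        currentTokenValue = input.substring(currentTokenStartOffset, inputOffset);
    }

    /** Drive the tokenizer to EOF, recording "code:value@offset" for each token. */
    static List<String> describe(String expression) {
        FieldTokenizerSketch t = new FieldTokenizerSketch();
        t.tokenize(expression);
        List<String> out = new ArrayList<>();
        for (t.next(); t.currentToken != EOF; t.next()) {
            out.add(t.currentToken + ":" + t.currentTokenValue + "@" + t.currentTokenStartOffset);
        }
        return out;
    }
}
```

The loop in `describe` is the part that carries over: call next(), inspect currentToken and currentTokenValue, and stop at the end-of-input token.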
- thereMightBeAnArrowAhead
  
  public boolean thereMightBeAnArrowAhead()
  
  Return true if there is a thin arrow ("->") somewhere beyond the current position. This can be used to eliminate unnecessary lookahead
  
  Returns:
  
  true if a thin arrow may be present. The result can be a false positive.
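
The idea of a cheap pre-check that may report false positives can be sketched as a plain substring search. This is an assumption about the general technique, not a description of how Saxon implements the method; `ArrowPrecheckSketch` and `mightHaveArrowAhead` are hypothetical names.

```java
final class ArrowPrecheckSketch {
    /**
     * Cheap test: does "->" occur anywhere at or after the given offset?
     * A false result proves no arrow follows, so expensive lookahead can be skipped.
     * A true result is only a hint: the match may lie inside a string literal or comment.
     */
    static boolean mightHaveArrowAhead(String input, int offset) {
        return input.indexOf("->", offset) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(mightHaveArrowAhead("$x -> f()", 0));       // a real arrow
        System.out.println(mightHaveArrowAhead("concat('->', $x)", 0)); // false positive: inside a literal
        System.out.println(mightHaveArrowAhead("$x - $y", 0));          // no arrow at all
    }
}
```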
- treatCurrentAsOperator
  
  public void treatCurrentAsOperator()
  
  Force the current token to be treated as an operator if possible
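
One way to read "if possible" is that a token already read as a name is retagged only when its spelling is also an operator keyword. The sketch below illustrates that reading under stated assumptions: the class name, the token codes and the keyword set are all hypothetical and are not taken from Saxon.

```java
final class OperatorRetagSketch {
    // Hypothetical token codes for this sketch
    static final int NAME = 1, DIV = 10, AND = 11, OR = 12;

    int currentToken;
    String currentTokenValue;

    /** Retag the current NAME token as an operator when its spelling allows it. */
    void treatCurrentAsOperator() {
        if (currentToken != NAME || currentTokenValue == null) {
            return; // nothing to reinterpret
        }
        switch (currentTokenValue) {
            case "div": currentToken = DIV; break;
            case "and": currentToken = AND; break;
            case "or":  currentToken = OR;  break;
            default:    break; // not an operator keyword: leave it as a name
        }
    }
}
```

A parser would make such a call when it knows from context that an operator must come next, for example after a complete operand.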
- lookAhead
  
  public void lookAhead() throws XPathException
  
  Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.
  
  Throws:
  
  XPathException - if a lexical error occurs
- nextChar
  
  public char nextChar()
  
  Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax
  
  Returns:
  
  the next character from the input, or NUL at the end of the input
- peekChar
  
  public char peekChar()
  
  Look ahead to see what the next character will be, without changing the current state
  
  Returns:
  
  the next character, or NUL at the end of the input.
- incrementLineNumber
  
  public void incrementLineNumber(int offset)
  
  Increment the line number, making a record of where in the input string the newline character occurred.
  
  Parameters:
  
  offset - the place in the input string where the newline occurred
- unreadChar
  
  public void unreadChar()
  
  Step back one character. If this steps back to a previous line, adjust the line number. If we have already read off the end of the input, do nothing.
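
The contracts of nextChar(), peekChar() and unreadChar() can be modelled with a small cursor over a string. This is a sketch under assumptions, not the Saxon implementation: `CharCursorSketch` is a hypothetical class, NUL is assumed to be the character U+0000, and "read off the end" is modelled by moving the offset one past the input length.

```java
final class CharCursorSketch {
    static final char NUL = '\0'; // assumed value of the end-of-input sentinel

    private final String input;
    private int inputOffset = 0;
    private int line = 0; // zero-based line counter

    CharCursorSketch(String input) {
        this.input = input;
    }

    /** Look at the next character without consuming it. */
    char peekChar() {
        return inputOffset < input.length() ? input.charAt(inputOffset) : NUL;
    }

    /** Consume and return the next character, or NUL at the end of the input. */
    char nextChar() {
        if (inputOffset >= input.length()) {
            inputOffset = input.length() + 1; // remember that we have read off the end
            return NUL;
        }
        char c = input.charAt(inputOffset++);
        if (c == '\n') {
            line++;
        }
        return c;
    }

    /** Step back one character, adjusting the line counter if a newline is crossed. */
    void unreadChar() {
        if (inputOffset > input.length() || inputOffset == 0) {
            return; // already read off the end, or nothing to unread
        }
        inputOffset--;
        if (input.charAt(inputOffset) == '\n') {
            line--;
        }
    }

    int line() {
        return line;
    }
}
```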
- copyTo
  
  public void copyTo(Tokenizer u)
  
  Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)
  
  Parameters:
  
  u - When checkpointing, a Tokenizer used simply to hold the state so that it can be restored later. This tokenizer is not capable of active tokenizing because many of its variables are uninitialised. When restoring from a checkpoint, the original tokenizer whose state is to be restored.
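
The same method serves both directions: copy from the live tokenizer into a holder to checkpoint, and from the holder back into the live tokenizer to restore. The toy class below sketches that pattern. `CheckpointSketch` is hypothetical and copies only four illustrative fields; the set of fields the real method copies is not listed on this page.

```java
final class CheckpointSketch {
    String input = "";
    int inputOffset = 0;
    int currentToken = 0;
    String currentTokenValue = null;

    /** Copy this object's state into u (checkpoint, or restore when roles are swapped). */
    void copyTo(CheckpointSketch u) {
        u.input = input;
        u.inputOffset = inputOffset;
        u.currentToken = currentToken;
        u.currentTokenValue = currentTokenValue;
    }

    public static void main(String[] args) {
        CheckpointSketch live = new CheckpointSketch();
        live.input = "a -> b";
        live.inputOffset = 1;

        CheckpointSketch saved = new CheckpointSketch(); // holder only, never used to tokenize
        live.copyTo(saved);      // checkpoint

        live.inputOffset = 6;    // speculative lookahead moves the live state

        saved.copyTo(live);      // restore
        System.out.println(live.inputOffset);
    }
}
```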
- getLineNumber
  
  public int getLineNumber()
  
  Get the line number of the current token
  
  Returns:
  
  the line number. Line numbers reported by the tokenizer start at zero.
- getColumnNumber
  
  public int getColumnNumber()
  
  Get the column number of the current token
  
  Returns:
  
  the column number. Column numbers reported by the tokenizer start at zero.
- getLineNumber
  
  public int getLineNumber(int offset)
  
  Return the line number corresponding to a given offset in the expression
  
  Parameters:
  
  offset - the character offset in the expression
  
  Returns:
  
  the line number. Line and column numbers reported by the tokenizer start at zero.
- getColumnNumber
  
  public int getColumnNumber(int offset)
  
  Return the column number corresponding to a given offset in the expression
  
  Parameters:
  
  offset - the character offset in the expression
  
  Returns:
  
  the column number. Line and column numbers reported by the tokenizer start at zero.
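
Zero-based line and column numbers can be derived from a record of newline offsets, which is what incrementLineNumber(int) is described as maintaining. The class below is a minimal sketch of that derivation, assuming newlines are recorded in increasing offset order. `LineIndexSketch` and its `scan` helper are hypothetical and do not reproduce Saxon's internal data structures.

```java
import java.util.ArrayList;
import java.util.List;

final class LineIndexSketch {
    // Offsets of the newline characters seen so far, in increasing order
    private final List<Integer> newlineOffsets = new ArrayList<>();

    /** Record that a newline character occurred at the given offset. */
    void incrementLineNumber(int offset) {
        newlineOffsets.add(offset);
    }

    /** Zero-based line number: the count of newlines strictly before the offset. */
    int getLineNumber(int offset) {
        int line = 0;
        for (int nl : newlineOffsets) {
            if (nl >= offset) {
                break;
            }
            line++;
        }
        return line;
    }

    /** Zero-based column number: the distance from the start of the containing line. */
    int getColumnNumber(int offset) {
        int lineStart = 0;
        for (int nl : newlineOffsets) {
            if (nl >= offset) {
                break;
            }
            lineStart = nl + 1; // the line begins just after the newline
        }
        return offset - lineStart;
    }

    /** Build an index by scanning a whole input string for newlines. */
    static LineIndexSketch scan(String input) {
        LineIndexSketch index = new LineIndexSketch();
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                index.incrementLineNumber(i);
            }
        }
        return index;
    }
}
```

Error reporting that presents positions to users would typically add one to both values, since the tokenizer's numbers start at zero.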
