net.sf.saxon.expr
Class Tokenizer

java.lang.Object
  extended by net.sf.saxon.expr.Tokenizer

public final class Tokenizer
extends Object

Tokenizer for expressions and inputs. This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.


Field Summary
static int BARE_NAME_STATE
          State in which a name is NOT to be merged with what comes next, for example "("
 int currentToken
          The number identifying the most recently read token
 int currentTokenStartOffset
          The position in the input expression where the current token starts
 String currentTokenValue
          The string value of the most recently read token
static int DEFAULT_STATE
          Initial default state of the Tokenizer
 String input
          The string being parsed
 int inputOffset
          The current position within the input string
static int OPERATOR_STATE
          State in which the next thing to be read is an operator
static int SEQUENCE_TYPE_STATE
          State in which the next thing to be read is a SequenceType
 int startLineNumber
          The starting line number (for XPath in XSLT, the line number in the stylesheet)
 
Constructor Summary
Tokenizer()
           
 
Method Summary
 int getColumnNumber()
          Get the column number of the current token
 int getColumnNumber(int offset)
          Return the column number corresponding to a given offset in the expression
 long getLineAndColumn(int offset)
          Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half
 int getLineNumber()
          Get the line number of the current token
 int getLineNumber(int offset)
          Return the line number corresponding to a given offset in the expression
 int getState()
          Get the current tokenizer state
 void incrementLineNumber(int offset)
          Increment the line number, making a record of where in the input string the newline character occurred.
 void lookAhead()
          Look ahead by one token.
 void next()
          Get the next token from the input expression.
 char nextChar()
          Read next character directly.
 String recentText(int offset)
          Get the most recently read text (for use in an error message)
 void setState(int state)
          Set the tokenizer into a special state
 void tokenize(String input, int start, int end, int lineNumber)
          Prepare a string for tokenization.
 void treatCurrentAsOperator()
          Force the current token to be treated as an operator if possible
 void unreadChar()
          Step back one character.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DEFAULT_STATE

public static final int DEFAULT_STATE
Initial default state of the Tokenizer

See Also:
Constant Field Values

BARE_NAME_STATE

public static final int BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("

See Also:
Constant Field Values

SEQUENCE_TYPE_STATE

public static final int SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType

See Also:
Constant Field Values

OPERATOR_STATE

public static final int OPERATOR_STATE
State in which the next thing to be read is an operator

See Also:
Constant Field Values

startLineNumber

public int startLineNumber
The starting line number (for XPath in XSLT, the line number in the stylesheet)


currentToken

public int currentToken
The number identifying the most recently read token


currentTokenValue

public String currentTokenValue
The string value of the most recently read token


currentTokenStartOffset

public int currentTokenStartOffset
The position in the input expression where the current token starts


input

public String input
The string being parsed


inputOffset

public int inputOffset
The current position within the input string

Constructor Detail

Tokenizer

public Tokenizer()
Method Detail

getState

public int getState()
Get the current tokenizer state

Returns:
the current state

setState

public void setState(int state)
Set the tokenizer into a special state

Parameters:
state - the new state

tokenize

public void tokenize(String input,
                     int start,
                     int end,
                     int lineNumber)
              throws XPathException
Prepare a string for tokenization. The actual tokens are obtained by subsequent calls to next().

Parameters:
input - the string to be tokenized
start - start point within the string
end - end point within the string (exclusive: the character at this position is not read), or -1 to read to the end of the string
lineNumber - the line number in the source where the expression appears
Throws:
XPathException - if a lexical error occurs, e.g. unmatched string quotes

next

public void next()
          throws XPathException
Get the next token from the input expression. The method returns nothing directly: the token type is stored in the currentToken field, and the string value of the token in currentTokenValue.

Throws:
XPathException - if a lexical error is detected
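
The calling pattern described above (prepare the input once, then call next() repeatedly and read the result from public fields) can be illustrated with a miniature stand-in. This is only a sketch: MiniTokenizer, its token codes, and its lexical rules are hypothetical and are not Saxon's. It also omits line numbers, states, and lookahead.

```java
// Hypothetical miniature, NOT Saxon's Tokenizer: token codes and rules are invented for illustration.
final class MiniTokenizer {
    static final int EOF = 0, NAME = 1, NUMBER = 2, SYMBOL = 3;

    // Results of the most recent next() call, exposed as fields.
    int currentToken = EOF;
    String currentTokenValue = null;
    int currentTokenStartOffset = 0;

    private String input;
    private int inputOffset;
    private int inputLength;

    /** Prepare input[start, end) for tokenization; end == -1 means end of string. */
    void tokenize(String input, int start, int end) {
        this.input = input;
        this.inputOffset = start;
        this.inputLength = (end == -1) ? input.length() : end;
    }

    /** Advance to the next token; the result is left in the fields above. */
    void next() {
        while (inputOffset < inputLength && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= inputLength) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            while (inputOffset < inputLength && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
        } else if (Character.isDigit(c)) {
            while (inputOffset < inputLength && Character.isDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NUMBER;
        } else {
            inputOffset++; // any other single character is its own token
            currentToken = SYMBOL;
        }
        currentTokenValue = input.substring(currentTokenStartOffset, inputOffset);
    }

    public static void main(String[] args) {
        MiniTokenizer t = new MiniTokenizer();
        t.tokenize("price + 42", 0, -1);
        for (t.next(); t.currentToken != EOF; t.next()) {
            System.out.println(t.currentToken + " '" + t.currentTokenValue
                    + "' @" + t.currentTokenStartOffset);
        }
    }
}
```

A parser built on this pattern inspects currentToken after each call and stops when it sees the end-of-input code.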

treatCurrentAsOperator

public void treatCurrentAsOperator()
Force the current token to be treated as an operator if possible


lookAhead

public void lookAhead()
               throws XPathException
Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.

Throws:
XPathException - if a lexical error occurs

nextChar

public char nextChar()
              throws StringIndexOutOfBoundsException
Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax

Returns:
the next character from the input
Throws:
StringIndexOutOfBoundsException - if an attempt is made to read beyond the end of the string. This will only occur in the event of a syntax error in the input.

incrementLineNumber

public void incrementLineNumber(int offset)
Increment the line number, making a record of where in the input string the newline character occurred.

Parameters:
offset - the place in the input string where the newline occurred
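
One way to picture how recorded newline positions support the line and column lookups is the sketch below. It is not Saxon's implementation: the LineTracker class is hypothetical, and the 1-based column numbering and the treatment of the newline character as belonging to the line it ends are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, NOT Saxon's implementation.
final class LineTracker {
    private final int startLineNumber;
    private final List<Integer> newlineOffsets = new ArrayList<>();

    LineTracker(int startLineNumber) {
        this.startLineNumber = startLineNumber;
    }

    /** Record a newline at the given offset; offsets already covered are ignored. */
    void incrementLineNumber(int offset) {
        if (newlineOffsets.isEmpty()
                || newlineOffsets.get(newlineOffsets.size() - 1) < offset) {
            newlineOffsets.add(offset);
        }
    }

    /** Line number of the given offset: start line plus the newlines strictly before it. */
    int getLineNumber(int offset) {
        int line = startLineNumber;
        for (int nl : newlineOffsets) {
            if (nl < offset) {
                line++;
            } else {
                break;
            }
        }
        return line;
    }

    /** Column of the given offset within its line (1-based: an assumption of this sketch). */
    int getColumnNumber(int offset) {
        int lineStart = 0;
        for (int nl : newlineOffsets) {
            if (nl < offset) {
                lineStart = nl + 1;
            } else {
                break;
            }
        }
        return offset - lineStart + 1;
    }

    public static void main(String[] args) {
        String input = "a\nbc";
        LineTracker tracker = new LineTracker(10);
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                tracker.incrementLineNumber(i);
            }
        }
        // Offset 3 is the 'c' on the second line.
        System.out.println(tracker.getLineNumber(3) + ":" + tracker.getColumnNumber(3)); // prints 11:2
    }
}
```

Keeping the offsets in ascending order means a lookup can stop at the first recorded newline that is not before the requested offset.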

unreadChar

public void unreadChar()
Step back one character. If this steps back to a previous line, adjust the line number.


recentText

public String recentText(int offset)
Get the most recently read text (for use in an error message)

Parameters:
offset - the offset of the offending token, if known, or -1 to use the current offset
Returns:
a chunk of text leading up to the error
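
The offset convention (-1 meaning "use the current offset") can be sketched as follows. The RecentText helper, its maxChars parameter, and the "..." prefix for truncated text are hypothetical choices made for this illustration; the page does not say how much text Saxon returns or how it is formatted.

```java
// Hypothetical helper, NOT part of Saxon.
final class RecentText {
    private RecentText() {}

    /**
     * Return up to maxChars characters of input ending at the given offset.
     * An offset of -1 means "use currentOffset instead".
     */
    static String recentText(String input, int currentOffset, int offset, int maxChars) {
        int end = (offset == -1) ? currentOffset : offset;
        end = Math.min(Math.max(end, 0), input.length()); // clamp to the string bounds
        int start = Math.max(0, end - maxChars);
        String chunk = input.substring(start, end);
        return (start > 0 ? "..." : "") + chunk;          // mark truncation at the front
    }

    public static void main(String[] args) {
        System.out.println(recentText("abcdefghij", 10, -1, 4)); // prints ...ghij
    }
}
```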

getLineNumber

public int getLineNumber()
Get the line number of the current token

Returns:
the line number

getColumnNumber

public int getColumnNumber()
Get the column number of the current token

Returns:
the column number

getLineAndColumn

public long getLineAndColumn(int offset)
Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half

Parameters:
offset - the character offset in the expression (an index into the input string)
Returns:
the line and column number, packed together
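
A caller has to split the packed value back into its two parts. The sketch below assumes that "top half" and "lower half" mean the upper and lower 32 bits of the long; the LineAndColumn helper and its method names are hypothetical and are not part of Saxon.

```java
// Hypothetical helper, NOT part of Saxon: illustrates a 32/32-bit packed layout.
final class LineAndColumn {
    private LineAndColumn() {}

    /** Pack a line and column into one long: line in the upper 32 bits, column in the lower 32. */
    static long pack(int line, int column) {
        return ((long) line << 32) | (column & 0xFFFFFFFFL);
    }

    /** Extract the line number from the upper 32 bits. */
    static int line(long packed) {
        return (int) (packed >>> 32);
    }

    /** Extract the column number from the lower 32 bits. */
    static int column(long packed) {
        return (int) (packed & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long packed = pack(12, 34);
        System.out.println(line(packed) + ":" + column(packed)); // prints 12:34
    }
}
```

Masking the column before combining prevents a negative column value from overwriting the line bits through sign extension.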

getLineNumber

public int getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression

Parameters:
offset - the character offset in the expression (an index into the input string)
Returns:
the line number

getColumnNumber

public int getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression

Parameters:
offset - the character offset in the expression (an index into the input string)
Returns:
the column number


Copyright (c) 2004-2010 Saxonica Limited. All rights reserved.