Class Tokenizer

java.lang.Object
net.sf.saxon.expr.parser.Tokenizer

public final class Tokenizer extends Object
Tokenizer for expressions and inputs.

This code was originally derived from James Clark's xt, though it has been greatly modified since. See copyright notice at end of file.

  • Field Summary

    Fields
    Modifier and Type    Field                      Description
    boolean              allowSaxonExtensions       Flag to allow Saxon extensions
    static final int     BARE_NAME_STATE            State in which a name is NOT to be merged with what comes next, for example "("
    int                  currentToken               The number identifying the most recently read token
    int                  currentTokenStartOffset    The position in the input expression where the current token starts
    String               currentTokenValue          The string value of the most recently read token
    static final int     DEFAULT_STATE              Initial default state of the Tokenizer
    boolean              disallowUnionKeyword       Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns
    static final char    FULL_WIDTH_GT
    static final char    FULL_WIDTH_LT
    String               input                      The string being parsed
    int                  inputOffset                The current position within the input string
    boolean              isXQuery                   Flag to indicate that this is XQuery as distinct from XPath
    int                  languageLevel              XPath language level: e.g. 2.0, 3.0, or 3.1
    static final char    NUL
    static final int     OPERATOR_STATE             State in which the next thing to be read is an operator
    static final int     SEQUENCE_TYPE_STATE        State in which the next thing to be read is a SequenceType
  • Constructor Summary

    Constructors
    Constructor    Description
    Tokenizer()
  • Method Summary

    Modifier and Type    Method                                        Description
    void                 copyTo(Tokenizer u)                           Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)
    int                  getColumnNumber()                             Get the column number of the current token
    int                  getColumnNumber(int offset)                   Return the column number corresponding to a given offset in the expression
    int                  getLineNumber()                               Get the line number of the current token
    int                  getLineNumber(int offset)                     Return the line number corresponding to a given offset in the expression
    int                  getState()                                    Get the current tokenizer state
    void                 incrementLineNumber(int offset)               Increment the line number, making a record of where in the input string the newline character occurred.
    void                 lookAhead()                                   Look ahead by one token.
    void                 next()                                        Get the next token from the input expression.
    char                 nextChar()                                    Read next character directly.
    char                 peekChar()                                    Look ahead to see what the next character will be, without changing the current state
    void                 setState(int state)                           Set the tokenizer into a special state
    boolean              thereMightBeAnArrowAhead()                    Return true if there is a thin arrow ("->") somewhere beyond the current position.
    void                 tokenize(String input, int start, int end)    Prepare a string for tokenization.
    void                 treatCurrentAsOperator()                      Force the current token to be treated as an operator if possible
    void                 unreadChar()                                  Step back one character.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • FULL_WIDTH_LT

      public static final char FULL_WIDTH_LT
    • FULL_WIDTH_GT

      public static final char FULL_WIDTH_GT
    • NUL

      public static final char NUL
    • DEFAULT_STATE

      public static final int DEFAULT_STATE
      Initial default state of the Tokenizer
    • BARE_NAME_STATE

      public static final int BARE_NAME_STATE
      State in which a name is NOT to be merged with what comes next, for example "("
    • SEQUENCE_TYPE_STATE

      public static final int SEQUENCE_TYPE_STATE
      State in which the next thing to be read is a SequenceType
    • OPERATOR_STATE

      public static final int OPERATOR_STATE
      State in which the next thing to be read is an operator
    • currentToken

      public int currentToken
      The number identifying the most recently read token
    • currentTokenValue

      public String currentTokenValue
      The string value of the most recently read token
    • currentTokenStartOffset

      public int currentTokenStartOffset
      The position in the input expression where the current token starts
    • input

      public String input
      The string being parsed
    • inputOffset

      public int inputOffset
      The current position within the input string
    • disallowUnionKeyword

      public boolean disallowUnionKeyword
      Flag to disallow "union" as a synonym for "|" when parsing XSLT 2.0 patterns
    • isXQuery

      public boolean isXQuery
      Flag to indicate that this is XQuery as distinct from XPath
    • languageLevel

      public int languageLevel
      XPath language level: e.g. 2.0, 3.0, or 3.1
    • allowSaxonExtensions

      public boolean allowSaxonExtensions
      Flag to allow Saxon extensions
  • Constructor Details

    • Tokenizer

      public Tokenizer()
  • Method Details

    • getState

      public int getState()
      Get the current tokenizer state
      Returns:
      the current state
    • setState

      public void setState(int state)
      Set the tokenizer into a special state
      Parameters:
      state - the new state
    • tokenize

      public void tokenize(String input, int start, int end) throws XPathException
      Prepare a string for tokenization. The actual tokens are obtained by calls on next()
      Parameters:
      input - the string to be tokenized
      start - start point within the string
      end - end point within the string (exclusive; the character at this position is not read); -1 means end of string
      Throws:
      XPathException - if a lexical error occurs, e.g. unmatched string quotes
    • next

      public void next() throws XPathException
      Get the next token from the input expression. The method returns nothing: the token type is stored in the currentToken field and the string value of the token in currentTokenValue.
      Throws:
      XPathException - if a lexical error is detected
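The tokenize()/next() pair describes a pull-style interface: the caller prepares the input once, then asks for one token at a time and reads the result from public fields. The sketch below is a toy stand-in that only mirrors that shape. The class name MiniTokenizer, its token codes, and its lexical rules are invented for illustration and are not Saxon's; in particular, whether the real tokenize() already reads the first token is not stated here, so the toy leaves that to the first call on next().

```java
// Toy stand-in that mirrors the shape of the API described above.
// It is NOT Saxon's Tokenizer: token codes and lexical rules are invented.
final class MiniTokenizer {
    static final int EOF = 0, NAME = 1, NUMBER = 2, SYMBOL = 3; // hypothetical codes

    int currentToken = EOF;
    String currentTokenValue = "";
    int currentTokenStartOffset = 0;

    private String input = "";
    private int inputOffset = 0;
    private int inputLength = 0;

    /** Prepare a string; end == -1 means "up to the end of the string". */
    void tokenize(String input, int start, int end) {
        this.input = input;
        this.inputOffset = start;
        this.inputLength = (end == -1) ? input.length() : end;
    }

    /** Read one token into currentToken / currentTokenValue. */
    void next() {
        // Skip whitespace before the token.
        while (inputOffset < inputLength
                && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= inputLength) {
            currentToken = EOF;
            currentTokenValue = "";
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            while (inputOffset < inputLength
                    && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
        } else if (Character.isDigit(c)) {
            while (inputOffset < inputLength
                    && Character.isDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NUMBER;
        } else {
            inputOffset++; // any other character is a one-character symbol
            currentToken = SYMBOL;
        }
        currentTokenValue = input.substring(currentTokenStartOffset, inputOffset);
    }

    public static void main(String[] args) {
        MiniTokenizer t = new MiniTokenizer();
        t.tokenize("price * 2", 0, -1);
        for (t.next(); t.currentToken != EOF; t.next()) {
            System.out.println(t.currentToken + " '" + t.currentTokenValue
                    + "' at " + t.currentTokenStartOffset);
        }
    }
}
```

Exposing the result through fields rather than a return value keeps next() allocation-free apart from the token string, at the cost of making the tokenizer stateful and single-consumer.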
    • thereMightBeAnArrowAhead

      public boolean thereMightBeAnArrowAhead()
      Return true if there is a thin arrow ("->") somewhere beyond the current position. This can be used to eliminate unnecessary lookahead
      Returns:
      true if a thin arrow may be present. The check is approximate, so a true result can be a false positive.
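One way such a pre-check could work is a plain substring scan: it is cheap, never misses a real arrow, and may report arrows that sit inside string literals or comments. The sketch below illustrates that trade-off under that assumption. The class and method names are hypothetical, and nothing here is taken from Saxon's implementation.

```java
// Hypothetical sketch of a cheap "might there be an arrow?" pre-check.
// Not Saxon's code; it only illustrates why false positives are acceptable.
final class ArrowPrecheck {
    /**
     * Returns true if the two-character sequence "->" occurs at or after
     * the given offset. The scan knows nothing about string literals or
     * comments, so it can answer true when no arrow operator exists.
     */
    static boolean mightHaveArrowAhead(String input, int offset) {
        return input.indexOf("->", offset) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(mightHaveArrowAhead("$x -> f()", 0)); // real arrow
        System.out.println(mightHaveArrowAhead("'a->b'", 0));    // arrow inside a literal
        System.out.println(mightHaveArrowAhead("$x + 1", 0));    // no arrow at all
    }
}
```

A false result lets the caller skip the expensive lookahead entirely; a true result only means the lookahead is worth attempting.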
    • treatCurrentAsOperator

      public void treatCurrentAsOperator()
      Force the current token to be treated as an operator if possible
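The tokenizer states and treatCurrentAsOperator() exist because XPath has no reserved words: the same text can be a name or an operator depending on where it appears. In "div div div", for example, the middle word is the division operator and the outer two are element names. The sketch below shows the idea of state-dependent classification only. The class, the numeric state values, and the result labels are all invented for illustration.

```java
// Hypothetical sketch of state-dependent lexing; not Saxon's code.
final class StateSketch {
    // Stand-ins for the tokenizer states; the numeric values are invented.
    static final int DEFAULT_STATE = 0;
    static final int OPERATOR_STATE = 1;

    /** Classify a piece of text according to the state the lexer is in. */
    static String classify(String text, int state) {
        if (state == OPERATOR_STATE) {
            // An operator is expected here, so operator spellings win.
            switch (text) {
                case "and":
                case "or":
                case "div":
                case "mod":
                    return "OPERATOR";
                case "*":
                    return "MULTIPLY";
                default:
                    return "NAME";
            }
        }
        // An operand is expected, so the same text is a name or a wildcard.
        return text.equals("*") ? "WILDCARD" : "NAME";
    }

    public static void main(String[] args) {
        System.out.println(classify("div", DEFAULT_STATE));
        System.out.println(classify("div", OPERATOR_STATE));
        System.out.println(classify("*", DEFAULT_STATE));
        System.out.println(classify("*", OPERATOR_STATE));
    }
}
```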
    • lookAhead

      public void lookAhead() throws XPathException
      Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.
      Throws:
      XPathException - if a lexical error occurs
    • nextChar

      public char nextChar()
      Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax
      Returns:
      the next character from the input, or NUL at the end of the input
    • peekChar

      public char peekChar()
      Look ahead to see what the next character will be, without changing the current state
      Returns:
      the next character, or NUL at the end of the input.
    • incrementLineNumber

      public void incrementLineNumber(int offset)
      Increment the line number, making a record of where in the input string the newline character occurred.
      Parameters:
      offset - the place in the input string where the newline occurred
    • unreadChar

      public void unreadChar()
      Step back one character. If this steps back to a previous line, adjust the line number. If we have already read off the end of the input, do nothing.
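The character-level methods (nextChar, peekChar, unreadChar) together describe a cursor with one character of push-back and an end-of-input sentinel. A minimal sketch of that contract follows. CharCursor is a hypothetical class, the value '\0' for NUL is an assumption (this page does not show the constant's value), and the way the "read off the end" condition is remembered is one possible design rather than Saxon's.

```java
// Hypothetical character cursor illustrating the nextChar / peekChar /
// unreadChar contract described above. Not Saxon's implementation.
final class CharCursor {
    static final char NUL = '\0'; // assumed sentinel value

    private final String input;
    private int offset = 0;
    private int line = 0; // zero-based, like the line numbers documented here

    CharCursor(String input) {
        this.input = input;
    }

    /** Next character without consuming it, or NUL at end of input. */
    char peekChar() {
        return offset < input.length() ? input.charAt(offset) : NUL;
    }

    /** Consume and return the next character, or NUL at end of input. */
    char nextChar() {
        if (offset >= input.length()) {
            offset = input.length() + 1; // remember that we ran off the end
            return NUL;
        }
        char c = input.charAt(offset++);
        if (c == '\n') {
            line++;
        }
        return c;
    }

    /** Step back one character; a no-op once the end has been read past. */
    void unreadChar() {
        if (offset > input.length() || offset == 0) {
            return;
        }
        offset--;
        if (input.charAt(offset) == '\n') {
            line--; // stepped back onto the previous line
        }
    }

    int getLineNumber() {
        return line;
    }

    public static void main(String[] args) {
        CharCursor c = new CharCursor("a\nb");
        c.nextChar();
        c.nextChar();
        System.out.println("line after newline: " + c.getLineNumber());
        c.unreadChar();
        System.out.println("line after unread: " + c.getLineNumber());
    }
}
```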
    • copyTo

      public void copyTo(Tokenizer u)
      Checkpoint the state of this tokenizer so that unbounded lookahead is possible (or, restore the state of the tokenizer from a checkpoint)
      Parameters:
      u - When checkpointing, a Tokenizer used simply to hold the state so that it can be restored later. This tokenizer is not capable of active tokenizing because many of its variables are uninitialised. When restoring from a checkpoint, the original tokenizer whose state is to be restored.
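The same copyTo call serves both directions: copying the live object into a holder takes a checkpoint, and copying the holder back into the live object restores it. The sketch below shows that pattern on a hypothetical ScanState class with a handful of fields. Which fields the real method copies is not documented on this page, so the field list here is an assumption.

```java
// Hypothetical sketch of checkpoint/restore through a symmetric copyTo.
// ScanState and its field list are invented; this is not Saxon's code.
final class ScanState {
    String input = "";
    int inputOffset = 0;
    int currentToken = 0;
    String currentTokenValue = "";

    /** Copy this object's scanning state into u. */
    void copyTo(ScanState u) {
        u.input = input;
        u.inputOffset = inputOffset;
        u.currentToken = currentToken;
        u.currentTokenValue = currentTokenValue;
    }

    public static void main(String[] args) {
        ScanState live = new ScanState();
        live.input = "a -> b";
        live.inputOffset = 2;
        live.currentTokenValue = "a";

        ScanState checkpoint = new ScanState();
        live.copyTo(checkpoint);      // take a checkpoint

        live.inputOffset = 6;         // speculative lookahead moves the cursor
        live.currentTokenValue = "b";

        checkpoint.copyTo(live);      // restore from the checkpoint
        System.out.println(live.inputOffset + " " + live.currentTokenValue);
    }
}
```

The holder object is never asked to tokenize, which is why it can be left partially initialised.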
    • getLineNumber

      public int getLineNumber()
      Get the line number of the current token
      Returns:
      the line number. Line numbers reported by the tokenizer start at zero.
    • getColumnNumber

      public int getColumnNumber()
      Get the column number of the current token
      Returns:
      the column number. Column numbers reported by the tokenizer start at zero.
    • getLineNumber

      public int getLineNumber(int offset)
      Return the line number corresponding to a given offset in the expression
      Parameters:
      offset - the character offset in the expression
      Returns:
      the line number. Line and column numbers reported by the tokenizer start at zero.
    • getColumnNumber

      public int getColumnNumber(int offset)
      Return the column number corresponding to a given offset in the expression
      Parameters:
      offset - the character offset in the expression
      Returns:
      the column number. Line and column numbers reported by the tokenizer start at zero.
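Recording the offset of each newline, as incrementLineNumber(int offset) does, is enough to map any later offset back to a zero-based line and column. The sketch below shows one way to do that mapping. LineIndex is a hypothetical class, it assumes newline offsets are recorded in ascending order, and it treats the newline character itself as the last column of its line. None of this is confirmed as Saxon's behaviour.

```java
// Hypothetical sketch of offset -> (line, column) mapping from recorded
// newline offsets. Zero-based, like the numbers documented above.
final class LineIndex {
    private int[] newlineOffsets = new int[8];
    private int count = 0;

    /** Record that a newline character was seen at the given offset. */
    void incrementLineNumber(int offset) {
        if (count == newlineOffsets.length) {
            newlineOffsets = java.util.Arrays.copyOf(newlineOffsets, count * 2);
        }
        newlineOffsets[count++] = offset;
    }

    /** Zero-based line: the number of newlines strictly before the offset. */
    int getLineNumber(int offset) {
        int line = 0;
        while (line < count && newlineOffsets[line] < offset) {
            line++;
        }
        return line;
    }

    /** Zero-based column: distance from the start of the containing line. */
    int getColumnNumber(int offset) {
        int line = getLineNumber(offset);
        int lineStart = (line == 0) ? 0 : newlineOffsets[line - 1] + 1;
        return offset - lineStart;
    }

    /** Convenience builder that scans a whole string for '\n'. */
    static LineIndex of(String input) {
        LineIndex index = new LineIndex();
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                index.incrementLineNumber(i);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        LineIndex index = LineIndex.of("ab\ncd\ne");
        System.out.println(index.getLineNumber(4) + ":" + index.getColumnNumber(4));
    }
}
```

Callers that present positions to users typically add one to both values, since most editors number lines and columns from one.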