A token stream is an instance of the StreamTokenizer class. It represents a sequence of programmer-defined tokens constructed from a reader. A token stream, x, provides methods that allow programmers to specify which ranges of characters are white space (i.e., should be ignored), which ranges of characters are ordinary (i.e., should be regarded as part of a token), and if numbers are tokens. The x.nextToken() method uses this information to extract the next number or ordinary character from the token stream, skipping over white space characters. ASCII codes of ordinary characters are stored in an integer variable called x.ttype. If the value of this variable is StreamTokenizer.TT_Number, then a number was read and stored in the variable x.nval.
As an example of token streams, let's define a recursive descent parser for a simple, expression-oriented language called Alpha. Alpha expressions are sums of terms. Terms are products of factors. A factor is a number or parenthetical expression. Here's an EBNF grammar for Alpha:
EXPRESSION ::= TERM | TERM + EXPRESSION
TERM ::= FACTOR | FACTOR * TERM
FACTOR ::= NUMBER | (EXPRESSION)
We begin by extending the StreamTokenizer class:
class AlphaTokenStream extends StreamTokenizer {
public AlphaTokenStream(Reader r) {
super(r);
resetSyntax();
// ignore all chars but digits, CR, LF, +, and *:
whitespaceChars(0, 9);
whitespaceChars(14, 39);
whitespaceChars(44, 47);
whitespaceChars(58, 127);
ordinaryChars(lParen, sumOp);
parseNumbers();
eolIsSignificant(true);
}
public void getToken() {
try {
do nextToken(); while (atEOL());
}
catch(IOException e) {
Tools.exit("getToken: " + e);
}
} // getToken
boolean atEOL() {
return (ttype == StreamTokenizer.TT_EOL);
}
public static char sumChar = '+';
public static int sumOp = (int) sumChar;
public static char mulChar = '*';
public static int mulOp = (int) mulChar;
public static char lParenChar = '(';
public static int lParen = (int) lParenChar;
public static char rParenChar = ')';
public static int rParen = (int) rParenChar;
public static int CR = 13;
public static int LF = 10;
}
The Alpha control loop creates an AlphaTokenStream object, tokens, from Tools.cin, then perpetually displays a prompt, constructs an expression object from the token stream, then displays the parse tree and value of the expression:
public class Alpha {
public static void main(String[] args) {
AlphaTokenStream tokens =
new AlphaTokenStream(Tools.cin);
while(true) {
Tools.cout.print("Alpha> ");
Tools.cout.flush();
tokens.getToken();
Expression e = new Expression(tokens);
Tools.cout.println("parse tree = " + e);
Tools.cout.println("value = " + e.eval());
}
} // main
} // Alpha
Alpha expressions, terms, and factors are instances of the Expression, Term, and Factor classes respectively. Each class provides an eval() method for evaluating the represented phrase, a toString() method for converting the represented phrase into a "parse tree", and a constructor that initializes the represented phrase from tokens extracted from an AlphaTokenStream instance.
The Expression constructor begins by constructing a term from the token stream. If tokens.ttype contains the '+' character, then a subsequent expression is recursively constructed from the token stream:
class Expression {
public Expression(AlphaTokenStream tokens) {
term = new Term(tokens);
if (tokens.ttype == tokens.sumOp) {
tokens.getToken();
exp = new Expression(tokens);
oneChild = false;
}
else
oneChild = true;
}
public String toString() {
if (oneChild)
return new String("" + term);
else
return new String("(+ " + term + " " + exp + ")");
}
public double eval() {
if (oneChild)
return term.eval();
else
return term.eval() + exp.eval();
}
private boolean oneChild;
private Term term;
private Expression exp; // undefined if oneChild = true
}
The Term constructor begins by constructing a factor from the token stream. If tokens.ttype contains a '*' character, then a subsequent term is recursively constructed from the token stream:
class Term{
public Term(AlphaTokenStream tokens) {
factor = new Factor(tokens);
if (tokens.ttype == tokens.mulOp) {
tokens.getToken();
term = new Term(tokens);
oneChild = false;
}
else oneChild = true;
}
public String toString() {
if(oneChild)
return new String("" + factor);
else
return new String("(* " + factor + " " + term + ")");
}
public double eval() {
if (oneChild)
return factor.eval();
else
return factor.eval() * term.eval();
}
private boolean oneChild;
private Factor factor;
private Term term; // undefined if oneChild = true
}
If tokens.ttype contains a left parenthesis, then the factor constructor constructs an expression from the token stream and checks to make sure a right parenthesis follows. Otherwise, the factor constructor copies the number in tokens.nval into its num variable:
class Factor{
public Factor(AlphaTokenStream tokens) {
if (tokens.ttype == tokens.lParen) {
tokens.getToken();
exp = new Expression(tokens);
num = 0;
isNumber = false;
if (tokens.ttype == tokens.rParen) {
tokens.getToken();
}
else {
Tools.error("invalid expression");
}
}
else if (tokens.ttype == AlphaTokenStream.TT_NUMBER) {
num = tokens.nval;
isNumber = true;
tokens.getToken();
}
}
public String toString() {
if(isNumber)
return new String("" + num);
else
return new String("" + exp);
}
public double eval() {
if (isNumber)
return num;
else
return exp.eval();
}
private boolean isNumber;
private double num; // 0 if isNumber = false
private Expression exp; // undefined if isNumber = true
}