StreamTokenizer sample program in Java

By: Emiley J

StreamTokenizer defines several methods, but this example uses only a few. To reset the default set of delimiters, we will employ the resetSyntax( ) method, which makes every character an ordinary character. The default set of delimiters is tuned for tokenizing Java programs and is thus too specialized for this example. We declare that our tokens, or "words," are any consecutive run of visible characters delimited on both sides by whitespace.
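
To see why the default syntax is too specialized for plain word counting, the sketch below labels each token that an unconfigured StreamTokenizer produces. The class name DefaultSyntaxDemo and the helper describe( ) are made up for this illustration; ttype, sval, and nval are the tokenizer's own public fields. With the defaults, numbers are parsed, '=' comes back as a single-character token, and '/' starts a comment that runs to the end of the line.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class DefaultSyntaxDemo {
    // Hypothetical helper: labels every token produced with the default syntax.
    static List<String> describe(String text) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                switch (tok.ttype) {
                    case StreamTokenizer.TT_WORD:
                        out.add("WORD(" + tok.sval + ")");
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        out.add("NUMBER(" + tok.nval + ")");
                        break;
                    default:
                        // Any other token is reported as its character value.
                        out.add("CHAR(" + (char) tok.ttype + ")");
                        break;
                }
            }
        } catch (IOException e) {
            // A StringReader should not fail here; rethrow unchecked to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD(x), CHAR(=), NUMBER(3.5)]; the text after '/' is skipped as a comment
        System.out.println(describe("x = 3.5 // trailing note"));
    }
}
```

Three tokens come back for what a word counter would treat as five whitespace-separated words, which is the reason the example resets the syntax first.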

We use the eolIsSignificant( ) method to ensure that newline characters will be delivered as tokens, so we can count the number of lines as well as words. It has this general form:

void eolIsSignificant(boolean eolFlag)

If eolFlag is true, the end-of-line characters are returned as tokens. If eolFlag is false, the end-of-line characters are treated as whitespace and only separate tokens.
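
The effect of the flag can be checked with a small sketch that counts TT_EOL tokens in an in-memory string. The class name EolDemo and the helper countEolTokens( ) are invented for this illustration, and the tokenizer otherwise keeps its default syntax.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class EolDemo {
    // Hypothetical helper: counts the TT_EOL tokens reported for the given text.
    static int countEolTokens(String text, boolean eolFlag) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(text));
        tok.eolIsSignificant(eolFlag);
        int eols = 0;
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                if (tok.ttype == StreamTokenizer.TT_EOL) {
                    eols++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return eols;
    }

    public static void main(String[] args) {
        String text = "one two\nthree\n";
        System.out.println(countEolTokens(text, true));  // prints 2
        System.out.println(countEolTokens(text, false)); // prints 0
    }
}
```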

The wordChars( ) method is used to specify the range of characters that can be used in words. Its general form is shown here:

void wordChars(int start, int end)

Here, start and end specify the range of valid characters. In the program, characters in the range 33 to 255 are valid word characters. The whitespace characters are specified using whitespaceChars( ). It has this general form:

void whitespaceChars(int start, int end)

Here, start and end specify the range of valid whitespace characters. The next token is obtained from the input stream by calling nextToken( ). It returns the type of the token.
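
As a rough sketch of how these calls combine, the following collects the words from a string after resetting the syntax and then declaring characters 33 through 255 as word characters and 0 through the space character as whitespace. The class name WordSyntaxDemo and the helper words( ) are assumptions made for this example; the ranges are the ones the text describes.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class WordSyntaxDemo {
    // Hypothetical helper: returns every whitespace-delimited word in the text.
    static List<String> words(String text) {
        StreamTokenizer tok = new StreamTokenizer(new StringReader(text));
        tok.resetSyntax();            // forget the Java-oriented defaults
        tok.wordChars(33, 255);       // every visible character may appear in a word
        tok.whitespaceChars(0, ' ');  // control characters and space separate words
        List<String> result = new ArrayList<>();
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                if (tok.ttype == StreamTokenizer.TT_WORD) {
                    result.add(tok.sval);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [x=3.5, //, 'quoted', text]
        System.out.println(words("x=3.5 // 'quoted'\ttext"));
    }
}
```

Because resetSyntax( ) also turns off number parsing, comment handling, and quote handling, digits, slashes, and quotes are simply part of the surrounding word.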

StreamTokenizer defines four int constants: TT_EOF, TT_EOL, TT_NUMBER, and TT_WORD.

There are three instance variables. nval is a public double used to hold the values of numbers as they are recognized. sval is a public String used to hold the value of any words as they are recognized. ttype is a public int indicating the type of token that has just been read by the nextToken( ) method.

If the token is a word, ttype equals TT_WORD. If the token is a number, ttype equals TT_NUMBER. If the token is a single character, ttype contains its value. If an end-of-line condition has been encountered, ttype equals TT_EOL. (This assumes that eolIsSignificant( ) was invoked with a true argument.) If the end of the stream has been encountered, ttype equals TT_EOF.

The word count program revised to use a StreamTokenizer is shown here:

// Enhanced word count program that uses a StreamTokenizer
import java.io.*;

class WordCount {
  public static int words = 0;
  public static int lines = 0;
  public static int chars = 0;

  public static void wc(Reader r) throws IOException {
    StreamTokenizer tok = new StreamTokenizer(r);

    tok.resetSyntax();            // discard the default, Java-oriented syntax
    tok.wordChars(33, 255);       // visible characters form words
    tok.whitespaceChars(0, ' ');  // control characters and space are delimiters
    tok.eolIsSignificant(true);   // report line ends as TT_EOL tokens

    while (tok.nextToken() != StreamTokenizer.TT_EOF) {
      switch (tok.ttype) {
        case StreamTokenizer.TT_EOL:
          lines++;
          chars++;
          break;
        case StreamTokenizer.TT_WORD:
          words++;
          // FALLSTHROUGH
        default:
          chars += tok.sval.length();
          break;
      }
    }
  }

  public static void main(String[] args) {
    if (args.length == 0) { // We're working with stdin
      try {
        wc(new InputStreamReader(System.in));
        System.out.println(lines + " " + words + " " + chars);
      } catch (IOException e) {
        System.out.println("stdin: error.");
      }
    } else { // We're working with a list of files
      int twords = 0, tchars = 0, tlines = 0;
      for (int i = 0; i < args.length; i++) {
        try {
          words = chars = lines = 0;
          wc(new FileReader(args[i]));
          twords += words;
          tchars += chars;
          tlines += lines;
          System.out.println(args[i] + ": " +
            lines + " " + words + " " + chars);
        } catch (IOException e) {
          System.out.println(args[i] + ": error.");
        }
      }
      System.out.println("total: " +
        tlines + " " + twords + " " + tchars);
    }
  }
}
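
To try the counting logic without files or standard input, the same tokenizer settings described in the text can be applied to a StringReader. This is only a sketch: the class name WordCountSketch and the method count( ), which returns the three totals in an array instead of using static fields, are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

class WordCountSketch {
    // Hypothetical helper: returns {lines, words, chars} for the given input.
    static int[] count(Reader r) {
        StreamTokenizer tok = new StreamTokenizer(r);
        tok.resetSyntax();
        tok.wordChars(33, 255);
        tok.whitespaceChars(0, ' ');
        tok.eolIsSignificant(true);

        int lines = 0, words = 0, chars = 0;
        try {
            while (tok.nextToken() != StreamTokenizer.TT_EOF) {
                if (tok.ttype == StreamTokenizer.TT_EOL) {
                    lines++;
                    chars++;                      // one character per line end
                } else if (tok.ttype == StreamTokenizer.TT_WORD) {
                    words++;
                    chars += tok.sval.length();   // characters inside the word
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {lines, words, chars};
    }

    public static void main(String[] args) {
        int[] c = count(new StringReader("hello world\nfoo\n"));
        System.out.println(c[0] + " " + c[1] + " " + c[2]); // prints 2 3 15
    }
}
```

Note that the character total covers word characters plus one per line end; spaces and tabs between words are consumed as delimiters and are not counted.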

This tutorial is an extract from "The Complete Reference Part 2" by Herbert Schildt.
