Processing every word proceeds differently depending on how "word" is defined. The code in this recipe uses the Perl-compatible regular-expression engine's \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Code sample above breaks apart a line into words by splitting on a space, which is useful in that recipe because the words have to be rejoined with spaces. The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (alphanumeric) and a nonword character (anything else). Using \b instead of \s to delimit words most noticeably treats differently words with embedded punctuation. The term 6 o'clock is two words when split by whitespace (6 and o'clock); it's four words when split by word boundaries (6, o, ', and clock).
COPYRIGHTę 2006 java-samples.com. ALL RIGHTS RESERVED.