Reading word by word from a file in PHP

By: David Sklar  

You want to do something with every word in a file. Read in each line with fgets(), separate the line into words, and process each word:

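// $php_errormsg was removed in PHP 8; error_get_last() reports the same message
$fh = fopen('great-american-novel.txt','r')
    or die(error_get_last()['message'] ?? 'Cannot open file');
// fgets() returns false at end of file; the strict comparison keeps a line
// containing only "0" from being skipped, and omitting the length argument
// reads each line whole so no word is cut in two
while (($s = fgets($fh)) !== false) {
    $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
    // process words
}
fclose($fh) or die(error_get_last()['message'] ?? 'Cannot close file');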

Here's how to calculate average word length in a file:

$word_count = $word_length = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (($s = fgets($fh)) !== false) {
    $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
    foreach ($words as $word) {
      $word_count++;
      // strlen() counts bytes; use mb_strlen() for multibyte text
      $word_length += strlen($word);
    }
  }
  fclose($fh);
}

// Avoid dividing by zero when the file is empty or could not be opened
if ($word_count > 0) {
  printf("The average word length over %d words is %.02f characters.",
         $word_count,
         $word_length/$word_count);
}
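
Both examples above repeat the same read-and-split loop. As a rough sketch only, that loop could be factored into a generator so the word-processing code stays separate from the file handling. The function names words_in_file() and average_word_length() are hypothetical, not part of the original recipe, and the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not from the original recipe): yields each
// whitespace-delimited word in a file, reading one line at a time.
function words_in_file(string $path): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            foreach (preg_split('/\s+/', $line, -1, PREG_SPLIT_NO_EMPTY) as $word) {
                yield $word;
            }
        }
    } finally {
        // Runs when the generator finishes or is destroyed early.
        fclose($fh);
    }
}

// Hypothetical helper: average word length in bytes, or null when
// there are no words (which avoids a division by zero).
function average_word_length(iterable $words): ?float
{
    $count = $length = 0;
    foreach ($words as $word) {
        $count++;
        $length += strlen($word);
    }
    return $count > 0 ? $length / $count : null;
}
```

With these in place, the average for the recipe's file would be average_word_length(words_in_file('great-american-novel.txt')). Only one line is held in memory at a time, as in the original loop.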

How you process every word depends on how "word" is defined. The code in this recipe uses the Perl-compatible regular-expression engine's \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Splitting on a single space character instead is useful when the words have to be rejoined with spaces afterward. The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (a letter, digit, or underscore) and a nonword character (anything else). The most noticeable difference between using \b and \s to delimit words is how words with embedded punctuation are treated. The term 6 o'clock is two words when split by whitespace (6 and o'clock); it's four words when split by word boundaries (6, o, ', and clock), not counting the space itself, which \b leaves as a separate piece.
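
To make the difference concrete, here is a small sketch that splits the same string both ways. The helper names split_on_whitespace() and split_on_word_boundaries() are made up for this illustration. Because \b also matches on either side of the space between words, the second helper drops pieces that are nothing but whitespace.

```php
<?php
// Hypothetical helpers for illustration only.

// Split on runs of whitespace, as the recipe does.
function split_on_whitespace(string $text): array
{
    return preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);
}

// Split at word boundaries, then discard whitespace-only pieces.
function split_on_word_boundaries(string $text): array
{
    $pieces = preg_split('/\b/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter($pieces, function ($piece) {
        return trim($piece) !== '';
    }));
}

print_r(split_on_whitespace("6 o'clock"));      // 6, o'clock
print_r(split_on_word_boundaries("6 o'clock")); // 6, o, ', clock
```

Which definition fits depends on the task: whitespace splitting keeps contractions and hyphenated terms intact, while boundary splitting isolates the punctuation so it can be counted or discarded separately.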




