Reading word by word from a file in PHP

By: David Sklar Emailed: 1590 times Printed: 2040 times    

Latest comments
By: rohit kumar - how this program is work
By: Kirti - Hi..thx for the hadoop in
By: Spijker - I have altered the code a
By: ali mohammed - why we use the java in ne
By: ali mohammed - why we use the java in ne
By: mizhelle - when I exported the data
By: raul - no output as well, i'm ge
By: Rajesh - thanx very much...
By: Suindu De - Suppose we are executing

You want to do something with every word in a file. Read in each line with fgets(), separate the line into words, and process each word:

$fh = fopen('great-american-novel.txt','r') or die($php_errormsg);
while (! feof($fh)) {
    if ($s = fgets($fh,1048576)) {
        $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
        // process words
    }
}
fclose($fh) or die($php_errormsg);

Here's how to calculate average word length in a file:

$word_count = $word_length = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    if ($s = fgets($fh,1048576)) {
      $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
      foreach ($words as $word) {
        $word_count++;
        $word_length += strlen($word);
      }
    }
  }
}

print sprintf("The average word length over %d words is %.02f characters.",
              $word_count,
              $word_length/$word_count);

Processing every word proceeds differently depending on how "word" is defined. The code in this recipe uses the Perl-compatible regular-expression engine's \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Code sample above breaks apart a line into words by splitting on a space, which is useful in that recipe because the words have to be rejoined with spaces. The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (alphanumeric) and a nonword character (anything else). Using \b instead of \s to delimit words most noticeably treats differently words with embedded punctuation. The term 6 o'clock is two words when split by whitespace (6 and o'clock); it's four words when split by word boundaries (6, o, ', and clock).


PHP Home | All PHP Tutorials | Latest PHP Tutorials

Sponsored Links

If this tutorial doesn't answer your question, or you have a specific question, just ask an expert here. Post your question to get a direct answer.



Bookmark and Share

Comments(1)


1. View Comment

Hi
Thankyou for your code. Its good.


View Tutorial          By: Rajeshkumar at 2010-01-29 22:34:17

Your name (required):


Your email(required, will not be shown to the public):


Your sites URL (optional):


Your comments:



More Tutorials by David Sklar
Find Difference between two dates in PHP
Reading .CSV file in PHP
Appending One Array to Another in PHP
Removing Duplicate Elements from an Array in PHP
Sorting an Array in PHP
Iterating Through an Array in PHP
Password protecting a page in PHP
Deleting Cookies in PHP
Reading Cookie Values in PHP
Setting cookies in PHP
Encrypting and decrypting in PHP
GDBM, NDBM, DB2, DB3, DBM, and CDB Databases in PHP
Using Text-File Databases in PHP
Upload and Download files with FTP in PHP
Extract files from a .zip file using PHP

More Tutorials in PHP
PHP code to import from CSV file to MySQL
PHP code to write to a CSV file from MySQL query
PHP code to write to a CSV file for Microsoft Applications
Convert XML to CSV in PHP
Password must include both numeric and alphabetic characters - Magento
PHP file upload (Large Files)
PHP file upload prompts authentication for anonymous users
PHP file upload with IIS on windows XP/2000 etc
Error: Length parameter must be greater than 0
Multiple File Upload in PHP using IFRAME
Resume or Pause File Uploads in PHP
Exception in module wampmanager.exe at 000F15A0 in Windows 8
Handling file locks in PHP
HTML table output using Nested for loops in PHP
Count occurrences of a character in a String in PHP

More Latest News
Most Viewed Articles (in PHP )
A Basic Example using PHP in AWS (Amazon Web Services)
isset() function in PHP
Upload and Download files with FTP in PHP
Parent: child process exited with status 3221225477 -- Restarting
Different versions of PHP - History and evolution of PHP
preg_split() and explode() in PHP
Reading word by word from a file in PHP
GDBM, NDBM, DB2, DB3, DBM, and CDB Databases in PHP
Sorting an Array in PHP
Reading .CSV file in PHP
Using Text file as database in PHP
Generate random timestamp between two dates
Function to return number of digits of an integer in PHP
Count occurrences of a character in a String in PHP
Exception in module wampmanager.exe at 000F15A0 in Windows 8
Most Emailed Articles (in PHP)
Integers and Floating-Point Numbers in PHP
Appending One Array to Another in PHP
Reading .CSV file in PHP
Perl's Encoding::FixLatin equivalent in PHP
Handling BLOB in PHP and MySQL
File Handling in PHP
Reading word by word from a file in PHP
Upload and Download files with FTP in PHP
History and origin of PHP
Variables in PHP
Strings in PHP
Using list() in PHP
switch Statements in PHP
do...while Loops in PHP
Destructors in PHP