Counting Lines, Paragraphs, or Records in a File using pc_split_paragraphs() in PHP

By: David Sklar

You want to count the number of lines, paragraphs, or records in a file.

To count lines, use fgets(). Because it reads one line at a time, you can count the number of times it returns a line before reaching the end of the file:

$lines = 0;

if ($fh = fopen('orders.txt','r')) {
  while (! feof($fh)) {
    if (fgets($fh,1048576)) {
      $lines++;
    }
  }
  fclose($fh);
}
print $lines;

To count paragraphs, increment the counter only when you read a blank line:

$paragraphs = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    $s = fgets($fh,1048576);
    if (("\n" == $s) || ("\r\n" == $s)) {
      $paragraphs++;
    }
  }
  fclose($fh);
}
print $paragraphs;

To count records, increment the counter only when the line read contains just the record separator and whitespace:

$records = 0;
$record_separator = '--end--';

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    $s = rtrim(fgets($fh,1048576));
    if ($s == $record_separator) {
      $records++;
    }
  }
  fclose($fh);
}
print $records;

In the line counter, $lines is incremented only if fgets() returns a true value. As fgets() moves through the file, it returns each line it retrieves. Once there is nothing left to read, it returns false, so $lines isn't incremented for a line that doesn't exist. Because EOF has been reached on the file, feof() then returns true, and the while loop ends.
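The behavior described above can be wrapped in a small function. This is a minimal sketch, not part of the original recipe: count_lines() is a hypothetical name, and it assumes a strict comparison against false is wanted so that a final line containing only "0" (which is not a true value) is still counted.

```php
<?php
// Hypothetical helper (not from the original recipe): count the lines in a file.
function count_lines($path) {
    $fh = fopen($path, 'r');
    if (false === $fh) {
        return false;               // the file could not be opened
    }
    $lines = 0;
    // fgets() returns false once nothing is left to read; the strict
    // comparison keeps a last line such as "0" from being skipped.
    while (false !== fgets($fh)) {
        $lines++;
    }
    fclose($fh);
    return $lines;
}

// Usage with a throwaway temporary file
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "first\nsecond\n0");
print count_lines($tmp);            // prints 3
unlink($tmp);
```

Returning the count instead of printing it makes the function easier to reuse; the caller decides what to do with the number.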

This paragraph counter works fine on simple text, but because it counts blank lines rather than paragraphs, it may produce unexpected results: every line in a long string of blank lines is counted separately, and a file without two consecutive linebreaks yields a count of 0. These problems can be remedied with functions based on preg_split(). If the file is small and can be read into memory, use the pc_split_paragraphs() function shown in the example below. This function returns an array containing each paragraph in the file.

pc_split_paragraphs()
function pc_split_paragraphs($file,$rs="\r?\n") {
    $text = join('',file($file));
    $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text,-1,
                          PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
    return $matches;
}
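
To see what this pattern returns, the same preg_split() call can be applied to a string that is already in memory. This is only an illustrative sketch: the sample text and the variable names are made up, and no file is read.

```php
<?php
// Sample text invented for illustration: three paragraphs, with a run of
// blank lines between the second and the third.
$text = "First paragraph,\nsecond line.\n\nSecond paragraph.\n\n\n\nThird paragraph.";
$rs = "\r?\n";

// Same pattern and flags as pc_split_paragraphs()
$paragraphs = preg_split("/(.*?$rs)(?:$rs)+/s", $text, -1,
                         PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);

print count($paragraphs);           // prints 3
```

Each paragraph that was followed by a break keeps one trailing linebreak; the last one is returned exactly as it appears at the end of the text.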

The contents of the file are broken on two or more consecutive newlines and returned in the $matches array. The default record-separation regular expression, \r?\n, matches both Windows and Unix linebreaks. If the file is too big to read into memory at once, use the pc_split_paragraphs_largefile() function shown in the example below, which reads the file in 4 KB chunks.

pc_split_paragraphs_largefile()
function pc_split_paragraphs_largefile($file,$rs="\r?\n") {
    $unmatched_text = '';
    $paragraphs = array();

    $fh = fopen($file,'r') or die("can't open $file");

    while (! feof($fh)) {
        $s = fread($fh,4096);
        if (false === $s) {
            die("can't read $file");
        }
        $text_to_split = $unmatched_text . $s;

        $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text_to_split,-1,
                              PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);

        // if the text doesn't end with two record separators, its last
        // piece may be an incomplete paragraph: save it to prepend to the
        // next section that gets read
        if ($matches && ! preg_match("/(?:$rs){2}\$/",$text_to_split)) {
            $unmatched_text = array_pop($matches);
        } else {
            $unmatched_text = '';
        }

        $paragraphs = array_merge($paragraphs,$matches);
    }
    fclose($fh);

    // after reading all sections, if there is a final chunk that doesn't
    // end with the record separator, count it as a paragraph
    if ('' !== $unmatched_text) {
        $paragraphs[] = $unmatched_text;
    }
    return $paragraphs;
}

This function uses the same regular expression as pc_split_paragraphs() to split the file into paragraphs. When it finds a paragraph end in a chunk read from the file, it saves the rest of the text in the chunk in $unmatched_text and prepends it to the next chunk read. That way, the unmatched text becomes the beginning of the next paragraph, and a paragraph that straddles a chunk boundary is reassembled before it is split.
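
If only the number of paragraphs is needed, and not the paragraphs themselves, the file can also be read line by line while tracking whether the current line belongs to a paragraph. The following is a minimal sketch under stated assumptions, not part of the original recipe: count_paragraphs_streaming() is a hypothetical name, a paragraph is taken to be any run of non-blank lines, and a line counts as blank only if it holds nothing but a linebreak.

```php
<?php
// Hypothetical helper: count paragraphs as runs of non-blank lines, so any
// number of blank lines between two paragraphs counts as a single break.
function count_paragraphs_streaming($path) {
    $fh = fopen($path, 'r');
    if (false === $fh) {
        return false;               // the file could not be opened
    }
    $paragraphs = 0;
    $in_paragraph = false;
    while (false !== ($line = fgets($fh))) {
        // blank means: nothing left after removing the trailing linebreak
        $is_blank = ('' === rtrim($line, "\r\n"));
        if (! $is_blank && ! $in_paragraph) {
            $paragraphs++;          // first line of a new paragraph
        }
        $in_paragraph = ! $is_blank;
    }
    fclose($fh);
    return $paragraphs;
}

// Usage with a throwaway temporary file
$tmp = tempnam(sys_get_temp_dir(), 'para');
file_put_contents($tmp, "One.\n\n\n\nTwo,\nstill two.\n\nThree.");
print count_paragraphs_streaming($tmp);   // prints 3
unlink($tmp);
```

Because only one line is held at a time, memory use stays small regardless of file size, and a file with no blank lines at all is reported as a single paragraph.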

