Counting Lines, Paragraphs, or Records in a File using pc_split_paragraphs() in PHP

By: David Sklar in PHP Tutorials on 2008-12-01  

You want to count the number of lines, paragraphs, or records in a file. To count lines, use fgets(). Because it reads one line at a time, you can count the number of times it returns a line before reaching the end of the file:

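$lines = 0;

if ($fh = fopen('orders.txt','r')) {
  while (! feof($fh)) {
    // compare strictly against false so that a final line
    // containing only "0" is still counted
    if (false !== fgets($fh,1048576)) {
      $lines++;
    }
  }
  fclose($fh);
}
print $lines;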

To count paragraphs, increment the counter only when you read a blank line:

$paragraphs = 0;

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    $s = fgets($fh,1048576);
    // a line holding only a linebreak marks the end of a paragraph
    if (("\n" === $s) || ("\r\n" === $s)) {
      $paragraphs++;
    }
  }
  fclose($fh);
}
print $paragraphs;

To count records, increment the counter only when the line read contains just the record separator and whitespace:

$records = 0;
$record_separator = '--end--';

if ($fh = fopen('great-american-novel.txt','r')) {
  while (! feof($fh)) {
    // fgets() returns false at end of file, so check before trimming
    $s = fgets($fh,1048576);
    if ((false !== $s) && (rtrim($s) === $record_separator)) {
      $records++;
    }
  }
  fclose($fh);
}
print $records;

In the line counter, $lines is incremented only when fgets() actually returns a line. As fgets() moves through the file, it returns each line it retrieves. Once there is nothing left to read, it returns false, so $lines doesn't get incorrectly incremented. Because EOF has been reached on the file, feof() then returns true, and the while loop ends.
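The behavior described above can be checked against a small throwaway file. The following is a minimal sketch rather than part of the original recipe: the count_lines() helper name and the temporary sample file are assumptions made for illustration. It loops on fgets() directly and compares strictly against false, so a last line without a trailing newline (here a line holding only "0") is still counted.

```php
<?php
// Hypothetical helper (not from the original recipe): count lines with fgets().
function count_lines($path) {
    $lines = 0;
    $fh = fopen($path, 'r');
    if (! $fh) {
        return false;               // could not open the file
    }
    // fgets() returns false once there is nothing left to read
    while (false !== fgets($fh)) {
        $lines++;
    }
    fclose($fh);
    return $lines;
}

// Build a three-line sample file; the last line has no trailing newline.
$tmp = tempnam(sys_get_temp_dir(), 'lines');
file_put_contents($tmp, "first\nsecond\n0");

$count = count_lines($tmp);
print $count;                       // prints 3
unlink($tmp);
```

Returning false on a failed fopen() lets the caller tell "file could not be opened" apart from "file has zero lines".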

The paragraph counter works fine on simple text but may produce unexpected results on other input: every line in a long run of blank lines is counted as a separate paragraph, and a file that never contains two consecutive linebreaks is counted as having no paragraphs at all. These problems can be remedied with functions based on preg_split(). If the file is small enough to be read into memory, use the pc_split_paragraphs() function shown in the example below. This function returns an array containing each paragraph in the file.

pc_split_paragraphs()
function pc_split_paragraphs($file,$rs="\r?\n") {
    $text = file_get_contents($file);
    $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text,-1,
                          PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);
    return $matches;
}
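To see what this split returns, it can help to run the same pattern and flags on an in-memory string. This is only an illustrative sketch: the sample text is made up, and the $paragraphs variable name is an assumption rather than something taken from the function above.

```php
<?php
// Same pattern and flags as pc_split_paragraphs(), applied to a string
// instead of a file so the result is easy to inspect.
$rs = "\r?\n";
$text = "First paragraph,\nsecond line.\n\n"
      . "Second paragraph.\r\n\r\n\r\n"
      . "Third paragraph.";

$paragraphs = preg_split("/(.*?$rs)(?:$rs)+/s", $text, -1,
                         PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);

print count($paragraphs) . "\n";
foreach ($paragraphs as $p) {
    // each captured paragraph keeps one trailing linebreak, so trim it
    print '[' . rtrim($p) . "]\n";
}
```

The sample yields three paragraphs even though the second break is a run of three Windows linebreaks and the last paragraph has no trailing blank line. To get a count from the function itself, pass its return value to count().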

The contents of the file are broken on two or more consecutive newlines and returned in the $matches array. The default record-separation regular expression, \r?\n, matches both Windows and Unix linebreaks. If the file is too big to read into memory at once, use the pc_split_paragraphs_largefile() function shown in the example below, which reads the file in 4 KB chunks.

pc_split_paragraphs_largefile()
function pc_split_paragraphs_largefile($file,$rs="\r?\n") {
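    $unmatched_text = '';
    $paragraphs = array();

    // $php_errormsg was removed in PHP 8, so report the failure directly
    $fh = fopen($file,'r') or die("can't open $file");

    while(! feof($fh)) {
        $s = fread($fh,4096);
        // stop on a read error or an empty read at end of file
        if (false === $s || '' === $s) { break; }
        $text_to_split = $unmatched_text . $s;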

        $matches = preg_split("/(.*?$rs)(?:$rs)+/s",$text_to_split,-1,
                              PREG_SPLIT_DELIM_CAPTURE|PREG_SPLIT_NO_EMPTY);

        /* if the chunk doesn't end with two record separators, save the
         * last piece to prepend to the next section that gets read */
        $last_match = $matches[count($matches)-1];
        if (! preg_match("/$rs$rs\$/",$text_to_split)) {
            $unmatched_text = $last_match;
            array_pop($matches);
        } else {
            $unmatched_text = '';
        }
        
        $paragraphs = array_merge($paragraphs,$matches);
    }
    
    /* after reading all sections, if there is a final chunk that doesn't
     * end with the record separator, count it as a paragraph */
    if ($unmatched_text) {
        $paragraphs[] = $unmatched_text;
    }
    fclose($fh);
    return $paragraphs;
}

This function uses the same regular expression as pc_split_paragraphs() to split the file into paragraphs. When it finds a paragraph end in a chunk read from the file, it saves the rest of the text in the chunk in $unmatched_text and prepends it to the next chunk read. The unmatched text thus becomes the beginning of the next paragraph in the file.
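When only the number of paragraphs is needed, it may be simpler not to store them at all. The sketch below is one possible approach and not part of the original recipe: the count_paragraphs() helper name and the sample file are assumptions. It reads line by line, counts a paragraph at its first non-blank line, and treats any run of blank lines as a single break. It also assumes that a line containing only whitespace should count as blank.

```php
<?php
// Hypothetical helper (not from the original recipe): count paragraphs
// line by line, treating any run of blank lines as a single break.
function count_paragraphs($path) {
    $paragraphs = 0;
    $in_paragraph = false;
    $fh = fopen($path, 'r');
    if (! $fh) {
        return false;                   // could not open the file
    }
    while (false !== ($line = fgets($fh))) {
        if ('' === trim($line)) {
            $in_paragraph = false;      // blank line: paragraph has ended
        } elseif (! $in_paragraph) {
            $in_paragraph = true;       // first line of a new paragraph
            $paragraphs++;
        }
    }
    fclose($fh);
    return $paragraphs;
}

// Sample with a run of blank lines, Windows linebreaks, and no
// trailing blank line after the last paragraph.
$tmp = tempnam(sys_get_temp_dir(), 'para');
file_put_contents($tmp,
    "One.\nStill one.\n\n\n\nTwo.\r\n\r\nThree, no trailing blank line.");

$count = count_paragraphs($tmp);
print $count;                           // prints 3
unlink($tmp);
```

Because it holds only one line in memory at a time, this approach works on files of any size, at the cost of returning just a count rather than the paragraph text.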