Perl's Encoding::FixLatin equivalent in PHP
By: squeegee in PHP Tutorials on 2011-07-31
I think this is a reasonable port of Perl's Encoding::FixLatin by Grant McLean, which converts a string with mixed encodings (ASCII, ISO-8859-1, CP1252, and UTF-8) to UTF-8.
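To make the problem concrete, here is a small sketch of what a mixed-encoding string looks like. The sample strings are made up for illustration and are not from the original post. Each piece is fine in its own encoding, but once they are concatenated the result is no longer valid UTF-8.

```php
<?php
// Hypothetical sample data, for illustration only.
$utf8   = "na\xC3\xAFve";    // "naïve" encoded as UTF-8
$latin1 = "caf\xE9";         // "café" encoded as ISO-8859-1
$cp1252 = "\x93quoted\x94";  // curly double quotes encoded as CP1252

$mixed = $utf8 . ' ' . $latin1 . ' ' . $cp1252;

var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
var_dump(mb_check_encoding($mixed, 'UTF-8'));  // bool(false)
```

Converting the whole of `$mixed` from ISO-8859-1 would double-encode the part that is already UTF-8, which is why the conversion has to be decided sequence by sequence.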
<?php
function init_byte_map(){
    global $byte_map;
    // Default: treat every high byte as ISO-8859-1.
    // Note: utf8_encode() is deprecated as of PHP 8.2.
    for($x=128;$x<256;++$x){
        $byte_map[chr($x)]=utf8_encode(chr($x));
    }
    // CP1252 overrides for the 0x80-0x9F range.
    $cp1252_map=array(
        "\x80" => "\xE2\x82\xAC", // EURO SIGN
        "\x82" => "\xE2\x80\x9A", // SINGLE LOW-9 QUOTATION MARK
        "\x83" => "\xC6\x92",     // LATIN SMALL LETTER F WITH HOOK
        "\x84" => "\xE2\x80\x9E", // DOUBLE LOW-9 QUOTATION MARK
        "\x85" => "\xE2\x80\xA6", // HORIZONTAL ELLIPSIS
        "\x86" => "\xE2\x80\xA0", // DAGGER
        "\x87" => "\xE2\x80\xA1", // DOUBLE DAGGER
        "\x88" => "\xCB\x86",     // MODIFIER LETTER CIRCUMFLEX ACCENT
        "\x89" => "\xE2\x80\xB0", // PER MILLE SIGN
        "\x8A" => "\xC5\xA0",     // LATIN CAPITAL LETTER S WITH CARON
        "\x8B" => "\xE2\x80\xB9", // SINGLE LEFT-POINTING ANGLE QUOTATION MARK
        "\x8C" => "\xC5\x92",     // LATIN CAPITAL LIGATURE OE
        "\x8E" => "\xC5\xBD",     // LATIN CAPITAL LETTER Z WITH CARON
        "\x91" => "\xE2\x80\x98", // LEFT SINGLE QUOTATION MARK
        "\x92" => "\xE2\x80\x99", // RIGHT SINGLE QUOTATION MARK
        "\x93" => "\xE2\x80\x9C", // LEFT DOUBLE QUOTATION MARK
        "\x94" => "\xE2\x80\x9D", // RIGHT DOUBLE QUOTATION MARK
        "\x95" => "\xE2\x80\xA2", // BULLET
        "\x96" => "\xE2\x80\x93", // EN DASH
        "\x97" => "\xE2\x80\x94", // EM DASH
        "\x98" => "\xCB\x9C",     // SMALL TILDE
        "\x99" => "\xE2\x84\xA2", // TRADE MARK SIGN
        "\x9A" => "\xC5\xA1",     // LATIN SMALL LETTER S WITH CARON
        "\x9B" => "\xE2\x80\xBA", // SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
        "\x9C" => "\xC5\x93",     // LATIN SMALL LIGATURE OE
        "\x9E" => "\xC5\xBE",     // LATIN SMALL LETTER Z WITH CARON
        "\x9F" => "\xC5\xB8"      // LATIN CAPITAL LETTER Y WITH DIAERESIS
    );
    foreach($cp1252_map as $k=>$v){
        $byte_map[$k]=$v;
    }
}

function fix_latin($instr){
    if(mb_check_encoding($instr,'UTF-8'))return $instr; // no need for the rest if it's all valid UTF-8 already
    global $nibble_good_chars,$byte_map;
    $outstr='';
    $char='';
    $rest='';
    while((strlen($instr))>0){
        if(1==preg_match($nibble_good_chars,$instr,$match)){
            // Leading run of ASCII or one well-formed UTF-8 sequence: copy as-is.
            $char=$match[1];
            $rest=$match[2];
            $outstr.=$char;
        }elseif(1==preg_match('@^(.)(.*)$@s',$instr,$match)){
            // Stray high byte: convert it via the CP1252/ISO-8859-1 map.
            $char=$match[1];
            $rest=$match[2];
            $outstr.=$byte_map[$char];
        }
        $instr=$rest;
    }
    return $outstr;
}

$byte_map=array();
init_byte_map();
$ascii_char='[\x00-\x7F]';
$cont_byte='[\x80-\xBF]';
$utf8_2='[\xC0-\xDF]'.$cont_byte;
$utf8_3='[\xE0-\xEF]'.$cont_byte.'{2}';
$utf8_4='[\xF0-\xF7]'.$cont_byte.'{3}';
$utf8_5='[\xF8-\xFB]'.$cont_byte.'{4}';
$nibble_good_chars = "@^($ascii_char+|$utf8_2|$utf8_3|$utf8_4|$utf8_5)(.*)$@s";
?>
Then just call fix_latin wherever you need it.
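For comparison, the same idea can be sketched as a single pass with `preg_replace_callback()`, which avoids `utf8_encode()` and the global variables. This is only a sketch under a few assumptions: the function name `fix_latin_sketch` is hypothetical, the mbstring extension (PHP 7.2 or later, for `mb_chr()`) is assumed to be available, and the pattern is somewhat tighter than the one in the port (lead bytes `C2-DF` and `F0-F4`, no 5-byte sequences), so results can differ on unusual input.

```php
<?php
// Hypothetical helper, not part of the original port.
function fix_latin_sketch(string $in): string
{
    // CP1252 bytes 0x80-0x9F mapped to Unicode code points.
    // Bytes that CP1252 leaves undefined are absent and fall back to ISO-8859-1.
    $cp1252 = [
        0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
        0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
        0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
        0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
        0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
        0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
        0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
    ];

    // Byte-mode pattern (no /u flag): runs of ASCII and well-formed
    // multi-byte sequences match the first alternatives; any other
    // single byte is captured in group 1.
    $pattern = '/[\x00-\x7F]+'
             . '|[\xC2-\xDF][\x80-\xBF]'
             . '|[\xE0-\xEF][\x80-\xBF]{2}'
             . '|[\xF0-\xF4][\x80-\xBF]{3}'
             . '|(.)/s';

    $out = preg_replace_callback(
        $pattern,
        function (array $m) use ($cp1252): string {
            if (!isset($m[1])) {
                return $m[0];            // already ASCII or UTF-8: keep as-is
            }
            $byte = ord($m[1]);
            // In ISO-8859-1 the byte value is the code point.
            return mb_chr($cp1252[$byte] ?? $byte, 'UTF-8');
        },
        $in
    );

    return $out ?? $in;                  // preg error: return input unchanged
}

echo fix_latin_sketch("caf\xE9 \x93ok\x94"), "\n"; // prints: café “ok”
```

Because the callback only fires on matched sequences, the input is scanned once instead of being re-matched from the start on every loop iteration.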