Getting Rid of MS Smart Quotes
April 11, 2003 by daynah
Filed under Code Snippets
Ever have that problem with displaying Smart Quotes in the browser? Well, here is how I solved the little bug.
This problem has been bugging me for a while. See the image:
The problem is in the red circle. See the junk around “Meet the Author”? I looked into the problem, and it happens because the user copied and pasted from a Word document. Microsoft Word adds these “Smart Quotes” to your documents, which are just fancy opening and closing quotes. They replace the straight quotes ( " ) when the user enters the data.
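To make the junk a little more concrete, here is a small sketch that prints the bytes behind a curly double quote. It assumes the two encodings involved are Windows-1252 (where each smart quote is one byte) and UTF-8 (where each is three bytes). The variable names are only for illustration.

```php
<?php
// Left and right double smart quotes as single Windows-1252 bytes.
$cp1252_left  = chr(147); // 0x93
$cp1252_right = chr(148); // 0x94

// The same two characters as UTF-8 byte sequences.
$utf8_left  = "\xE2\x80\x9C"; // U+201C
$utf8_right = "\xE2\x80\x9D"; // U+201D

echo bin2hex($cp1252_left), ' ', bin2hex($cp1252_right), "\n"; // 93 94
echo bin2hex($utf8_left), ' ', bin2hex($utf8_right), "\n";     // e2809c e2809d
echo strlen($utf8_left), "\n";                                  // 3
```

A browser that reads these bytes with the wrong character set shows each byte as a separate, unrelated character, which is one way to end up with junk around a quoted phrase.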
So how to fix this? I did a lot of Googling today on smart quotes with PHP/HTML. I couldn’t find a good regular expression to replace the fancy quotes. I mean, what do I put in the parameters of eregi_replace()?
Here are some links that helped me:
Creating Special Characters
Smart Quotes:
Adding automated curly quotes to Cocoa’s Text system
chr()
ord()
htmlentities()
But the one that really helped me was the htmlentities() function. In the comments, a user posted this function:
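function superhtmlentities($text)
{
    // Windows-1252 byte values (128-159) mapped to named HTML entities.
    $entities = array(128 => 'euro', 130 => 'sbquo',
        131 => 'fnof', 132 => 'bdquo', 133 => 'hellip',
        134 => 'dagger', 135 => 'Dagger', 136 => 'circ',
        137 => 'permil', 138 => 'Scaron', 139 => 'lsaquo',
        140 => 'OElig', 145 => 'lsquo', 146 => 'rsquo',
        147 => 'ldquo', 148 => 'rdquo', 149 => 'bull',
        150 => 'ndash', 151 => 'mdash', 152 => 'tilde',
        153 => 'trade', 154 => 'scaron', 155 => 'rsaquo',
        156 => 'oelig', 159 => 'Yuml');
    $new_text = '';
    for ($i = 0; $i < strlen($text); $i++)
    {
        // $text[$i] replaces the older $text{$i} syntax, which PHP 8 rejects.
        $num = ord($text[$i]);
        if (array_key_exists($num, $entities))
        {
            $new_text .= '&'.$entities[$num].';';
        }
        else if ($num < 127 || $num > 159)
        {
            // Escape each remaining byte here instead of running
            // htmlentities() over the whole result, which would turn the
            // "&" of the entities added above into "&amp;".
            $new_text .= htmlentities($text[$i], ENT_COMPAT, 'ISO-8859-1');
        }
        // Bytes 127-159 with no entry in $entities are dropped.
    }
    return $new_text;
}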
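This function converts the evil characters Microsoft Word likes to use, which sit at byte values 128 to 159 in the Windows-1252 character set (the smart quotes are 145 to 148), to named HTML entities. Any byte in that range without an entity in the table is thrown away.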
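As a rough illustration of the same idea on a smaller scale, the mapping can also be handed to strtr(). This is only a sketch, not the function from the comments: the helper name smartquotes_to_entities() is made up, it covers just the four quote characters, and it assumes the input really is Windows-1252 encoded.

```php
<?php
// Hypothetical helper: maps the four Windows-1252 smart quote bytes
// to their named HTML entities and leaves everything else untouched.
function smartquotes_to_entities($text)
{
    $map = array(
        chr(145) => '&lsquo;',
        chr(146) => '&rsquo;',
        chr(147) => '&ldquo;',
        chr(148) => '&rdquo;',
    );
    return strtr($text, $map);
}

// Sample input built from raw Windows-1252 bytes.
$sample = chr(147) . 'Meet the Author' . chr(148);
echo smartquotes_to_entities($sample), "\n"; // &ldquo;Meet the Author&rdquo;
```

Unlike superhtmlentities(), this does not escape anything else in the text, so it would normally be combined with whatever escaping the page already does.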
The only strange thing is that it printed out:
“Meet the Authorâ€
in the HTML code. But the wonderful thing is, I have something to at least run eregi_replace() on. :)
So my code looks like this:
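// str_replace() stands in for eregi_replace(), which was removed in PHP 7.
// Both patterns are literal strings, so no regular expression is needed.
$brief_des = str_replace("œ", '',
    str_replace("â€", '"',
        superhtmlentities($newsitem->body)));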
This actually fixes my Smart Quotes problem by replacing them with regular quotes ( " ) and deleting the extra unnecessary characters. I’m still a little confused as to why superhtmlentities() returned what it did. But for now, I’m extremely happy to be able to remove those MS Smart Quotes!
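One possible explanation for the odd output, offered as a guess rather than something verified here: the text may have been stored as UTF-8 rather than Windows-1252. In UTF-8 each curly quote is three bytes, and a function that walks the string one byte at a time sees those bytes as three separate characters, the first two of which look like â and €. If that is the case, replacing the three-byte sequences directly would skip the cleanup step. A minimal sketch, assuming UTF-8 input and using a made-up helper name straighten_utf8_quotes():

```php
<?php
// Hypothetical helper: turns UTF-8 encoded curly quotes into straight ones.
// Assumes $text is UTF-8; Windows-1252 input would need a different map.
function straighten_utf8_quotes($text)
{
    $map = array(
        "\xE2\x80\x9C" => '"', // U+201C left double quote
        "\xE2\x80\x9D" => '"', // U+201D right double quote
        "\xE2\x80\x98" => "'", // U+2018 left single quote
        "\xE2\x80\x99" => "'", // U+2019 right single quote
    );
    return strtr($text, $map);
}

echo straighten_utf8_quotes("\xE2\x80\x9CMeet the Author\xE2\x80\x9D"), "\n"; // "Meet the Author"
```

The map matches whole three-byte sequences only, so other UTF-8 characters that share some of the same bytes (an em dash, for example) are left alone.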