I found this post in the manual and it sounds like it does what I want, but I can't get it to work:
Latin1 (iso-8859-1) does NOT define chars \x80-\x9f (128-159),
but Windows charset 1252 defines some of them
-- like the infamous msoffice 'magic quotes' (\x92 146).
Don't use those invalid control chars in webpages;
use their html (unicode) entities instead. See ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
or http://www.microsoft.com/typography/unicode/1252.htm
PS: a '?' in the code means that win-cp1252 doesn't define the given char.
$badlatin1_cp1252_to_htmlent =
array(
'\x80'=>'€', '\x81'=>'?', '\x82'=>'‚', '\x83'=>'ƒ',
'\x84'=>'„', '\x85'=>'…', '\x86'=>'†', '\x87'=>'‡',
'\x88'=>'ˆ', '\x89'=>'‰', '\x8A'=>'Š', '\x8B'=>'‹',
'\x8C'=>'Œ', '\x8D'=>'?', '\x8E'=>'Ž', '\x8F'=>'?',
'\x90'=>'?', '\x91'=>'‘', '\x92'=>'’', '\x93'=>'“',
'\x94'=>'”', '\x95'=>'•', '\x96'=>'–', '\x97'=>'—',
'\x98'=>'˜', '\x99'=>'™', '\x9A'=>'š', '\x9B'=>'›',
'\x9C'=>'œ', '\x9D'=>'?', '\x9E'=>'ž', '\x9F'=>'Ÿ'
);
$str = strtr($str, $badlatin1_cp1252_to_htmlent);
I use it like this:
$ft = 'test.txt';
$handle = fopen($ft, "rb");
$tc = fread($handle, filesize($ft));
fclose($handle);
$str2 = strtr($tc, $badlatin1_cp1252_to_htmlent);
print $str2;
The frustrating thing is that it worked for 5 seconds or so, while I had it all on 1 line. Then I thought, "Hey, I wonder if I can make the function more readable by adding a few line breaks here and there (after the commas)", only to find that it no longer converts the text properly. A couple of undos ought to have brought it back to the original state, but no. It won't work anymore. 🙁
Does anyone have an idea what's going on?
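One thing that may be worth checking, though I can't confirm from the post alone that it is the cause: in PHP, a single-quoted '\x80' is the four literal characters backslash, x, 8, 0, not the byte 0x80. Only double-quoted strings expand \x escapes, so with single-quoted keys strtr() never matches the bytes in the file. Below is a minimal sketch under those assumptions. The function name cp1252_to_entities is hypothetical (it is not from the manual post), the values are written as numeric entities taken from the CP1252.TXT mapping, and the positions win-cp1252 leaves undefined are simply left out. It also assumes the input really is single-byte cp1252; on a UTF-8 file the same bytes occur inside multi-byte sequences and would be mangled.

```php
<?php
// Hypothetical helper, not part of the manual post.
// Replaces the cp1252-only bytes 0x80-0x9F with numeric HTML entities.
function cp1252_to_entities(string $bytes): string
{
    // Keys are double-quoted so that "\x80" is the single byte 0x80.
    // Undefined positions (0x81, 0x8D, 0x8F, 0x90, 0x9D) are omitted
    // and therefore pass through unchanged.
    static $map = [
        "\x80" => '&#x20AC;', "\x82" => '&#x201A;', "\x83" => '&#x0192;',
        "\x84" => '&#x201E;', "\x85" => '&#x2026;', "\x86" => '&#x2020;',
        "\x87" => '&#x2021;', "\x88" => '&#x02C6;', "\x89" => '&#x2030;',
        "\x8A" => '&#x0160;', "\x8B" => '&#x2039;', "\x8C" => '&#x0152;',
        "\x8E" => '&#x017D;', "\x91" => '&#x2018;', "\x92" => '&#x2019;',
        "\x93" => '&#x201C;', "\x94" => '&#x201D;', "\x95" => '&#x2022;',
        "\x96" => '&#x2013;', "\x97" => '&#x2014;', "\x98" => '&#x02DC;',
        "\x99" => '&#x2122;', "\x9A" => '&#x0161;', "\x9B" => '&#x203A;',
        "\x9C" => '&#x0153;', "\x9E" => '&#x017E;', "\x9F" => '&#x0178;',
    ];

    // strtr() with an array replaces each key with its value in one pass.
    return strtr($bytes, $map);
}
```

For example, cp1252_to_entities("It\x92s") returns It&#x2019;s, while plain ASCII text comes back unchanged. Writing the values as numeric entities keeps the PHP source file pure ASCII, so the encoding the editor saves the script in should not affect the table.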