Ok, well, you could just strip any non-word character out.
$pattern = '/[^a-z0-9.;:!\-, ]/i'; // add whatever other punctuation chars you want to keep. No "g" flag: PHP rejects it, and preg_replace() already replaces every match.
echo preg_replace($pattern, '', $contents);
This is untested and may throw errors. It sounds like you know what you are doing and can do any necessary debugging. This may help you get just the chars you want and drop the crazy stuff.
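If your input is UTF-8 and you'd rather not list every punctuation char by hand, Unicode property classes might save you some typing. Here's a rough sketch of that idea. The function name `strip_unwanted_chars` is just something I made up for illustration, and I'm assuming PHP 7+ with PCRE built with Unicode support (the usual case).

```php
<?php
// Hypothetical helper (name is an assumption, not from any library):
// keep letters, digits, punctuation, and plain spaces; drop everything else.
function strip_unwanted_chars(string $contents): string
{
    // \p{L} = letters, \p{N} = numbers, \p{P} = punctuation.
    // The "u" modifier makes PCRE treat the pattern and subject as UTF-8.
    $clean = preg_replace('/[^\p{L}\p{N}\p{P} ]/u', '', $contents);

    // preg_replace() returns null on error (e.g. invalid UTF-8 input),
    // so fall back to the original string in that case.
    return $clean ?? $contents;
}

echo strip_unwanted_chars("Hello, wörld! \x01\x02 42");
```

Note that symbols such as `+` or `$` fall under `\p{S}` rather than `\p{P}`, so this version strips them. Add `\p{S}` to the class if you want to keep them, keeping in mind that this also lets emoji through.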
Surely, somebody will have a better solution. But maybe that'll be you and this will "inspire" you. 😃