I'm guessing it could be done with a preg_replace(), using suitable lookaround assertions about which characters may surround the text to be converted. I may come back with one, but for now I've got another suggestion.
I'm assuming that the HTML is to spec, i.e. the only < and > characters in the file are those that begin and end tags (any others being written as &lt; and &gt;). Things get really messy otherwise, and you end up, like pnoeric, writing a mini-parser.
// Using fread() with a ridiculously big size
// is not a good idea - PHP has to find and
// reserve that much memory before it can
// begin reading.
$string = join('',file($url)); // Either do this
                               // or use a proper
                               // fread() loop.
$stringbits = preg_split('/[<>]/',$string);
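What does that achieve? The string gets cut at every < and every >, so the pieces alternate between text outside a tag and text inside one. Since array indexes start at 0 and the string starts outside any tag, all the even-numbered array entries contain the text outside tags, and all the odd-numbered entries contain text that is inside tags.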
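To see the alternation for yourself, here's a quick sketch on a made-up fragment (the sample string and the $bits name are mine, purely for illustration):

```php
<?php
// Hypothetical sample input; assumes every < and > belongs to a tag.
$sample = 'Hello <b>world</b>!';
$bits = preg_split('/[<>]/', $sample);

foreach ($bits as $i => $bit) {
    // Even indexes hold text outside tags, odd indexes hold tag contents.
    echo $i, ($i % 2 === 0 ? ' text: ' : ' tag:  '), var_export($bit, true), "\n";
}
```

Note that when the string starts with a tag, or two tags sit right next to each other, preg_split() returns an empty string for the gap. Those empty entries are what keep the even/odd pattern intact, so don't pass PREG_SPLIT_NO_EMPTY.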
So do the conversion on all the even-numbered elements:
$len=count($stringbits);
for($i=0; $i<$len; $i+=2)
{
    $stringbits[$i]=convert($stringbits[$i]);
}
Put <> back around the odd-numbered ones:
$len=count($stringbits);
for($i=1; $i<$len; $i+=2)
{
    $stringbits[$i]='<'.$stringbits[$i].'>';
}
And reassemble the pieces:
$convertedstring = join('',$stringbits);
I think....
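For what it's worth, the whole thing could be rolled into one function, roughly like this. Treat it as a sketch: convert() here is just a stand-in that uppercases its input (swap in the real conversion), convert_outside_tags() is a name I made up, and it still relies on the assumption that < and > only ever delimit tags.

```php
<?php
// Hypothetical stand-in for the real conversion; replace with your own.
function convert(string $text): string
{
    return strtoupper($text);
}

// Applies $callback to the text outside tags and leaves tags untouched.
// Assumes the only < and > characters in $html are tag delimiters.
function convert_outside_tags(string $html, callable $callback): string
{
    $bits = preg_split('/[<>]/', $html);
    foreach ($bits as $i => $bit) {
        $bits[$i] = ($i % 2 === 0)
            ? $callback($bit)      // even index: text outside tags
            : '<' . $bit . '>';    // odd index: put the delimiters back
    }
    return implode('', $bits);
}

echo convert_outside_tags('Hello <b class="x">world</b>!', 'convert'), "\n";
// prints: HELLO <b class="x">WORLD</b>!
```

Doing both steps in a single pass saves the second loop, but the result is the same as the two-loop version above.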