Problem:
I'm writing a little script that translates web pages to Morse code. The plan is to isolate any HTML, CSS and scripts, translate the text, and then put it all back together.
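Below is a minimal sketch of the translation step for a single text segment. It assumes plain ASCII input. The function name `text_to_morse` is hypothetical, as are the separators: one space between letters and ` / ` between words. Characters without a Morse mapping are simply skipped.

```php
<?php
// Hypothetical helper: translate a plain-text segment (no markup) to Morse.
// Assumptions: ASCII letters and digits only; letters separated by one space,
// words by " / "; any other character (punctuation, multibyte) is skipped.
function text_to_morse(string $text): string
{
    static $table = [
        'A' => '.-',    'B' => '-...',  'C' => '-.-.',  'D' => '-..',
        'E' => '.',     'F' => '..-.',  'G' => '--.',   'H' => '....',
        'I' => '..',    'J' => '.---',  'K' => '-.-',   'L' => '.-..',
        'M' => '--',    'N' => '-.',    'O' => '---',   'P' => '.--.',
        'Q' => '--.-',  'R' => '.-.',   'S' => '...',   'T' => '-',
        'U' => '..-',   'V' => '...-',  'W' => '.--',   'X' => '-..-',
        'Y' => '-.--',  'Z' => '--..',
        '0' => '-----', '1' => '.----', '2' => '..---', '3' => '...--',
        '4' => '....-', '5' => '.....', '6' => '-....', '7' => '--...',
        '8' => '---..', '9' => '----.',
    ];

    // Split on any run of whitespace; drop empty pieces.
    $words = preg_split('/\s+/', strtoupper(trim($text)), -1, PREG_SPLIT_NO_EMPTY);
    $out = [];
    foreach ($words as $word) {
        $codes = [];
        foreach (str_split($word) as $ch) {
            if (isset($table[$ch])) {
                $codes[] = $table[$ch];
            }
        }
        if ($codes) {
            $out[] = implode(' ', $codes);
        }
    }
    return implode(' / ', $out);
}
```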
It's working quite well, but on some pages the regexps stop matching.
The following code is a small excerpt that reproduces the problem.
The first part works as it should, but the same call fails when $string contains a full web page.
It seems like there is some kind of maximum string length for regexps. I searched, but I couldn't find it documented anywhere.
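PHP's PCRE functions (`preg_*`) do have documented limits, `pcre.backtrack_limit` and `pcre.recursion_limit`. When `preg_match()` hits one of them, it returns `false` instead of `0`. The sketch below ports the same pattern to `preg_match()` and reports which limit, if any, was hit. It assumes PHP 7 or later, and the function name is hypothetical. Whether a given page actually trips a limit depends on the installation's settings.

```php
<?php
// Hypothetical helper: same pattern as the eregi() call, run through PCRE.
// preg_match() returns 1 (match), 0 (no match) or false (internal error,
// e.g. a limit was hit); preg_last_error() tells which error occurred.
function match_up_to_last_script_close(string $html): ?string
{
    // i = case-insensitive (like eregi), s = "." also matches newlines
    $result = preg_match('#^.*</script[^>]*>#is', $html, $m);
    if ($result === false) {
        $errors = [
            PREG_BACKTRACK_LIMIT_ERROR => 'backtrack limit (pcre.backtrack_limit)',
            PREG_RECURSION_LIMIT_ERROR => 'recursion limit (pcre.recursion_limit)',
        ];
        $code = preg_last_error();
        throw new RuntimeException('preg_match failed: ' . ($errors[$code] ?? "error code $code"));
    }
    // The limit in effect can be inspected with ini_get('pcre.backtrack_limit').
    return $result === 1 ? $m[0] : null;
}
```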
Does anyone know a solution? Does anyone know the exact maximum?
Should I cut the page into pieces? And what if a cut falls in the middle of an HTML tag? (I want to keep tags intact.)
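One way to avoid fixed-size chunks is to tokenize the whole page in a single pass. The page is split into markup and text, only the text is translated, and the pieces are glued back together. This is a minimal sketch assuming PHP 7 or later. `translate_text_segments` is a hypothetical name, and the regex tokenizer is a simplification rather than a real HTML parser.

```php
<?php
// Hypothetical helper: apply $translate to the text between tags only, leaving
// every tag, comment, <script> block and <style> block byte-for-byte intact.
// This is a simple regex tokenizer, not a full HTML parser.
function translate_text_segments(string $html, callable $translate): string
{
    // One capturing group, so preg_split returns text and markup alternately:
    // even indices are text (possibly empty), odd indices are captured markup.
    $markup = '#(<!--.*?-->|<script\b.*?</script\s*>|<style\b.*?</style\s*>|<[^>]*>)#is';
    $parts = preg_split($markup, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        throw new RuntimeException('preg_split failed, error code ' . preg_last_error());
    }
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0 && $part !== '') {
            $parts[$i] = $translate($part);
        }
    }
    return implode('', $parts);
}
```

Because tags are matched whole, no tag can end up split across two pieces. HTML entities such as `&amp;` inside text segments still reach `$translate` and would need separate handling.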
Thanks!!
<?php
// Note: eregi() belongs to the POSIX regex extension, which was deprecated
// in PHP 5.3 and removed in PHP 7.0; this excerpt needs an older PHP.

// This works
$string = '<script LANGUAGE="JavaScript">
blah!
</script> etcetera...';
if (eregi('^.*</script[^>]*>', $string, $res)) {
    echo $res[0];
} else {
    // no match
    echo "Why hast thou forsaken me?!?";
}

// This doesn't (file() takes no 'r' mode argument, unlike fopen())
$array = file('http://www.zend.com/manual/');
$string = implode('', $array);
if (eregi('^.*</script[^>]*>', $string, $res)) {
    echo $res[0];
} else {
    // no match
    echo "Why hast thou forsaken me?!?";
}
?>
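The pattern only needs everything up to and including the last `</script...>`, so the same result can be obtained without a regular expression. The sketch below assumes PHP 7.1 or later (for the `?string` return type), and the function name is hypothetical. `stripos()` is case-insensitive, mirroring `eregi()`. Like the greedy `^.*`, the function picks the last `</script` that is followed by a `>`.

```php
<?php
// Hypothetical helper: return the page up to and including the last
// "</script...>" (case-insensitive), or null if there is none.
// Equivalent to the greedy pattern ^.*</script[^>]*> without any regex.
function up_to_last_script_close(string $html): ?string
{
    $end = null;
    $offset = 0;
    while (($pos = stripos($html, '</script', $offset)) !== false) {
        // [^>]*> means: the first '>' after "</script" (8 bytes long)
        $gt = strpos($html, '>', $pos + 8);
        if ($gt !== false) {
            $end = $gt; // later occurrences never end earlier, so keep the latest
        }
        $offset = $pos + 1;
    }
    return $end === null ? null : substr($html, 0, $end + 1);
}
```

This runs in linear time over the page. Where `eregi()` would report no match, it returns `null`.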