Cool, I've just done a similar thing (but a hell of a lot sloppier) by splitting the PHP and the HTML into two arrays. While doing this, every time I reach a new HTML chunk I split the PHP block that has just finished into code and strings. It doesn't handle \\" but I'm pretty sure that never happens on the site (I hope :p ).
OK, no laughing now. Here it is.
I'd like to use my method (purely out of pride... and because I've bloody worked on it for three hours already!! 😉), so do you have any comments?
<?php
//split php and html
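function strip_code($string)
{
    $len = strlen($string);
    $intag = false;
    // html and php chunks, collected in document order
    $html = array();
    $php = array();
    // $a indexes the current html chunk, $b the current php chunk
    for ($i = 0, $a = 0, $b = -1; $i < $len; $i++) {
        // substr() is safe at the end of the string, unlike $string[$i+1]
        $pair = substr($string, $i, 2);
        if (!isset($html[$a])) {
            $html[$a] = '';
        }
        if (!$intag && $pair == '<?') {
            // the opening tag stays with the html chunk before it
            $html[$a] .= $pair;
            $i++;
            if (strcasecmp(substr($string, $i + 1, 3), 'php') == 0) {
                $html[$a] .= substr($string, $i + 1, 3);
                $i += 3;
            }
            $intag = true;
            $php[++$b] = '';
        } elseif ($intag && $pair == '?>') {
            // keep both characters of the closing tag (the '>' used to get dropped)
            $php[$b] .= $pair;
            $i++;
            $intag = false;
            $php[$b] = just_strings($php[$b]);
            $a++;
        } elseif (!$intag) {
            $html[$a] .= $string[$i];
        } else {
            $php[$b] .= $string[$i];
        }
    }
    // a file may end without a closing tag; split that last php chunk too
    if ($intag) {
        $php[$b] = just_strings($php[$b]);
    }
    return array('html' => $html, 'php' => $php);
}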
//for php; split code and strings
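function just_strings($string)
{
    $quotes = array('"', "'");
    $inquotes = array(false, false);
    $qcount = count($quotes);
    $len = strlen($string);
    // $code[$k] is the code before string $k; $quote[$k] is that string's contents
    $code = array('');
    $quote = array();
    for ($i = 0, $c = 0, $q = 0; $i < $len; $i++) {
        $char = $string[$i];
        if (!in_array(true, $inquotes) && !in_array($char, $quotes)) {
            $code[$c] .= $char;
        } elseif (!in_array(true, $inquotes)) {
            // opening quote: it closes the current code chunk and starts a string
            $code[$c++] .= $char;
            $code[$c] = '';
            $quote[$q] = ''; // so an empty string still gets its own slot
            for ($j = 0; $j < $qcount; $j++) {
                if ($char == $quotes[$j]) {
                    $inquotes[$j] = true;
                }
            }
        } else {
            for ($j = 0; $j < $qcount; $j++) {
                if (!$inquotes[$j]) {
                    continue;
                }
                // '\\' is a single backslash character; the check only looks back
                // two characters, so longer runs of backslashes are still misread
                $escaped = $string[$i - 1] == '\\' && $string[$i - 2] != '\\';
                if ($char == $quotes[$j] && !$escaped) {
                    // the closing quote goes back into the code
                    $code[$c] .= $char;
                    $q++;
                    $inquotes[$j] = false;
                } else {
                    $quote[$q] .= $char;
                }
            }
        }
    }
    return array('code' => $code, 'quote' => $quote);
}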
?>
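On the escape check in just_strings: it only looks back two characters, so something like three backslashes before a quote would trip it up. A minimal sketch of a more general test, assuming a quote counts as escaped exactly when an odd number of backslashes sits directly in front of it. is_escaped is a made-up helper name for illustration, not something from the code above:

```php
<?php
// Hypothetical helper: true if the character at $pos is escaped,
// i.e. preceded by an odd number of consecutive backslashes.
function is_escaped($string, $pos)
{
    $slashes = 0;
    // walk backwards over the run of backslashes just before $pos
    for ($k = $pos - 1; $k >= 0 && $string[$k] == '\\'; $k--) {
        $slashes++;
    }
    return ($slashes % 2) == 1;
}

// The literals below are single-quoted, so '\\' is one backslash character.
var_dump(is_escaped('a\\"', 2));     // one backslash before the quote
var_dump(is_escaped('a\\\\"', 3));   // two backslashes before the quote
```

If this were swapped in, the closing-quote test would become something like `$char == $quotes[$j] && !is_escaped($string, $i)`.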
Cheers
Bubble
PS
Just out of interest, how would one create a regex to eliminate the contents of PHP tags? At first it seemed easy, but you are allowed to use ?> inside a string, and it is quite possible you would have to. Does this come back to lookaheads and lookbehinds?
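
One possible way around the regex problem, as a rough sketch rather than a definitive answer: PHP's own tokenizer (token_get_all) already knows that a ?> inside a string does not end the block, so it can do the splitting. This assumes the tokenizer extension is available; html_only is a made-up function name for illustration.

```php
<?php
// Hypothetical helper: returns $source with every PHP block removed,
// keeping only what the tokenizer reports as inline HTML.
function html_only($source)
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        // tokens are either single-character strings or arrays of (id, text, line)
        if (is_array($token) && $token[0] === T_INLINE_HTML) {
            $out .= $token[1];
        }
    }
    return $out;
}

$page = '<p>Hi</p><?php echo "?> is fine here"; ?><p>Bye</p>';
echo html_only($page), "\n"; // prints <p>Hi</p><p>Bye</p>
```

A regex would have to track whether it is inside a quoted string at each ?>, which is probably more than lookaheads and lookbehinds can comfortably express; the tokenizer sidesteps that.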