REGEX/POSIX Gurus only apply within...

frigidman

I am trying to take a mass of user-entered text and strip out ONLY the following HTML tags:

1) <HTML...>
2) </HTML>
3) <HEAD....</HEAD>
4) <BODY...>
5) </BODY>
6) <SCRIPT...</SCRIPT>

That leaves the 'middle' of an HTML page and still allows HTML tags other than the ones above. strip_tags() is not a solution. Below is what I used with another language, but I cannot figure out why it won't work with PHP:

	// let's get the data and mess with it, shall we?
	$cleandesc = $description;
	// kill the <html> tags
	$cleandesc = eregi_replace("/<html[^>]*>\s*\n*/i",'',$cleandesc);
	$cleandesc = eregi_replace("/<\/html[^>]*>\s*\n*/i",'',$cleandesc);

	// kill the <head> section
	$cleandesc = eregi_replace("/<head>[\s\S]*<\/head>\s*\n*/i",'',$cleandesc);
	$cleandesc = eregi_replace("/<head>[\s\S]*<\/HEAD>\s*\n*/i",'',$cleandesc);

	// kill any <script> section
	$cleandesc = st.replace("/<script[^>]*>[\s\S]*<\/script[^>]*>\n*/",'',$cleandesc);

	// kill the body tags
	$cleandesc = eregi_replace("/<body[^>]*>\s*\n*/i",'',$cleandesc);
	$cleandesc = eregi_replace("/<\/body[^>]*>\s*\n*/i",'',$cleandesc);

	// put back the modified data
	$description = $cleandesc;

"Nothing" changes... not a single char in a sample input. It bugs me out. I've got something wrong, but apparently it's not wrong enough to cause a parse error, as the script runs fine.

Any clues?

deception54

Why don't you try something like:

$cleandesc .= preg_replace("/<html[^{>]>\s\n*/i","",$cleandesc);}

And repeat it for all your tags?

frigidman

With that dot there before the equals... it would create quite a mess ;-) Because it would take itself, attempt to replace things, then tack all that result onto itself again. Over and over. You would end up with a big ball of ... the same thing repeated about 10 times ;-)
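To show what I mean, here is a quick sketch with a made-up sample string. The variable names are just for illustration, and it uses plain str_replace() so the regex question stays out of the picture:

```php
<?php
// Made-up sample input, for illustration only.
$text = "<html>hello";

// Plain assignment: the variable is overwritten with the cleaned result.
$assigned = $text;
$assigned = str_replace("<html>", "", $assigned);

// Concatenating assignment: the cleaned result is tacked onto the original.
$appended = $text;
$appended .= str_replace("<html>", "", $appended);

echo $assigned, "\n"; // hello
echo $appended, "\n"; // <html>hellohello
```

Repeat the second form once per tag and the text roughly doubles each time.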

The problem I am having is in the 'expression' itself, the part where it does the match: the "/<html[^>]>\s\n*/i" junk. Somewhere in that ball of mess is something not kosher with PHP, and thus it doesn't find what it should.

deception54

Oops! Sorry! Big mistake on my side!

I thought you only had a problem with the function name, and not with the regular expression.

I'm really not a guru! heheheh!
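
For what it's worth, one likely reason nothing changes: the ereg family takes a POSIX pattern with no delimiters and no flags, so the leading "/" and the trailing "/i" in the patterns above are matched as literal characters, and the input never contains them. Those delimiters and flags belong to the preg functions (eregi_replace() was also removed in PHP 7). Below is a minimal sketch of the same cleanup using preg_replace(), assuming PHP 7 or later. The function name strip_page_wrapper and the sample input are made up for illustration, and regexes like these are only a rough fit for messy real-world HTML.

```php
<?php
// Hypothetical helper, not from the original post: removes the <head> and
// <script> sections plus the bare <html> and <body> tags, and leaves every
// other tag alone.
function strip_page_wrapper(string $html): string
{
    $patterns = [
        // i = case-insensitive, s = let "." match newlines, .*? = non-greedy
        '/<head\b[^>]*>.*?<\/head\s*>\s*/is',     // whole <head>...</head> section
        '/<script\b[^>]*>.*?<\/script\s*>\s*/is', // each <script>...</script> section
        '/<\/?html\b[^>]*>\s*/i',                 // opening and closing <html> tags
        '/<\/?body\b[^>]*>\s*/i',                 // opening and closing <body> tags
    ];

    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace($patterns, '', $html) ?? $html;
}

// Made-up sample input.
$description = "<html>\n<head><title>t</title></head>\n"
    . "<body bgcolor=\"#fff\">\n<p>Hello <b>world</b></p>\n"
    . "<script type=\"text/javascript\">alert(1);</script>\n</body>\n</html>\n";

$description = strip_page_wrapper($description);
echo $description;
```

The \b after each tag name keeps tags such as <header> from being caught by the <head> pattern, and the non-greedy .*? stops at the first closing tag, so two separate <script> blocks do not swallow the markup between them.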