I run a forum on phpBB. Its latest upgrade added security measures for HTML in posts, and one of them strips out any attribute whose value is not enclosed in quotes. Example:
<a href="/page.htm">LINK</a>
works fine, but:
<a href=/page.htm>LINK</a>
becomes:
<a>LINK</a>
This breaks many well-intentioned posts, and it also trips up anyone quoting or editing posts that were written without quotes before the upgrade. If unquoted values really are a security risk (are they?), a better solution, in my mind, would be to ADD QUOTES to any unquoted values. Thus:
<a href=/page.htm>LINK</a>
should become:
<a href="/page.htm">LINK</a>
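A minimal sketch of that idea in PHP, assuming the fix is applied to the attribute part of a single tag (the text between the tag name and the closing `>`). `quote_unquoted_attributes()` and its callback are hypothetical names, not part of phpBB. Already-quoted strings are matched first and left alone; every remaining `name=value` pair gets its value wrapped in double quotes. An unquoted value ends at whitespace, a quote, `<` or `>`, so query strings like `?a=1&b=2` stay intact, and since the captured value can never contain `"`, the added quotes cannot be broken out of. A named callback is used instead of a closure so it should also work on older PHP versions.

```php
<?php
// Callback for preg_replace_callback(): $m[1] is an attribute name and $m[2]
// its unquoted value. If the match was an already-quoted string instead,
// groups 1 and 2 did not take part, so the text is returned unchanged.
function quote_unquoted_attribute_cb($m)
{
    if (!isset($m[2])) {
        return $m[0];
    }
    return $m[1] . '="' . $m[2] . '"';
}

// Hypothetical helper: wraps every unquoted attribute value in double quotes.
// Meant for the attribute part of one tag (e.g. ' href=/page.htm'), not for
// free text, where something like "x=5" would also be rewritten.
function quote_unquoted_attributes($attrs)
{
    if ($attrs === null || $attrs === '') {
        return $attrs;
    }
    // Quoted strings are matched (and skipped) first, so text inside
    // title="a=b" is never mistaken for a name=value pair.
    $pattern = '#"[^"]*"|\'[^\']*\'|([\w:-]+)\s*=\s*([^\s"\'<>]+)#';
    return preg_replace_callback($pattern, 'quote_unquoted_attribute_cb', $attrs);
}

// Rebuild the example tag from the post with its attribute part quoted.
echo '<a' . quote_unquoted_attributes(' href=/page.htm') . '>LINK</a>', "\n";
// prints <a href="/page.htm">LINK</a>
```

This only rewrites the markup; it does not sanitize anything, so phpBB's own filtering still needs to run on the result.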
Unfortunately, I don't have the PHP expertise to do that. I asked for help at phpbb.com but got no response, so I'm hoping someone here can assist.
This is the new code causing the problem:
// If HTML is enabled, we'll try to make it safe
// This approach is quite aggressive and anything that does not look like a valid tag
// is going to get converted to HTML entities
$message = stripslashes($message);
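// Group 1 captures the tag name; group 2 captures everything between the name
// and the closing '>' (quoted strings or any characters other than < > ' ")
$html_match = '#<[^\w<]*(\w+)((?:"[^"]*"|\'[^\']*\'|[^<>\'"])+)?>#';
$matches = array();
// $message_split holds the text between tags; $matches holds the tags themselves
$message_split = preg_split($html_match, $message);
preg_match_all($html_match, $message, $matches);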
$message = '';
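foreach ($message_split as $part)
{
    // Take the next tag as (full match, tag name, attribute text); after the
    // last tag the array_shift() calls return NULL
    $tag = array(array_shift($matches[0]), array_shift($matches[1]), array_shift($matches[2]));
    // Escape the plain text, then let clean_html() decide how to output the tag
    $message .= htmlspecialchars($part) . clean_html($tag);
}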
$message = addslashes($message);
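To show where such a quoting step could plug into the loop above, here is a runnable sketch of the same split-and-rebuild logic. It reuses the `$html_match` pattern from the post, a condensed copy of a hypothetical `quote_unquoted_attributes()` helper, and `clean_html_stub()`, a hypothetical stand-in for phpBB's `clean_html()` that rebuilds the tag without any filtering so the example runs on its own. The only change to the original loop is that the attribute text, the third element of `$tag`, is quoted before the tag is cleaned. This assumes `clean_html()` reads the attributes from `$tag[2]`; check that against your phpBB version before patching anything.

```php
<?php
// Hypothetical helper: wraps unquoted attribute values in double quotes,
// leaving already-quoted strings untouched.
function quote_unquoted_attribute_cb($m)
{
    return isset($m[2]) ? $m[1] . '="' . $m[2] . '"' : $m[0];
}

function quote_unquoted_attributes($attrs)
{
    if ($attrs === null || $attrs === '') {
        return $attrs;
    }
    return preg_replace_callback('#"[^"]*"|\'[^\']*\'|([\w:-]+)\s*=\s*([^\s"\'<>]+)#',
        'quote_unquoted_attribute_cb', $attrs);
}

// Hypothetical stand-in for phpBB's clean_html() so this runs on its own.
// It rebuilds the tag WITHOUT any filtering; never use it on a live board.
function clean_html_stub($tag)
{
    if (empty($tag[0])) {
        return '';
    }
    if (preg_match('#^<\s*/#', $tag[0])) {
        return '</' . $tag[1] . '>';
    }
    return '<' . $tag[1] . $tag[2] . '>';
}

// The loop from the post, with the attribute text quoted before cleaning.
function filter_message($message)
{
    $html_match = '#<[^\w<]*(\w+)((?:"[^"]*"|\'[^\']*\'|[^<>\'"])+)?>#';
    $matches = array();
    $message_split = preg_split($html_match, $message);
    preg_match_all($html_match, $message, $matches);

    $out = '';
    foreach ($message_split as $part)
    {
        $tag = array(
            array_shift($matches[0]),
            array_shift($matches[1]),
            quote_unquoted_attributes(array_shift($matches[2]))
        );
        $out .= htmlspecialchars($part) . clean_html_stub($tag);
    }
    return $out;
}

echo filter_message('See <a href=/page.htm>LINK</a> & more'), "\n";
// prints See <a href="/page.htm">LINK</a> &amp; more
```

In the phpBB file itself, that would amount to wrapping `array_shift($matches[2])` in the quoting call and keeping the real `clean_html()`.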
Is there any way to modify this to achieve the desired result? Or is this something I should NOT do for security reasons? Maybe someone can offer a better solution.