There are plenty of problems with trying to use regular expressions to parse HTML. Even XML gets hairy and that's quite a bit more strict. I note that none of the offered solutions take into account the possibility that the target attribute might already have been applied, for example.
sfullman wrote:/<a\s[>]href=[# ]+[>]>/i
As well as the issues you already note, you'll want to put a couple of "\s*" around the "=" to catch any whitespace there. (As it happens, though, a URL containing "page#1.html" would treat the '#' as a special character anyway.)
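To illustrate the whitespace point, here is a minimal sketch. The two patterns are hypothetical ones made up for this comparison (neither is the pattern quoted above); they differ only in whether whitespace is allowed around the "=".

```php
<?php
// Hypothetical patterns for illustration only.
$strict = '/<a\s[^>]*href="/i';          // no whitespace allowed around "="
$loose  = '/<a\s[^>]*href\s*=\s*"/i';    // optional whitespace on both sides

// Legal HTML: spaces around the "=" of an attribute.
$tag = '<a class="x" href = "page.html">';

$strictHit = preg_match($strict, $tag);  // does not match this tag
$looseHit  = preg_match($loose, $tag);   // matches this tag
```

Even the looser pattern still only handles double-quoted values, so it is a demonstration of one fix rather than a usable matcher.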
meltmedown wrote:This script is tested. Works fine as well.
You missed the possibility that the attribute might be single-quoted. You also missed the possibility of an <a> element not having an href attribute at all, as well as the other five elements with names starting with 'a' - one of which also has an "href" attribute.
But - making the perhaps wild assumption that this HTML document really is an HTML document - it would seem to make sense to take advantage of that and not just treat it like a string of arbitrary characters:
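$html = new DOMDocument();
// loadHTML() is an instance method; the static call fails on PHP 8.
$html->loadHTML($string);
$xpath = new DOMXPath($html);
// Every <a> that has an href not starting with "#".
foreach ($xpath->query('//a[@href][not(starts-with(@href,"#"))]') as $link) {
    $link->setAttribute('target', '_blank');
}
$string = $html->saveHTML();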
The XPath query specifies "all <a> elements from the document root down with an href attribute that doesn't start with '#'". Those are all looped over and their "target" attributes (added if necessary) set to "_blank".
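For anyone who wants to try it, here is a small sketch that wraps the DOM approach above in a function and runs it on a sample covering the awkward cases mentioned earlier: a single-quoted href, a fragment-only link, a link that already has a target, and an <a> with no href at all. The function name add_blank_target and the sample markup are my own, purely for illustration, and loadHTML() is called on an instance, which current PHP requires.

```php
<?php
// Hypothetical helper; the name and the sample input are illustrative only.
function add_blank_target(string $string): string
{
    $html = new DOMDocument();
    $html->loadHTML($string);
    $xpath = new DOMXPath($html);
    // Every <a> that has an href not starting with "#".
    foreach ($xpath->query('//a[@href][not(starts-with(@href,"#"))]') as $link) {
        // Adds the attribute, or overwrites an existing one.
        $link->setAttribute('target', '_blank');
    }
    return $html->saveHTML();
}

$sample = '<p><a href=\'page.html\'>one</a> <a href="#top">two</a> '
        . '<a href="x.html" target="_self">three</a> <a name="here">four</a></p>';

$out = add_blank_target($sample);
echo $out;
```

Links "one" and "three" should come back with target="_blank", while "two" and "four" are left alone. Note that saveHTML() serializes a whole document, so with default options the fragment comes back wrapped in a doctype and html/body elements.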