[RESOLVED] Add target _blank excepting href=#

meltmedown · Jan 9, 2009

Hi,

I use the following PHP code to add target="_blank" to all the links in an HTML document.

$link = "/<(a)([^>]+)>/i";
$target = '<\\1\\2 target="_blank">';

$replace = preg_replace($link,$target,$string);

What I am trying to accomplish is to add target="_blank" to all links except those that point to anchors (e.g. href="#anchorname").

Does anyone know how I can do this?

Thank you,

sfullman · Jan 10, 2009

$content='what you start with..';
if(preg_match_all('/<a\s[^>]*href=[^# ]+[^>]*>/i',$content,$matches)){
   //loop through every matched opening tag
   for($i=0;$i<count($matches[0]);$i++){
      //strip any existing target attribute
      $new=preg_replace('/target=[^ ]+/i','',$matches[0][$i]);
      $new=rtrim($new,'>') . ' target="_blank"' . '>';
      //this should be exact; str_replace returns the result, so assign it back
      $content=str_replace($matches[0][$i],$new,$content); //done!
   }
}
echo $content;

You might refine the regex further, but basically href=[^# ]+ is meant to match only hrefs without a # in them. However, a link like href="page#1.html" would obviously fail. I don't know of a way to negatively match something like "# only after the ? of a query string".

Also notice the \s after the <a, which is a bit broader, and how the two [^>]* let the href appear anywhere in the tag.
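One possible refinement, offered only as a sketch: a PCRE negative lookahead can reject href values that begin with "#" (quoted or not) while still accepting "page#1.html". The function name add_blank_target is hypothetical and not from this thread, and the pattern is still an approximation of HTML rather than a parser (for instance, it would also be fooled by an attribute such as data-href).

```php
<?php
// Hypothetical helper name, used for illustration only.
function add_blank_target(string $content): string
{
    // (?!\s*["']?#) fails the match when the href value starts with '#',
    // whether the value is double-quoted, single-quoted, or unquoted.
    $pattern = '/<a\s[^>]*href\s*=(?!\s*["\']?#)[^>]*>/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Remove an existing target attribute so it is not duplicated.
        $tag = preg_replace('/\s+target\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i', '', $m[0]);
        // Re-close the opening tag with the new attribute appended.
        return rtrim($tag, '>') . ' target="_blank">';
    }, $content);
}
```

Called as add_blank_target($content), it returns the modified string and leaves $content itself untouched.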

Cheers

meltmedown · Jan 12, 2009

Thank you very much! Works great

meltmedown · Jan 12, 2009

$string="text containing links";

$search = "/<(a.*?href=\"[^#])([^>]+)>/"; //Search for links that do not contain href="#
$replace = '<\\1\\2 target="_blank">'; //Add target ="_blank"

$new=preg_replace($search, $replace, $string);

print $new;

This script is tested. Works fine as well.

Weedpacket · Jan 12, 2009

There are plenty of problems with trying to use regular expressions to parse HTML. Even XML gets hairy and that's quite a bit more strict. I note that none of the offered solutions take into account the possibility that the target attribute might already have been applied, for example.

sfullman wrote:
/<a\s[^>]*href=[^# ]+[^>]*>/i

As well as the issues you already note, you'll want to put a "\s*" on either side of the "=" to catch any whitespace there. (As it happens, though, a URL containing "page#1.html" would treat the '#' as a special character anyway.)

meltmedown wrote:
This script is tested. Works fine as well.

You missed the possibility that the attribute might be single-quoted. You also missed the possibility of an <a> element not having an href attribute at all, as well as the other five elements with names starting with 'a' - one of which also has an "href" attribute.
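Two of these gaps are easy to reproduce. The sketch below reuses the pattern from the quoted post unchanged; the two sample strings are made up for illustration.

```php
<?php
// Pattern and replacement copied from the quoted post.
$search  = "/<(a.*?href=\"[^#])([^>]+)>/";
$replace = '<\\1\\2 target="_blank">';

// Illustrative inputs (not from the thread).
$singleQuoted = "<a href='page.html'>single-quoted</a>";
$areaElement  = '<area href="page.html" alt="map region">';

// The single-quoted link is left alone because the pattern insists on href=" .
$singleResult = preg_replace($search, $replace, $singleQuoted);

// The <area> element is modified because "a.*?" also matches "area ...".
$areaResult = preg_replace($search, $replace, $areaElement);

echo $singleResult, "\n", $areaResult, "\n";
```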

But - making the perhaps wild assumption that this HTML document really is an HTML document - it would seem to make sense to take advantage of that and not just treat it like a string of arbitrary characters:

$html = new DOMDocument();
$html->loadHTML($string); // loadHTML() is an instance method; calling it statically fails on current PHP
$xpath = new DOMXPath($html);
foreach($xpath->query('//a[@href][not(starts-with(@href,"#"))]') as $link)
    $link->setAttribute('target','_blank');
$string = $html->saveHTML();

The XPath query specifies "all <a> elements from the document root down with an href attribute that doesn't start with '#'". Those are all looped over and their "target" attributes (added if necessary) set to "_blank".
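For anyone wanting to drop this into a script, here is a minimal sketch of the same idea wrapped in a function. The name blank_targets_via_dom is hypothetical, and silencing libxml warnings is my own assumption about what is convenient for untidy markup. Note that saveHTML() returns a complete document, so a fragment comes back wrapped in a doctype and html/body elements.

```php
<?php
// Hypothetical wrapper name, used for illustration only.
function blank_targets_via_dom(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same XPath query as above: <a> elements whose href does not start with '#'.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href][not(starts-with(@href, "#"))]') as $link) {
        // setAttribute() overwrites an existing target instead of duplicating it.
        $link->setAttribute('target', '_blank');
    }

    return $doc->saveHTML();
}

$out = blank_targets_via_dom(
    '<p><a href="page.html">ext</a> <a href="#top">top</a> <a target="_self" href="b.html">b</a></p>'
);
```

Here the first and third links end up with target="_blank", and the "#top" link is left as it was.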
