Hi,

I use the following PHP code to add target="_blank" to all the links in an HTML document.

$link = "/<(a)([^>]+)>/i";
$target = '<\\1\\2 target="_blank">';

$replace = preg_replace($link,$target,$string);

What I am trying to accomplish is to add target="_blank" to all links except those that point to anchors (e.g. href="#anchorname").

Does anyone know how I can do this?

Thank you,

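    $content='what you start with..';
    if(preg_match_all('/<a\s[^>]*href=[^# ]+[^>]*>/i',$content,$matches)){
       //loop through each matched opening tag (< not <=, or the last pass reads past the end)
       for($i=0;$i<count($matches[0]);$i++){
          //strip any existing target attribute
          $new=preg_replace('/target=[^ ]+/i','',$matches[0][$i]);
          //re-close the tag with the new attribute
          $new=rtrim($new,'>') . ' target="_blank"' . '>';
          //this should be exact; str_replace returns the result, so assign it back
          $content=str_replace($matches[0][$i],$new,$content); //done!
       }
    }
    echo $content;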
    

    You might refine the regex further, but basically href=[^# ]+ should only get hrefs without a # in them. However, if the link is href="page#1.html" it would obviously fail. I don't have a way that I know of to negatively omit "# only after ? for a query string".

    Notice also the \s after the <a, which is a bit broader, and how the two [^>]* allow the href to be anywhere in the tag.
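
One way around the "page#1.html" limitation is to test only the first character of the href value instead of every character. The following is a minimal sketch under that assumption, not a drop-in solution: the function name add_blank_target is hypothetical, and like any regex approach it will still trip over unusual markup (for example a ">" inside an attribute value).

```php
<?php
// Hypothetical helper (not from the posts above): adds target="_blank" to
// <a> tags whose href value does not START with '#'.
function add_blank_target(string $html): string
{
    // (?=\s) keeps <abbr>, <area> etc. out; \shref avoids matching data-href.
    // ["\']? allows an optional quote; the next character must not be '#',
    // a quote, whitespace or '>' (so empty hrefs are skipped as well).
    $pattern = '/<a(?=\s)[^>]*\shref\s*=\s*["\']?[^#"\'\s>][^>]*>/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Drop an existing target attribute, whatever its quoting.
        $tag = preg_replace(
            '/\s+target\s*=\s*("[^"]*"|\'[^\']*\'|[^\s>]+)/i',
            '',
            $m[0]
        );
        // Cut the closing '>' and re-close the tag with the new attribute.
        return rtrim(substr($tag, 0, -1)) . ' target="_blank">';
    }, $html);
}

echo add_blank_target('<a href="#top">Top</a> <a href="page#1.html">Page</a>');
```

Because only the first character after the optional quote is checked, a "#" later in the URL no longer prevents the match.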

    Cheers

      $string="text containing links";
      
      $search = "/<(a.*?href=\"[^#])([^>]+)>/"; //Search for links that do not contain href="#
      $replace = '<\\1\\2 target="_blank">'; //Add target ="_blank"
      
      $new=preg_replace($search, $replace, $string);
      
      print $new;

      This script is tested. Works fine as well.

        There are plenty of problems with trying to use regular expressions to parse HTML. Even XML gets hairy and that's quite a bit more strict. I note that none of the offered solutions take into account the possibility that the target attribute might already have been applied, for example.

        sfullman wrote:

        /<a\s[^>]*href=[^# ]+[^>]*>/i

        As well as the issues you already note, you'll want to put a couple of "\s*" around the "=" to catch any whitespace there. (As it happens, though, a URL containing "page#1.html" would treat the '#' as a special character anyway.)

        meltmedown wrote:

        This script is tested. Works fine as well.

        You missed the possibility that the attribute might be single-quoted. You also missed the possibility of an <a> element not having an href attribute at all, as well as the other five elements with names starting with 'a' - one of which also has an "href" attribute.
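
To make two of those gaps concrete, here is a quick check that runs the quoted pattern against a single-quoted link and an <area> element. The two input strings are made up for illustration; the pattern and replacement are copied from the post above.

```php
<?php
// Pattern and replacement as posted above.
$search  = "/<(a.*?href=\"[^#])([^>]+)>/";
$replace = '<\\1\\2 target="_blank">';

// Hypothetical inputs chosen to hit the two gaps.
$single = "<a href='page.html'>Page</a>";
$area   = '<area href="map.html" alt="Map">';

// The single-quoted link never matches href=" and comes back unchanged.
$singleOut = preg_replace($search, $replace, $single);

// The <area> element starts with "<a", so it matches and gains a target.
$areaOut = preg_replace($search, $replace, $area);

echo $singleOut, "\n", $areaOut, "\n";
```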

        But - making the perhaps wild assumption that this HTML document really is an HTML document - it would seem to make sense to take advantage of that and not just treat it like a string of arbitrary characters:

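        // loadHTML() is an instance method; calling it statically is deprecated and fails on PHP 8
        $html = new DOMDocument();
        $html->loadHTML($string);
        $xpath = new DOMXPath($html);
        // every <a> with an href that doesn't start with '#'
        foreach($xpath->query('//a[@href][not(starts-with(@href,"#"))]') as $link)
            $link->setAttribute('target','_blank');
        $string = $html->saveHTML();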
        

        The XPath query specifies "all <a> elements from the document root down with an href attribute that doesn't start with '#'". Those are all looped over and their "target" attributes (added if necessary) set to "_blank".
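
The same approach can be wrapped in a function and tried on a small fragment. This is only a sketch: the function name blank_external_links and the sample markup are assumptions for illustration, and the libxml calls are there just to keep parser warnings about sloppy markup quiet.

```php
<?php
// Hypothetical wrapper around the DOM approach described above.
function blank_external_links(string $html): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // then restore whatever setting was active before.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // All <a> elements with an href that does not start with '#'.
    foreach ($xpath->query('//a[@href][not(starts-with(@href,"#"))]') as $link) {
        // Overwrites an existing target or adds a new one.
        $link->setAttribute('target', '_blank');
    }

    return $doc->saveHTML();
}

$out = blank_external_links(
    '<p><a href="#top">Top</a> <a href="page.html" target="_self">Page</a></p>'
);
echo $out;
```

Note that saveHTML() returns a complete document, so a bare fragment comes back wrapped in a doctype plus <html> and <body> elements.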
