Hi guys,

I use the code below to replace all the relative links in an HTML file with absolute ones:

$string='
<a href="link.htm">1</a>
<a href="http://www.site.com/link.htm">2</a>
<a href="#link">3</a>
<a href="mailto:mail@example.com">4</a>
';

$link_noabs = "/<(a.*?href=\")([^http^#])([^>]+)>/";
$link_abs = '<\\1http://www.example.com/folder/\\2\\3 >';
$replace=preg_replace($link_noabs, $link_abs, $string);

The results are shown below:

<a href="http://www.example.com/folder/link.htm" target="_blank">1</a>
<a href="http://www.site.com/link.htm" target="_blank" >2</a>
<a href="#link" >3</a>
<a href="http://www.example.com/folder/mailto:mail@example.com" target="_blank" >4</a>

My problem is here:

$link_noabs = "/<(a.*?href=\")([^http^#])([^>]+)>/";

The regex above is meant to match all the links that do not begin with http or #. I need a regex that matches all the links that do not begin with http, # or mailto:, because otherwise the result looks like this:

<a href="http://www.example.com/folder/mailto:mail@example.com" target="_blank" >4</a>

I've tried the following code:

$link_noabs = "/<(a.*?href=\")([^http^#^mailto])([^>]+)>/";

but with no result.

Please help!

Thank you,

    Here's how I would tackle this:

    $str=' 
    <a href="link.htm">1</a> 
    <a href="http://www.site.com/link.htm">2</a> 
    <a href="#link">3</a> 
    <a href="mailto:mail@example.com">4</a> 
    ';
    
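    // Split on <a ...> tags, keeping each tag as its own array element
    $arr = preg_split('#(<a\b[^>]+>)#', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach($arr as &$val){
       // Only touch <a> tags that contain no http, # or mailto
       if(preg_match('#<a\b#', $val) && !preg_match('~(?:http|#|mailto)~', $val)){
          // Insert the base URL right after the first opening quote
          $val = preg_replace('#^([^"]+")([^"]+)#', '$1'.'http://www.example.com/folder/'.'$2', $val);
       }
    }
    unset($val); // break the reference left over from the foreach
    $arr = implode('', $arr);
    echo $arr;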
    

    Output (when viewed as source code):

    <a href="http://www.example.com/folder/link.htm">1</a> 
    <a href="http://www.site.com/link.htm">2</a> 
    <a href="#link">3</a> 
    <a href="mailto:mail@example.com">4</a> 
    

    I think a big problem with what you have is this: [^http^#] in your pattern...
    This is a negated character class, which matches a single character that is not an h, a t, a p, a caret, or a hash. That doesn't work out as planned, because only the first caret (^) at the beginning acts as a negation; everything else is simply a list of characters that are not permissible.
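
    To make the character-class point concrete, here is a small sketch. The href values are made up purely for illustration. The class consumes exactly one character, so mailto: slips through (m is not in the list), while a relative link such as page.htm is wrongly rejected (it starts with p).

    ```php
    <?php
    // Negated class from the question: matches ONE character that is
    // not h, t, p, ^ or #. It does not mean "not the word http".
    $class = '/^[^http^#]/';

    // Hypothetical href values, chosen only to show the behaviour.
    $hrefs = array(
        'link.htm',
        'mailto:mail@example.com',
        'page.htm',
        'http://www.site.com/link.htm',
        '#link',
    );

    $matches = array();
    foreach ($hrefs as $href) {
        // preg_match returns 1 when the first character is allowed, 0 otherwise
        $matches[$href] = preg_match($class, $href);
    }

    print_r($matches);
    ```

    Only link.htm and the mailto: address come back as 1, which is why adding ^mailto inside the brackets makes things worse rather than better: it just forbids more single letters.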

    Another problem is perhaps trying to pack it all into one regex. While it can be done, I am more of a fan of mixing regex with additional (and often faster) functionality. You can still get the same results as with a pure-regex solution, but often with quicker execution.
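
    For comparison, the pure-regex route could look roughly like this. Treat it as a sketch only: it swaps the negated character class for a negative lookahead, (?!http|#|mailto:), which is my suggestion rather than anything in the original pattern, and it assumes double-quoted href attributes as in the sample.

    ```php
    <?php
    // Sample markup from the thread, joined line by line.
    $str = implode("\n", array(
        '<a href="link.htm">1</a>',
        '<a href="http://www.site.com/link.htm">2</a>',
        '<a href="#link">3</a>',
        '<a href="mailto:mail@example.com">4</a>',
    ));

    // (?!...) is a negative lookahead: the match fails if the href value
    // starts with http, # or mailto:, and no characters are consumed.
    $pattern = '~(<a\b[^>]*?\bhref=")(?!http|#|mailto:)([^"]*")~i';

    // Base URL taken from the thread's example.
    $base = 'http://www.example.com/folder/';

    // ${1} and ${2} keep the references unambiguous next to the URL text.
    $result = preg_replace($pattern, '${1}' . $base . '${2}', $str);
    echo $result;
    ```

    With the sample input, only the first link should be rewritten; the other three do not match the pattern and are left untouched.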

    EDIT - Come to think of it, we can get rid of the first preg_match in the if statement within the foreach loop and use strpos instead:

    if(strpos($val, '<a ') !== false && !preg_match('~(?:http|#|mailto)~', $val)){
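
    One caveat with the check above: ~(?:http|#|mailto)~ matches anywhere inside the tag, so a relative link like page.htm#top would be skipped because of the #. A possible variation is to anchor the test to the start of the href value. The sketch below wraps the loop in a function; the name absolutize_links and its parameters are hypothetical, and it still assumes href is the first quoted attribute in the tag.

    ```php
    <?php
    // Hypothetical helper name; not part of the original post.
    function absolutize_links($html, $base)
    {
        // Split on <a ...> tags, keeping the tags themselves in the result.
        $parts = preg_split('#(<a\b[^>]+>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
        foreach ($parts as $i => $part) {
            // Cheap string check first, then make sure the href VALUE does not
            // start with http, # or mailto: (anchored right after href=").
            if (strpos($part, '<a ') === 0
                && !preg_match('~href="(?:http|#|mailto:)~', $part)) {
                // Insert the base URL right after the first opening quote.
                $parts[$i] = preg_replace('#^([^"]+")([^"]+)#', '${1}' . $base . '${2}', $part);
            }
        }
        return implode('', $parts);
    }

    $out = absolutize_links(
        '<a href="page.htm#top">x</a> <a href="#link">3</a>',
        'http://www.example.com/folder/'
    );
    echo $out;
    ```

    Indexing the array by key instead of iterating by reference also avoids leaving a dangling reference behind after the loop.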
    

      Perfect. Please don't forget to flag this thread as resolved (top menu, Thread Tools > Mark Thread Resolved).

      Cheers
