I am trying to extract the URL from an <a href> tag, but I am having problems with some <a> tag formats.

<a href="http://somesite.com">

$cut = '<a href="http://somesite.com">';
preg_match('|\shref=[\'"]([^\'"]+)["\']|i', $cut, $part);
echo $part[1];

// outputs: http://somesite.com

That works fine, but when I try to extract the URL from:

<a target="_blank" class="someclass" href="http://somesite.com">

It returns nothing.

Any help would be greatly appreciated!

    Offhand, I can't see why it wouldn't work in both cases. Could there be something before "href" that doesn't count as "\s" whitespace? Maybe try a "\b" word boundary there instead.
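The two anchors only behave differently when nothing whitespace-like sits directly before href. Here is a small comparison sketch; the helper `extract_href` and the sample tags are hypothetical, and the pattern is the one from the question with either \s or \b in front of href:

```php
<?php
// Hypothetical helper: returns the first href value, or null if the pattern fails.
// $anchor is '\s' (needs a whitespace character before href) or '\b' (needs a word boundary).
function extract_href(string $tag, string $anchor): ?string
{
    $pattern = '|' . $anchor . 'href=[\'"]([^\'"]+)["\']|i';
    return preg_match($pattern, $tag, $m) ? $m[1] : null;
}

$tags = [
    '<a href="http://somesite.com">',
    '<a target="_blank" class="someclass" href="http://somesite.com">',
    "<a target=\"_blank\"\n   href=\"http://somesite.com\">", // a newline also counts as \s
    '<a id="foo"href="http://somesite.com">',                 // no whitespace before href
];

foreach ($tags as $tag) {
    printf("\\s => %s, \\b => %s\n",
        var_export(extract_href($tag, '\s'), true),
        var_export(extract_href($tag, '\b'), true));
}
```

With these samples, the first three tags match with either anchor; only the last one, where href follows a closing quote directly, needs \b. Keep in mind that \b is also looser: it would accept data-href="..." too.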

      I tried both \s and \b and still get nothing 🙁

      It only happens when the tag is not formatted like this:

      <a href="....

        Seems to work fine for me:

        <pre>
        <?php
        $cut = <<<EOD
        This is <a href="http://www.php.net/test1">a test</a>. It is
        <a id='test2' title="title text" href='http://www.php.net/test2'>only</a>
        a test.
        This is <a target="_blank" class='class' href="local/test3">the end</a>.
        Here is <a target="_blank" class="someclass" href="http://somesite.com">your example</a>.
        EOD;
        preg_match_all('|\shref=[\'"]([^\'"]+)["\']|i', $cut, $part);
        print_r($part[1]);
        ?>
        </pre>
        

        And the output was:

        Array
        (
            [0] => http://www.php.net/test1
            [1] => http://www.php.net/test2
            [2] => local/test3
            [3] => http://somesite.com
        )
        

          OK... I guess it would help if I posted the rest of my code.

          What I am trying to do is grab all of the links on a site I specify and display them, and also count how many times a particular link appears on that site, without displaying that link along with the others.
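For the goal described here (list every link on a page and count one specific link separately), an HTML parser is usually more robust than regular expressions. A minimal sketch using PHP's DOM extension, assuming it is enabled; the function name `collect_links`, the sample HTML, and the exact-match counting rule are illustrative choices, not part of the original code:

```php
<?php
// Sketch: collect all <a href> values, counting one tracked URL separately.
// Uses DOMDocument instead of regular expressions.
function collect_links(string $html, string $target): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid; suppress parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();

    $others = [];
    $count = 0;
    foreach ($doc->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue;                   // e.g. <a name="..."> anchors have no href
        }
        $href = $a->getAttribute('href');
        if (strcasecmp($href, $target) === 0) {
            $count++;                   // the tracked link is counted...
        } else {
            $others[] = $href;          // ...everything else is listed
        }
    }
    return [$others, $count];
}

$html = '<p><a target="_blank" class="someclass" href="http://somesite.com">x</a>'
      . '<a href="http://www.sparkletags.com/index.php">y</a></p>';
[$others, $count] = collect_links($html, 'http://www.sparkletags.com/index.php');
print_r($others);
echo $count, "\n";
```

To run it against a live site, the HTML could come from file_get_contents($url), which, like the file() call in the posted code, needs allow_url_fopen. Unlike a line-based stristr() count, this counts <a> elements whose href is exactly the tracked URL (compared case-insensitively).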

          <?php
          $link = 0;
          $url = 'http://www.revolutionmyspace.com';
          //$url = 'http://www.sparkletags.com';
          $shorturl1 = explode(".", $url);
          $shorturl = $shorturl1[1] . "." . $shorturl1[2]; // e.g. "revolutionmyspace.com"
          $lines = file($url); // reading a URL with file() requires allow_url_fopen
          echo $shorturl . "<br>";
          foreach ($lines as $line) {
              // Split on "<" so several tags on the same line are checked one at a time
              $cutline = explode("<", $line);
              foreach ($cutline as $cut) {
                  if (!preg_match('|href=[\'"]([^\'"]+)["\']|i', $cut, $part)) {
                      continue; // this fragment has no href
                  }
                  $href = $part[1];
                  // Skip links that contain the site's own name or the string "http"
                  if (stristr($href, $shorturl) === FALSE && stristr($href, "http") === FALSE) {
                      // Cut the URL off at any PHPSESSID parameter
                      $stripped = explode("?PHPSESSID", $href);
                      $stripped = explode("&PHPSESSID", $stripped[0]);
                      $pagetype = explode(".", $stripped[0]);
                      $type = isset($pagetype[1]) ? $pagetype[1] : "";
                      if ($type != "css") {
                          echo $stripped[0] . "<br>";
                      }
                  }
              }
              // Count the lines that mention the link being tracked
              if (stristr(htmlspecialchars($line), "http://www.sparkletags.com/index.php") !== FALSE) {
                  $link++;
              }
          }
          echo $link;
          ?>
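One thing worth checking in the code above is the condition applied after the regex: an href is only echoed if it contains neither the site name nor the string "http", so an absolute URL such as http://somesite.com is skipped even when the pattern matched it. A minimal sketch that isolates just that condition; the helper `would_be_printed` is hypothetical, and the PHPSESSID and css handling is left out:

```php
<?php
// Hypothetical helper reproducing only the filter from the posted loop:
// an href is printed if it contains neither the site name nor "http".
function would_be_printed(string $href, string $shorturl): bool
{
    return stristr($href, $shorturl) === false
        && stristr($href, 'http') === false;
}

$shorturl = 'revolutionmyspace.com';
foreach (['http://somesite.com', 'local/test3', '/about.php'] as $href) {
    echo $href, ' => ', (would_be_printed($href, $shorturl) ? 'printed' : 'skipped'), "\n";
}
```

Here http://somesite.com is reported as skipped, while the two relative links are printed. If absolute links to other sites should be listed as well, the "http" test is the part to revisit.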

            Never mind, you already covered what I was going to say.

              I think a more reliable pattern would match the entire tag; something like

              <a\s[^>]*?\bhref\s*=\s*(['"])(.+?)\1[^>]*?>

              (Note that the quotes have to match each other, so the URL would be in $part[2], with the quote character itself in $part[1] and the entire tag in $part[0].) It's also legal for there to be no space between an attribute value and the name of the following attribute, as in <a id="foo"href="bar">, which a \s before href would miss.
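A quick way to try the whole-tag pattern is to run it with preg_match_all over a few sample tags. This is a sketch with made-up sample HTML; the pattern is the one suggested above, written as a PHP single-quoted string with / as the delimiter:

```php
<?php
// Whole-tag pattern: group 1 is the opening quote, group 2 the URL up to the
// matching quote (\1), and group 0 the entire <a ...> tag.
$pattern = '/<a\s[^>]*?\bhref\s*=\s*([\'"])(.+?)\1[^>]*?>/i';

$html = <<<'HTML'
<a href="http://somesite.com">one</a>
<a target="_blank" class="someclass" href="http://somesite.com/2">two</a>
<a href="/o'brien.html">three</a>
<a id="foo"href="bar">four</a>
HTML;

preg_match_all($pattern, $html, $matches);
print_r($matches[2]);
```

Because of the \1 back reference, the third URL comes out whole as /o'brien.html; the original [^'"]+ class would stop at the apostrophe and return only /o. The last tag matches because \b, unlike \s, does not need whitespace before href.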
