[RESOLVED] preg_match help please

atokatim · Feb 16, 2007

I am trying to strip the URL out of an <a href, but I am having problems with some <a formats.

preg_match('|\shref=[\'"]([^\'"]+)["\']|i', $cut, $part);
echo $part[1];

//outputs: http://somesite.com

It works fine, but if I try to strip the URL out of:

<a target="_blank" class="someclass" href="http://somesite.com">

it returns nothing.

Any help would be greatly appreciated!

NogDog · Feb 16, 2007

Offhand, I can't see why it wouldn't work in both cases. Is there any possibility that something before the "href" is not classified as "\s" whitespace? Maybe you could try using a "\b" word boundary there instead?
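To see where the two anchors actually differ, here is a small comparison sketch. The tag strings are hypothetical: the first is an ordinary tag, the second has no whitespace before href. \s needs a real whitespace character in front of href, while \b only needs a change from a non-word character (here the closing quote) to a word character.

```php
<?php
// The question's \s-anchored pattern and a \b-anchored variant.
$withSpace    = '|\shref=[\'"]([^\'"]+)["\']|i';
$withBoundary = '|\bhref=[\'"]([^\'"]+)["\']|i';

// Hypothetical sample tags: one ordinary, one with no whitespace before href.
$tags = [
    '<a target="_blank" href="http://somesite.com">',
    '<a id="foo"href="bar">',
];

$results = [];
foreach ($tags as $tag) {
    // preg_match() returns 1 on a match and 0 otherwise; the URL is in group 1.
    $s = preg_match($withSpace, $tag, $m1) ? $m1[1] : null;
    $b = preg_match($withBoundary, $tag, $m2) ? $m2[1] : null;
    $results[] = [$s, $b];
    echo $tag, "\n  \\s: ", $s ?? '(no match)', "\n  \\b: ", $b ?? '(no match)', "\n";
}
```

Both patterns find the URL in the first tag; only the \b version finds "bar" in the second. One trade-off: \b also matches after a hyphen, so an attribute such as data-href would be picked up too.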

atokatim · Feb 16, 2007

I tried both \s and \b and still get nothing.

It only happens when the A HREF is not formatted like this:

<a href="....

NogDog · Feb 17, 2007

Seems to work fine for me:

<pre>
<?php
$cut = <<<EOD
This is <a href="http://www.php.net/test1">a test</a>. It is
<a id='test2' title="title text" href='http://www.php.net/test2'>only</a>
a test.
This is <a target="_blank" class='class' href="local/test3">the end</a>.
Here is <a target="_blank" class="someclass" href="http://somesite.com">your example</a>.
EOD;
preg_match_all('|\shref=[\'"]([^\'"]+)["\']|i', $cut, $part);
print_r($part[1]);
?>
</pre>

And the output was:

Array
(
    [0] => http://www.php.net/test1
    [1] => http://www.php.net/test2
    [2] => local/test3
    [3] => http://somesite.com
)

atokatim · Feb 17, 2007

OK... I guess it would help if I posted my other code.

What I am trying to do is grab all of the links on a site I specify and display them, and also count how many times a particular link that I set appears on that site, without displaying it along with the other links.

<?php
$link = 0;
$url = 'http://www.revolutionmyspace.com';
//$url = 'http://www.sparkletags.com';
$shorturl1 = explode(".", $url);
$shorturl = $shorturl1[1].".".$shorturl1[2]; // e.g. "revolutionmyspace.com"
$lines = file($url);
echo $shorturl."<br>";
foreach ($lines as $line_num => $line) {
	$cutline = explode("<", $line); // split at every "<" so each piece holds at most one tag
	foreach ($cutline as $cut) {
		preg_match('|href=[\'"]([^\'"]+)["\']|i', $cut, $part);  // change back to $line if needed
		//var_dump($part);
		$href = isset($part[1]) ? $part[1] : ""; // $part is empty when this piece has no href
		// skip links containing $shorturl and every absolute link containing "http"
		if ($href != "" && stristr($href, $shorturl) === FALSE && stristr($href, "http") === FALSE) {
			// drop a ?PHPSESSID... or &PHPSESSID... session suffix
			$stripped = explode("?PHPSESSID", $href);
			$stripped = explode("&PHPSESSID", $stripped[0]);
			$pagetype = explode(".", $stripped[0]);
			$type = isset($pagetype[1]) ? $pagetype[1] : "";

			if ($type != "css") {
				echo $stripped[0]."<br>";
			}
		}
	}
	// count the lines that contain the tracked link
	if (stristr(htmlspecialchars($line), "http://www.sparkletags.com/index.php") !== FALSE) {
		$link++;
	}
}
echo $link;

?>
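For comparison, here is a minimal sketch of the same task, assuming the page's HTML is already in a string (the fetch with file() is left out so it runs offline). It scans the whole document with preg_match_all() and the matching-quote <a> pattern instead of splitting lines on "<", and it keeps absolute http:// links to other sites, which the stristr($href, "http") === FALSE test in the script throws away. The function name collect_links(), its return shape, and the sample markup are all hypothetical.

```php
<?php
// Hypothetical helper: list the href of every <a> tag except links back to
// $ownDomain, and count how often $tracked appears instead of listing it.
function collect_links(string $html, string $ownDomain, string $tracked): array
{
    // Whole-tag pattern: group 1 is the opening quote, \1 requires the same
    // quote to close the value, and group 2 is the URL itself.
    $pattern = '/<a\s[^>]*?\bhref\s*=\s*([\'"])(.+?)\1[^>]*?>/i';
    preg_match_all($pattern, $html, $matches);

    $links = [];
    $count = 0;
    foreach ($matches[2] as $href) {
        // Cut off a ?PHPSESSID... or &PHPSESSID... suffix, like the original explode() calls.
        $href = preg_replace('/[?&]PHPSESSID.*$/s', '', $href);

        if (strcasecmp($href, $tracked) === 0) {
            $count++;         // count the tracked link, but do not list it
        } elseif (stripos($href, $ownDomain) === false) {
            $links[] = $href; // anything not pointing back at $ownDomain
        }
    }
    return ['links' => $links, 'count' => $count];
}

// Hypothetical page content; in the real script this would come from the fetched URL.
$html = '<a href="http://example.com/a">A</a>' . "\n"
      . "<a target=\"_blank\"\n   href='page.php?PHPSESSID=123'>B</a>\n"
      . '<a href="http://www.sparkletags.com/index.php">T</a>' . "\n"
      . '<a href="http://www.revolutionmyspace.com/x">self</a>';

$result = collect_links($html, 'revolutionmyspace.com', 'http://www.sparkletags.com/index.php');
print_r($result['links']);
echo $result['count'], "\n";
```

Because the pattern only looks at <a> tags, <link href="style.css"> stylesheets are never matched, so the "css" extension check isn't needed here. Note that this counts matching href values, whereas the script counts lines that contain the tracked URL anywhere.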

madwormer2 · Feb 17, 2007

Never mind, you already covered what I was going to say.

Weedpacket · Feb 17, 2007

I think a more reliable pattern would match the entire tag; something like

<a\s[^>]*?\bhref\s*=\s*(['"])(.+?)\1[^>]*?>

(Note that the quotes need to match each other, so the URL would be in $part[2], with the quote character itself in $part[1] and the entire tag in $part[0].) It's also legal for there to be no space between an attribute value and the name of the following attribute, as in <a id="foo"href="bar">, which is why the pattern uses \b rather than \s before href.
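A short sketch of the whole-tag pattern in use with preg_match_all(), written as a PHP single-quoted string. The markup is hypothetical: an apostrophe inside a double-quoted value, a tag with no space before href, and a single-quoted value. On the same string, the question's pattern stops at the apostrophe (capturing just "it") and skips the tag with no space before href, so $old[1] holds only "it" and "local/test3".

```php
<?php
// Weedpacket's whole-tag pattern as a PHP single-quoted string.
// Group 1 captures the opening quote, \1 insists on the same closing quote,
// group 2 is the URL, and group 0 is the entire <a ...> tag.
$pattern = '/<a\s[^>]*?\bhref\s*=\s*([\'"])(.+?)\1[^>]*?>/i';

// Hypothetical markup: an apostrophe inside a double-quoted value, no space
// before href, and a single-quoted value.
$html = '<a href="it\'s.html">one</a> '
      . '<a id="foo"href="bar">two</a> '
      . "<a class='x' href='local/test3'>three</a>";

preg_match_all($pattern, $html, $part);
print_r($part[2]); // the URLs

// For contrast: the pattern from the question on the same markup.
preg_match_all('|\shref=[\'"]([^\'"]+)["\']|i', $html, $old);
print_r($old[1]);
```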
