Hi,
I'm trying to open a remote document and search it for all the URLs (I want to spider my site remotely to get a list of all page links).
Anyway, I'm just starting out and new to regular expressions. Below is what I have so far, but it's not returning the details the way I need them.
My regexp is: |href=\"(?!http:)([^\"' >]+)|i
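// URL of the page to scan, submitted from a form
$url = $_POST['search_page'];
$data = file_get_contents($url);
// $matches[0] gets each full match; $matches[1] gets what the (...) group captured
if (preg_match_all("|href=\"(?!http:)([^\"' >]+)|i", $data, $matches)) {
    print_r($matches);
} else {
    echo "No links on page.";
}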
It returns:
href="search.php?article=343
and I would like it to return:
search.php?article=343 (basically whatever is between the quotes in any href link).
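Since `preg_match_all()` stores the full matches in `$matches[0]` and each parenthesised group in the following indexes, the part between the quotes should already be in `$matches[1]`. Below is a minimal sketch, assuming a made-up HTML string in place of the fetched page. It also widens the lookahead to `(?!https?:)`, a small tweak so that `https://` links are skipped too (the original `(?!http:)` lets them through).

```php
<?php
// Made-up sample HTML standing in for file_get_contents($url).
$data = '<a href="search.php?article=343">Article</a> <a href="https://example.org/">Secure</a> <a href="about.html">About</a>';

$links = [];
// (?!https?:) skips absolute http and https links (tweaked from the original (?!http:)).
if (preg_match_all('|href="(?!https?:)([^"\' >]+)|i', $data, $matches)) {
    // $matches[0] holds e.g. href="search.php?article=343
    // $matches[1] holds only the captured group, e.g. search.php?article=343
    $links = $matches[1];
}
print_r($links); // prints search.php?article=343 and about.html
```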
Optionally, I would like it to include only files with a certain extension.
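For the optional extension filter, one approach is to post-filter the captured links rather than make the regex more complicated. This is a sketch only: `filterByExtension()` is a hypothetical helper name, and the extension list is just an example. It strips any query string or fragment with `parse_url()` and then compares the `pathinfo()` extension case-insensitively.

```php
<?php
// Hypothetical helper (not a PHP built-in): keep links whose path ends in an allowed extension.
function filterByExtension(array $links, array $allowed): array
{
    $allowed = array_map('strtolower', $allowed);
    $kept = [];
    foreach ($links as $link) {
        // Judge "search.php?article=343" by its path "search.php", ignoring the query string.
        $path = parse_url($link, PHP_URL_PATH);
        if (!is_string($path) || $path === '') {
            continue; // e.g. "#top" has no path component
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $allowed, true)) {
            $kept[] = $link;
        }
    }
    return $kept;
}

// Example input; in practice this would be $matches[1] from preg_match_all().
$links = ['search.php?article=343', 'about.html', 'logo.png', 'contact/', '#top'];
print_r(filterByExtension($links, ['php', 'html']));
```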
Any help is greatly appreciated.
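Because the pattern requires `href="`, it misses `href` values written with single quotes or without quotes. If the spider needs to handle those, one alternative is to parse the page with PHP's DOM extension instead of a regex. This is a sketch, assuming `ext-dom` is enabled; `extractRelativeHrefs()` is a hypothetical name.

```php
<?php
// Hypothetical helper: collect relative href values from <a> tags using DOMDocument instead of a regex.
function extractRelativeHrefs(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often not well-formed; keep libxml warnings internal instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $hrefs = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip empty values and absolute http(s) links, mirroring the (?!http:) lookahead.
        if ($href === '' || preg_match('~^https?:~i', $href)) {
            continue;
        }
        $hrefs[] = $href;
    }
    // Remove duplicates and reindex.
    return array_values(array_unique($hrefs));
}

$html = "<p><a href='search.php?article=343'>A</a> <a href=about.html>B</a> <a href=\"https://example.org/\">C</a></p>";
print_r(extractRelativeHrefs($html));
```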