I'm trying to extract every URL from a page, but I'm not having much luck. I can fetch the page okay, but I think the problem is in the regex I'm using to match URLs. I've searched for regex examples, but none of them seem to work very well. Code follows:
$url = $_GET["url"];
$ch = curl_init(); // initialize curl handle
curl_setopt($ch, CURLOPT_URL, $url); // set URL to fetch
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);// allow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1); // return into a variable
curl_setopt($ch, CURLOPT_TIMEOUT, 20); // times out after 20s
//This gets the page
$result = curl_exec($ch); // run the whole process
// This is the part that doesn't seem to work
ereg('https?://([-\w.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?',$result,$eventurl);
When I print_r($eventurl) I get all the data from the first URL to the end of the page, all contained in the first array item.
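For comparison, here is a minimal sketch of what I think the matching step might need to look like with preg_match_all() instead of ereg(). As far as I can tell, ereg() uses POSIX syntax (where shorthand classes like \w and \d are not recognized), returns only a single match, and was deprecated in PHP 5.3 and removed in PHP 7. The sample HTML string and the simplified pattern below are my own assumptions for illustration, not a complete URL grammar:

```php
<?php
// Hypothetical sample standing in for the $result returned by curl_exec().
$result = '<a href="http://example.com/events?id=1">One</a> '
        . '<a href="https://example.org:8080/path/page.html">Two</a>';

// PCRE pattern with # delimiters (assumed, deliberately simple):
// match http:// or https://, then run until whitespace, a quote, or an angle bracket.
$pattern = '#https?://[^\s"\'<>]+#i';

// preg_match_all() collects every match; $matches[0] holds the full matched strings.
$count = preg_match_all($pattern, $result, $matches);

// Drop duplicate URLs while keeping first-seen order.
$eventurls = array_unique($matches[0]);

print_r($eventurls);
```

If the goal is only links in href attributes, parsing the page with DOMDocument and reading the a elements would probably be more robust than any regex, but I have not tried that yet.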
Can anyone shed any light on this newbie-troubling issue?
Thanks!