I can already follow the redirect transparently. What I can't do is find out where I've been redirected to.
I can't use cURL: it is not built into PHP by default, so it isn't an option for this project. I am (sadly) limited to a default install of PHP because I want to share my script with others, and I don't feel comfortable requiring people to add extensions.
You guys were unable to really address the issue.
I can use either:
$handle = fopen($link, "rb");
if ($handle === false) {
    die("Could not open $link");
}
$contents = "";
// Read the remote page in 8 KB chunks until the stream reports end-of-file.
while (!feof($handle)) {
    $contents .= fread($handle, 8192);
}
fclose($handle);
echo $contents;
or
$contents = file_get_contents($link);
echo $contents;
Between fetching and echoing, I parse the links on the page and rewrite every "a href" to point back to my script as myscript.php?link=[some link]. When one of those links is followed, $link holds [some link], and the script loads the remote page it points to.
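The link rewriting described above could look roughly like this minimal sketch. It assumes the script is named myscript.php, handles only quoted href attributes, and uses a regular expression through preg_replace_callback() (PCRE ships with a default PHP build). `rewrite_links` and its `$resolve` callback are hypothetical names; a real HTML parser such as DOMDocument would be more robust than a regex.

```php
<?php
// Sketch (hypothetical names): point every <a href="..."> back at the proxy script.
// Assumes the proxy is "myscript.php" and hrefs are single- or double-quoted.
// $resolve turns a raw href into an absolute URL; by default it returns it unchanged.
function rewrite_links(string $html, ?callable $resolve = null): string
{
    $resolve = $resolve ?? fn (string $href): string => $href;

    return preg_replace_callback(
        // Group 1: "<a ... href=", group 2: the quote character, group 3: the href value.
        '/(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2/i',
        function (array $m) use ($resolve): string {
            $target = $resolve(html_entity_decode($m[3], ENT_QUOTES));
            $proxied = 'myscript.php?link=' . rawurlencode($target);
            return $m[1] . $m[2] . htmlspecialchars($proxied, ENT_QUOTES) . $m[2];
        },
        $html
    );
}
```

For example, `rewrite_links('<a href="/s/42183">Sports</a>')` returns `<a href="myscript.php?link=%2Fs%2F42183">Sports</a>`. Passing a resolver as the second argument is where the page's real base URL would come in.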
For relative links like <a href="/up/one/directory/">, I need to know the address of the page currently being loaded. That is easy for the first page, because I chose it. But in the case of Yahoo (and probably many other sites), some relative links like "/s/42183" actually redirect to another site, such as http://sports.yahoo.com/ncaab.
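Once the address of the current page is known, turning a relative href into an absolute URL is mostly mechanical. This sketch follows the reference-resolution steps of RFC 3986 (section 5.2) in simplified form, using only parse_url() and string functions. `resolve_url` and `remove_dot_segments` are hypothetical helper names, and it assumes the base is an absolute http(s) URL; user:password parts are ignored.

```php
<?php
// Sketch: resolve an href against the URL of the page it appeared on.
// Simplified from RFC 3986 section 5.2; assumes $base is an absolute URL.

// Collapse "." and ".." segments in an absolute path such as "/a/b/../c".
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $last = count($segments) - 1;
    $out = [];
    foreach ($segments as $i => $seg) {
        if ($seg === '.' || $seg === '..') {
            if ($seg === '..' && count($out) > 1) {
                array_pop($out);                      // step up one directory
            }
            if ($i === $last) {
                $out[] = '';                          // "a/b/.." names a directory: keep trailing slash
            }
            continue;
        }
        $out[] = $seg;
    }
    return implode('/', $out);
}

function resolve_url(string $base, string $href): string
{
    if ($href === '') {
        return $base;
    }
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;                                 // already absolute: http:, mailto:, ...
    }
    $b = parse_url($base);
    if (strncmp($href, '//', 2) === 0) {
        return $b['scheme'] . ':' . $href;            // protocol-relative
    }
    if ($href[0] === '#') {
        return explode('#', $base, 2)[0] . $href;     // fragment only
    }
    $origin = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    if ($href[0] === '?') {
        return $origin . $basePath . $href;           // query only
    }
    // Split "dir/page?x=1#y" into its path and the "?x=1#y" suffix.
    $cut = strcspn($href, '?#');
    $hrefPath = substr($href, 0, $cut);
    $suffix = substr($href, $cut);
    if ($hrefPath[0] === '/') {
        $path = $hrefPath;                            // root-relative, e.g. "/up/one/directory/"
    } else {
        // Path-relative: replace whatever follows the base path's last "/".
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $hrefPath;
    }
    return $origin . remove_dot_segments($path) . $suffix;
}
```

With the Yahoo example, `resolve_url('http://sports.yahoo.com/ncaab', '/up/one/directory/')` gives `http://sports.yahoo.com/up/one/directory/`, which is only correct if the base is the post-redirect address rather than the URL originally requested.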
Here is the tricky part. I can actually load the page with this script:
$link = "http://www.yahoo.com/s/42183/";
$contents = file_get_contents($link);
echo $contents;
The fopen method works as well; those pages load fine. The problem is that I have no way of knowing I was redirected, so I don't know how to handle relative links like <a href="/up/one/directory/"> on any subsequent pages.
I need to write a function that takes a URL and returns the URL it was redirected to.
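One possible starting point, assuming the default http:// stream wrapper: after file_get_contents() returns, PHP populates a local variable named $http_response_header with the raw header lines of every response it received, redirects included, in order. Walking those lines and tracking the Location header of each 3xx response gives the final URL. The helper names below are hypothetical, and the handling of relative Location values is simplified (no "../" clean-up).

```php
<?php
// Sketch (hypothetical helper names): find the URL that file_get_contents()
// actually ended up reading after following redirects.

// Turn a Location value into an absolute URL relative to the current one.
// Simplified: "../" segments in relative values are not cleaned up.
function absolutize_location(string $current, string $location): string
{
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $location)) {
        return $location;                              // already absolute
    }
    $p = parse_url($current);
    if (strncmp($location, '//', 2) === 0) {
        return $p['scheme'] . ':' . $location;         // protocol-relative
    }
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    if ($location !== '' && $location[0] === '/') {
        return $origin . $location;                    // root-relative
    }
    $path = $p['path'] ?? '/';
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $location;
}

// Walk the raw header lines of the whole redirect chain, in order.
function final_url_from_headers(string $requestUrl, array $headerLines): string
{
    $url = $requestUrl;
    $status = 0;
    foreach ($headerLines as $line) {
        if (preg_match('~^HTTP/\S+\s+(\d{3})~', $line, $m)) {
            $status = (int) $m[1];                     // a new response starts here
        } elseif ($status >= 300 && $status < 400 && stripos($line, 'Location:') === 0) {
            $url = absolutize_location($url, trim(substr($line, 9)));
        }
    }
    return $url;
}

// Fetch a page and return [contents, final URL].
function fetch_with_final_url(string $url): array
{
    $contents = file_get_contents($url);
    if ($contents === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    // The http:// wrapper sets $http_response_header in this local scope.
    return [$contents, final_url_from_headers($url, $http_response_header ?? [])];
}
```

Usage would be along the lines of `[$html, $base] = fetch_with_final_url('http://www.yahoo.com/s/42183/');`, with `$base` then serving as the address to resolve the page's relative links against. When reading with fopen() instead, the same header lines should be available as `stream_get_meta_data($handle)['wrapper_data']`.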
Everything I've found suggests I need to work with sockets and stream wrappers to read the headers returned when I access the remote page. But since fopen and file_get_contents already follow the redirect so easily, I suspect there is some way to make those functions tell me where they actually got the data from.
I have no experience extracting HTTP headers from a remote request, and I would like some pointers on where to start.
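For the socket route, fsockopen() is available in a default install and lets you speak HTTP directly, so PHP does not follow the redirect for you. This sketch sends a HEAD request and parses the status line and headers. It assumes a plain http:// URL (an ssl:// transport would need the OpenSSL extension), and `parse_response_head` / `head_request` are hypothetical names.

```php
<?php
// Sketch (hypothetical names): read HTTP response headers over a raw socket,
// without following redirects. Plain http:// only.

// Split a raw response into its status code and a name => value header map.
// Header names are lowercased; a repeated header keeps only its last value.
function parse_response_head(string $raw): array
{
    // Headers end at the first empty line; anything after it is the body.
    $head = explode("\r\n\r\n", $raw, 2)[0];
    $lines = explode("\r\n", $head);
    preg_match('~^HTTP/\S+\s+(\d{3})~', array_shift($lines), $m);
    $headers = [];
    foreach ($lines as $line) {
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[strtolower(trim($name))] = trim($value);
    }
    return ['status' => isset($m[1]) ? (int) $m[1] : 0, 'headers' => $headers];
}

// Send "HEAD <path>" to the server and return the parsed response head.
function head_request(string $url, int $timeout = 10): array
{
    $p = parse_url($url);
    $host = $p['host'];
    $port = $p['port'] ?? 80;
    $hostHeader = $host . (isset($p['port']) ? ':' . $p['port'] : '');
    $path = ($p['path'] ?? '/') . (isset($p['query']) ? '?' . $p['query'] : '');

    $fp = fsockopen($host, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        throw new RuntimeException("Connection failed: $errstr ($errno)");
    }
    fwrite($fp, "HEAD $path HTTP/1.1\r\nHost: $hostHeader\r\nConnection: close\r\n\r\n");
    $raw = stream_get_contents($fp);   // "Connection: close" makes the server end the stream
    fclose($fp);
    return parse_response_head($raw);
}
```

To follow a redirect chain by hand, one could call head_request() in a loop while the status is 3xx, resolving each `location` header against the current URL and stopping after a fixed number of hops.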
Any help would be fantastic.