I have a site which uses a few "virtual" URLs to get the results I need. I usually monitor my site with a tool called Xenu, and I've been really happy with it.
However, in a new bit of code I'm calling BotSpotter, I want to show the HTTP status that a search engine gets when it hits one of my pages.
I'm trying to store it in a property of my page object called status.
function getPageStatus()
{
    global $HTTP_SERVER_VARS, $HTTP_ENV_VARS;
    // Default marker: '200*' means no REDIRECT_STATUS was set for this request.
    $this->status = '200*';
    // Prefer the environment value, then fall back to the server variables.
    if (isset($HTTP_ENV_VARS["REDIRECT_STATUS"])) {
        $this->status = $HTTP_ENV_VARS["REDIRECT_STATUS"];
    } elseif (isset($HTTP_SERVER_VARS["REDIRECT_STATUS"])) {
        $this->status = $HTTP_SERVER_VARS["REDIRECT_STATUS"];
    }
} // function getPageStatus()
So I run Xenu and then look at my results page.
For my standard pages I get 200* every time, because there was no redirect and so the $HTTP_SERVER_VARS["REDIRECT_STATUS"] variable was never created.
For the page "/", i.e. the root, I get 200, and for all my others I get 404.
Now, all the pages display, and Xenu reports them all as OK. Is there some other variable I should be looking at, or are the search engines getting the 404, deciding the page is broken and going away?
I doubt that is what is happening, because Google has indexed and cached the page.
Any ideas on where the search engines get the status from?
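One way to cross-check what a client actually receives, independent of REDIRECT_STATUS, might be to request the page from outside and read the status line of the response. The following is only a minimal sketch, assuming PHP 5 or later (for get_headers()) and that URL fopen wrappers are enabled. The helper names statusFromLine and fetchStatus and the example URL are made up for illustration and are not part of BotSpotter.

```php
<?php
// Hypothetical helper: extract the numeric code from a status line such as
// "HTTP/1.1 404 Not Found". Returns null if the line is not a status line.
function statusFromLine($line)
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: request a URL and return the status code of the
// final response, or null if the request failed.
function fetchStatus($url)
{
    $headers = @get_headers($url);
    if ($headers === false) {
        return null;
    }
    $status = null;
    foreach ($headers as $line) {
        // If redirects were followed there is more than one status line;
        // keep the last one seen.
        $code = statusFromLine($line);
        if ($code !== null) {
            $status = $code;
        }
    }
    return $status;
}

// Example call (placeholder URL, replace with one of the virtual pages):
// echo fetchStatus('http://www.example.com/some/virtual/page');
```

Comparing the value this returns with the status stored by getPageStatus() for the same page should show whether the 404 is really what goes out on the wire or only what the server recorded internally before handing the request to the script.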