PHP Redirect - Need Help!!!

unseen · Aug 17, 2006

Hi,
I'm getting traffic to a "site/url" that I need to redirect to another site. I have been using this code for that:

<?php
header("Location: http://www.???.com");
exit;
?>

How can I modify this so that spiders coming to this URL stay on my site and crawl it instead of following the redirect to ???.com?

Kudose · Aug 17, 2006

That would be giving the search engines content that you are not displaying to your visitors.

That means that you would be breaking the #1 rule in SEO for virtually all search engines and would get blacklisted as soon as they caught on. You would be, in a sense, cloaking.

Just my 2 cents.

BobLennon · Aug 17, 2006

unseen wrote:
Hi,
I'm getting traffic to a "site/url" that I need to redirect to another site. I have been using this code for that:

<?php
header("Location: http://www.???.com");
exit;
?>

How can I modify this so that spiders coming to this URL stay on my site and crawl it instead of following the redirect to ???.com?

Is it really possible that spiders can read and follow the PHP header function?

I don't think so. They like HTML hrefs and anchors.

mjax · Aug 17, 2006

You should use this:

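<?php
// Tell clients (browsers and crawlers alike) that the move is permanent
header('HTTP/1.1 301 Moved Permanently');
header('Location: http://www.yourdomain.com/newurl');
// Stop the script so nothing else is sent after the redirect
exit;
?>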

harmor · Aug 17, 2006

Kudose wrote:
That would be giving the search engines content that you are not displaying to your visitors.

That means that you would be breaking the #1 rule in SEO for virtually all search engines and would get blacklisted as soon as they caught on. You would be, in a sense, cloaking.

Just my 2 cents.

Wouldn't he be doing the same thing as somebody adding rel="nofollow" to a hyperlink? Yes.

Weedpacket · Aug 17, 2006

BobLennon wrote:
Is it really possible that spiders can read and follow the PHP header function?

Of course they can read headers. It's basic HTTP protocol. If they can't do that then they can't crawl.
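
To make that concrete, here is a minimal sketch, assuming PHP 7.1 or later, of what a crawler does with the raw header block of an HTTP/1.x response: read the status line and, for any 3xx status, take the Location header as the next URL. The function name `redirect_target` is hypothetical, and a real crawler also handles relative URLs, redirect loops and much more.

```php
<?php
// Minimal sketch of the crawler's side of a redirect: given the raw header
// block of an HTTP/1.x response, return the Location target if the status
// is 3xx, or null if the response is not a redirect.
function redirect_target(string $rawHeaders): ?string
{
    $lines = explode("\r\n", $rawHeaders);

    // The status line looks like "HTTP/1.1 301 Moved Permanently".
    if (!preg_match('#^HTTP/[\d.]+\s+(\d{3})#', $lines[0], $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status > 399) {
        return null; // not a redirect: the body is what gets indexed
    }

    foreach (array_slice($lines, 1) as $line) {
        // Header names are case-insensitive, so "location:" also counts.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null; // 3xx without a Location header
}

// Example: headers like the ones a 301 redirect script sends.
$raw = "HTTP/1.1 301 Moved Permanently\r\n"
     . "Location: http://www.example.com/newurl\r\n"
     . "\r\n";
echo redirect_target($raw), "\n"; // prints http://www.example.com/newurl
```

The crawler never needs to see any PHP: it only sees the status line and headers the server sends back.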

BobLennon · Aug 17, 2006

Weedpacket wrote:
Of course they can read headers. It's basic HTTP protocol. If they can't do that then they can't crawl.

Sorry Weedpacket,

While spiders use HTTP headers to actually crawl around, their primary job is to scan the HTML at a selected URL for HTML links and text content.

They cannot interpret program script or code contained in the HTML page, and surely will not build a URL from an embedded PHP header function argument.

weekender · Aug 17, 2006

Actually, BobLennon, Weedpacket is right: the spider would follow the 301 redirect before indexing the HTML. Besides, the spider wouldn't even see the PHP script, as PHP is server-side and robots only 'see' the client-side output.

harmor wrote:
Wouldn't he be doing the same thing as somebody adding rel="nofollow" to a hyperlink? Yes.

No, because hyperlinks require somebody to click on them, so the person browsing would see the page before the link, just as the spider would.

BobLennon · Aug 17, 2006

weekender wrote:
Actually, BobLennon, Weedpacket is right: the spider would follow the 301 redirect before indexing the HTML. Besides, the spider wouldn't even see the PHP script, as PHP is server-side and robots only 'see' the client-side output.

Thanks, weekender. I forgot that the page in question would still be processed by PHP even though the GET request did not come from a browser.

And maybe I'm reading the problem wrong, but the Location header sent by the PHP header() function has to be sent before any HTML is sent to the browser (or the spider). It seems to me that the spider would not get to see anything on the page it is supposed to crawl before the redirect is issued.

Also, I didn't think that this kind of redirection would normally produce a 301 response from the target under these circumstances.

How does this really work?

Weedpacket · Aug 18, 2006

That's exactly the point: the original poster wants to redirect the spider so that it crawls a different page from what it requested. As Kudose pointed out, this sort of carry-on is the sort of behaviour that gets sites blacklisted by search engines.

BobLennon · Aug 18, 2006

Weedpacket wrote:
That's exactly the point: the original poster wants to redirect the spider so that it crawls a different page from what it requested. As Kudose pointed out, this sort of carry-on is the sort of behaviour that gets sites blacklisted by search engines.

Hello Weedpacket,

This is not my month. I still don't understand your point.

I think it's not so much that the poster wants to redirect the spider; it's more likely that the page developer wants to redirect the browser to another page under certain conditions. That's why output buffering is usually used in PHP: any header() call has to be issued before any HTML is sent out.
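
A minimal sketch of that output-buffering pattern, assuming PHP 7.1 or later. `capture_output` and `serve_page` are hypothetical names, and `$mustRedirect` stands in for whatever condition a real page would check. Because ob_start() holds the echoed HTML in a buffer, nothing reaches the client before the decision to call header() is made.

```php
<?php
// Runs $render with output buffering on and returns whatever it echoed.
// Nothing is sent to the client in the meantime, so header() can still be
// called afterwards.
function capture_output(callable $render): string
{
    ob_start();
    try {
        $render();
    } finally {
        $html = ob_get_clean(); // close the buffer even if $render throws
    }
    return $html;
}

// Hypothetical page handler: build the HTML first, then decide whether to
// redirect. $mustRedirect stands in for the page's real condition.
function serve_page(bool $mustRedirect, string $target): void
{
    $html = capture_output(function () {
        echo "<html><body>Page content</body></html>";
    });

    if ($mustRedirect) {
        // The buffered HTML is discarded; only the redirect is sent.
        header('Location: ' . $target, true, 301);
        exit;
    }
    echo $html; // no redirect: send the page as normal
}
```

Whichever branch runs, the client (spider or browser) gets either the redirect or the page, never both.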

That certainly is not bad behavior, and has not resulted in any search engine blacklisting on the dozens of sites that I've written that employ these techniques.

The problem that I see in this is that if a spider picked a page from some seed list, and that page called PHP's header() function with a Location header, the spider would never see the page at all whenever the conditions for executing that call were met. If the conditions were not met, the spider would not be redirected and the page would be available for it to scan.

However, if the page did the same kind of redirection using client-side JavaScript, for example, the spider would see all the text and links it was meant to look at. Since it cannot understand JavaScript, it would simply ignore the redirection code.

So if redirection is not a programming sin, I guess the problem is in the way PHP handles it (at least from the spider's viewpoint).

Weedpacket · Aug 19, 2006

PHP doesn't handle redirection. Like I said, it's basic HTTP protocol.

The user wants non-spider traffic to be redirected to a different site, but wants the spiders to stay on the current site (on a different domain, even). For reasons that have thus far not been explained, the poster wants spiders to crawl a different site from what other clients see.

Hardly an occasion for handwaving about HTML content or output buffering or client-side JavaScript or even PHP. It's just a matter of looking at the User-Agent header supplied with the request and, based on that, deciding (read: guessing) whether to redirect the client to a different site or not.
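
A minimal sketch of the user-agent check described above, assuming PHP 7.1 or later. The function names and the list of crawler tokens are hypothetical examples, not an authoritative list, and any client can send any User-Agent it likes. As Kudose pointed out, serving crawlers something different from what visitors get is what search engines treat as cloaking.

```php
<?php
// Sketch of user-agent sniffing. The token list is a hypothetical example,
// not a complete or authoritative list of crawler names, and any client can
// send any User-Agent string it likes, so this is only ever a guess.
function looks_like_crawler(string $userAgent): bool
{
    $tokens = ['googlebot', 'slurp', 'msnbot'];
    $ua = strtolower($userAgent);
    foreach ($tokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Redirects everyone except (apparent) crawlers with a 301. Crawlers fall
// through and get the rest of the page: this is exactly the cloaking that
// search engines penalize.
function redirect_unless_crawler(string $target): void
{
    $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
    if (!looks_like_crawler($ua)) {
        header('Location: ' . $target, true, 301);
        exit;
    }
}
```

Calling redirect_unless_crawler() at the top of the page, before any output, would give the behaviour the original poster asked about, along with the blacklisting risk discussed earlier in the thread.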

Roger_Ramjet · Aug 19, 2006

You control spiders etc. with a robots.txt file. Have a read about it at robotstxt.org. Google it for more info. This use of robots.txt is an accepted standard and will NOT get you blacklisted by search engines.
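
To show what a robots.txt rule actually does, here is a simplified sketch, assuming PHP 7.1 or later, of the check a well-behaved crawler makes before fetching a URL path. `is_disallowed` is a hypothetical helper, not part of any library, and it covers only the basic prefix rules described at robotstxt.org.

```php
<?php
// Simplified sketch of the check a well-behaved crawler makes before
// fetching a path. Assumptions: only records that apply to "User-agent: *"
// are read, Disallow values are plain path prefixes (the original
// robotstxt.org rules), and Allow lines, wildcards and per-bot records
// are ignored.
function is_disallowed(string $robotsTxt, string $path): bool
{
    $appliesToAll = false; // inside a record for "User-agent: *"?
    $lastWasAgent = false; // consecutive User-agent lines share a record

    foreach (preg_split('/\r\n|\n|\r/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            $lastWasAgent = false; // blank or malformed line ends the group
            continue;
        }
        $field = strtolower(trim($parts[0]));
        $value = trim($parts[1]);

        if ($field === 'user-agent') {
            $appliesToAll = ($lastWasAgent && $appliesToAll) || $value === '*';
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;

        // An empty Disallow value means "nothing is disallowed".
        if ($field === 'disallow' && $appliesToAll && $value !== ''
            && strpos($path, $value) === 0) {
            return true;
        }
    }
    return false;
}
```

Note that robots.txt only decides which paths a compliant crawler fetches; it says nothing about whether the crawler follows a redirect it receives.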

BobLennon · Aug 19, 2006

Weedpacket wrote:
PHP doesn't handle redirection. Like I said, it's basic HTTP protocol....

The user wants non-spider traffic to be redirected to a different site, but wants the spiders to stay on the current site (on a different domain, even). For reasons that have thus far not been explained, the poster wants spiders to crawl a different site from what other clients see.

Hardly an occasion for handwaving about HTML content or output buffering or client-side JavaScript or even PHP. It's just a matter of looking at the User-Agent header supplied with the request and, based on that, deciding (read: guessing) whether to redirect the client to a different site or not.

The bottom line is that it's easy to keep the spider on the page with PHP if that's what you want to do. On the other hand, PHP makes it even easier to cloak the page. It's the developer's choice.

BobLennon · Aug 19, 2006

Roger Ramjet wrote:
You control spiders etc. with a robots.txt file. Have a read about it at robotstxt.org. Google it for more info. This use of robots.txt is an accepted standard and will NOT get you blacklisted by search engines.

Googlebot gives you even more precise control, because it also follows directions contained in an XML sitemap.
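
For illustration, a minimal sketch, assuming PHP 7.1 or later, that builds such a sitemap from a list of absolute URLs. It assumes the sitemaps.org 0.9 schema (the format Google and other engines settled on shortly after this thread); `build_sitemap` is a hypothetical helper, and optional tags such as lastmod are left out.

```php
<?php
// Sketch: build a minimal XML sitemap (sitemaps.org 0.9 schema) from a list
// of absolute URLs. Each URL is XML-escaped so characters like & are valid;
// optional per-URL tags such as lastmod and priority are omitted.
function build_sitemap(array $urls): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        $loc = htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
        $xml .= "  <url><loc>" . $loc . "</loc></url>\n";
    }
    $xml .= "</urlset>\n";
    return $xml;
}
```

Served as, say, /sitemap.xml, this lists the URLs you want crawled; it is a hint to the crawler, not a way to override a redirect.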

8ta8ta · Aug 19, 2006

You can add a URL rewrite rule to Apache's httpd.conf file to do this!

Guide: http://httpd.apache.org/docs/1.3/misc/rewriteguide.html