Hi,
I'm getting traffic to a site/URL that I need to redirect to another site. I have been using this code for that:

<?php
header("Location: http://www.???.com");
exit;
?>

How can I modify this so that spiders that come to this URL stay on my site and crawl it instead of following the redirect to ???.com?

    That would be giving the search engines content that you are not displaying to your visitors.

    That means that you would be breaking the #1 rule in SEO for virtually all search engines and would get blacklisted as soon as they caught on. You would be, in a sense, cloaking.

    Just my 2 cents.

      unseen wrote:

      Hi,
      I'm getting traffic to a site/URL that I need to redirect to another site. I have been using this code for that:

      <?php
      header("Location: http://www.???.com");
      exit;
      ?>

      How can I modify this so that spiders that come to this URL stay on my site and crawl it instead of following the redirect to ???.com?

      Is it really possible that spiders can read and follow the PHP header function?

      I don't think so. They like HTML hrefs and anchors.

        You should use this:

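        <?php
        // Send an explicit 301 status so the redirect is treated as permanent
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: http://www.yourdomain.com/newurl');
        exit; // stop the script so nothing is output after the redirect
        ?>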
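PHP's header() also accepts the response code as its third argument, and a bare Location header on its own makes PHP answer with 302 (Found) rather than 301, unless a 3xx status has already been set. A minimal sketch that keeps the status explicit; redirect_headers() is a hypothetical helper and the URL is the same placeholder as above:

```php
<?php
// Hypothetical helper: the header lines for a redirect with an explicit status.
// Sending only "Location:" would make PHP use 302 by default.
function redirect_headers(string $url, int $code = 301): array
{
    $reasons = [301 => 'Moved Permanently', 302 => 'Found'];
    if (!isset($reasons[$code])) {
        throw new InvalidArgumentException("Unsupported redirect code: $code");
    }
    return ["HTTP/1.1 $code {$reasons[$code]}", "Location: $url"];
}

// Only send headers under a web server; on the command line this is skipped.
if (PHP_SAPI !== 'cli') {
    foreach (redirect_headers('http://www.yourdomain.com/newurl') as $line) {
        header($line);
    }
    exit; // nothing should be output after a redirect
}
```

Equivalently, header('Location: http://www.yourdomain.com/newurl', true, 301); sets the same status in a single call.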
          Kudose wrote:

          That would be giving the search engines content that you are not displaying to your visitors.

          That means that you would be breaking the #1 rule in SEO for virtually all search engines and would get blacklisted as soon as they caught on. You would be, in a sense, cloaking.

          Just my 2 cents.

          Wouldn't he be doing the same thing as somebody adding rel="nofollow" to a hyperlink? Yes.

            BobLennon wrote:

            Is it really possible that spiders can read and follow the PHP header function?

            Of course they can read headers. It's basic HTTP protocol. If they can't do that then they can't crawl.
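To illustrate the point about headers: a client reads the status line and headers of a response before it looks at any HTML, and a 3xx status plus a Location header is all it needs to go elsewhere. A simplified sketch of that check; read_redirect() is hypothetical, is not taken from any real crawler, and parses a raw response string instead of doing network I/O:

```php
<?php
// Decide whether a raw HTTP response is a redirect, and if so, where to.
// Only the status line and headers are examined; the HTML body is ignored.
function read_redirect(string $rawResponse): ?string
{
    // Headers end at the first blank line; the body (if any) follows it.
    $head = explode("\r\n\r\n", $rawResponse, 2)[0];
    $lines = explode("\r\n", $head);

    // Status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!preg_match('#^HTTP/\d(?:\.\d)? (\d{3})#', $lines[0], $m)) {
        return null;
    }
    $status = (int) $m[1];
    if ($status < 300 || $status > 399) {
        return null; // not a redirect: the body is what gets read and indexed
    }
    foreach (array_slice($lines, 1) as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}
```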

              Weedpacket wrote:

              Of course they can read headers. It's basic HTTP protocol. If they can't do that then they can't crawl.

              Sorry Weedpacket,

              While spiders use HTTP headers to actually crawl around, their primary job is to scan the HTML at a selected URL for links and text content.

              They cannot interpret program script or code contained on the HTML page, and surely will not build a URL from an embedded PHP header function argument.

                Actually BobLennon, Weedpacket is right: the spider would follow the 301 response before indexing the HTML. Besides, the spider wouldn't even see the PHP script, as PHP is server-side and robots only 'see' the client-side output.

                harmor wrote:

                Wouldn't he be doing the same thing as somebody adding rel="nofollow" to a hyperlink? Yes.

                No, because hyperlinks require somebody to click on them, so the person browsing would see the page before the link, just as the spider would.

                  weekender wrote:

                  Actually BobLennon, Weedpacket is right: the spider would follow the 301 response before indexing the HTML. Besides, the spider wouldn't even see the PHP script, as PHP is server-side and robots only 'see' the client-side output.

                  Thanks weekender. I forgot that the page in question would be pre-processed by PHP even though the GET request did not come from a browser.

                  And maybe I'm reading the problem wrong, but the Location header sent by the PHP header function has to be sent before any HTML is sent to the browser (or the spider). Seems to me that the spider would not get to see anything on the page it is supposed to crawl before the HTTP redirect is issued.

                  Also, I didn't think that this kind of redirection would normally produce a 301 response from the target under these circumstances.

                  How does this really work?

                    That's exactly the point: the original poster wants to redirect the spider so that it crawls a different page from what it requested. As Kudose pointed out, this sort of carry-on is the sort of behaviour that gets sites blacklisted by search engines.

                      Weedpacket wrote:

                      That's exactly the point: the original poster wants to redirect the spider so that it crawls a different page from what it requested. As Kudose pointed out, this sort of carry-on is the sort of behaviour that gets sites blacklisted by search engines.

                      Hello Weedpacket,

                      This is not my month. I still don't understand your point.

                      I think it's not so much that the poster wants to redirect the spider; more likely the page developer wants to redirect to another page under certain conditions. That's why output buffering is usually used in PHP: any header() call on a page has to be issued before any HTML is sent out.

                      That certainly is not bad behavior, and has not resulted in any search engine blacklisting on the dozens of sites that I've written that employ these techniques.

                      The problem I see is that if a spider picked a page from some seed list, and that page contained a PHP Location header call, the spider would never see the page at all whenever the conditions for executing the header call were met. If the conditions were not met, the spider would not be redirected and the page would be available for its scan.
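A minimal sketch of the output-buffering pattern described above, assuming a hypothetical render_page() whose $redirectTo argument stands in for "the conditions for executing the header call". When the redirect branch is taken, the buffered HTML is discarded, so neither a browser nor a spider receives it:

```php
<?php
// Output buffering lets a page decide on a redirect after it has started
// producing HTML, because nothing has been sent to the client yet.
// render_page() is hypothetical; it returns headers and body instead of sending them.
function render_page(?string $redirectTo): array
{
    ob_start(); // hold all echoed output in memory
    echo "<html><body>Page content and links</body></html>";

    if ($redirectTo !== null) {
        ob_end_clean(); // throw the buffered HTML away: the client never sees it
        return ['headers' => ["Location: $redirectTo"], 'body' => ''];
    }

    // No redirect: release the buffered HTML as the response body.
    return ['headers' => [], 'body' => ob_get_clean()];
}
```

In a real page the returned headers would be passed to header() before the body is echoed.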

                      However, if the page did the same kind of redirection using client-side JavaScript, for example, the spider would see all the text and links it was intended to look at. Since it cannot understand JavaScript, it would simply ignore the redirection code.

                      So if redirection is not a programming sin, I guess the problem is in the way PHP handles it (at least from the spider's viewpoint).

                        PHP doesn't handle redirection. Like I said, it's basic HTTP protocol.

                        The user wants non-spider traffic to be redirected to a different site, but wants the spiders to stay on the current site (on a different domain, even). For reasons that have thus far not been explained, the poster wants spiders to crawl a different site from what other clients see.

                        Hardly an occasion for handwaving about HTML content or output buffering or client-side JavaScript or even PHP. It's just a matter of looking at the supplied User-Agent header in the request and based on that deciding (read: guessing) whether to redirect the client to a different site or not.
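The User-Agent check described above could look roughly like this. looks_like_crawler() is a hypothetical helper, and the token list is an illustrative, incomplete sample of crawler names; the User-Agent header is supplied by the client and can be faked, which is why this is only ever a guess:

```php
<?php
// Guess whether a request comes from a search-engine crawler by looking for
// known tokens in the User-Agent header. The token list is a small, assumed
// sample; real crawlers are far more numerous and the header can be spoofed.
function looks_like_crawler(string $userAgent): bool
{
    $tokens = ['Googlebot', 'bingbot', 'Slurp'];
    foreach ($tokens as $token) {
        if (stripos($userAgent, $token) !== false) { // case-insensitive match
            return true;
        }
    }
    return false;
}

// In a page this would be fed the request's header, e.g.:
// $isCrawler = looks_like_crawler($_SERVER['HTTP_USER_AGENT'] ?? '');
```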

                          You control spiders etc. with a robots.txt file. Have a read about it at robotstxt.org, and Google for more info. This use of robots.txt is an accepted standard and will NOT get you blacklisted by search engines.
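For a sense of what a well-behaved crawler does with that file, here is a deliberately simplified sketch. is_disallowed() is hypothetical, and it assumes that only the "User-agent: *" group matters, only Disallow lines are read, and rules are plain path prefixes (no wildcards and no Allow lines, both of which real robots.txt handling covers):

```php
<?php
// Simplified sketch of how a crawler might apply robots.txt rules to a path.
// Assumptions: only the "User-agent: *" group is considered, only Disallow lines
// are read, and a rule matches when the path starts with it.
function is_disallowed(string $robotsTxt, string $path): bool
{
    $inStarGroup = false;
    foreach (preg_split('/\r\n|\n|\r/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);
        if ($field === 'user-agent') {
            $inStarGroup = ($value === '*'); // a new group starts (simplified)
        } elseif ($field === 'disallow' && $inStarGroup && $value !== '') {
            if (strpos($path, $value) === 0) {
                return true; // path falls under a disallowed prefix
            }
        }
    }
    return false; // an empty "Disallow:" or no matching rule allows the path
}
```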

                            Weedpacket wrote:

                            PHP doesn't handle redirection. Like I said, it's basic HTTP protocol....

                            The user wants non-spider traffic to be redirected to a different site, but wants the spiders to stay on the current site (on a different domain, even). For reasons that have thus far not been explained, the poster wants spiders to crawl a different site from what other clients see.

                            Hardly an occasion for handwaving about HTML content or output buffering or client-side JavaScript or even PHP. It's just a matter of looking at the supplied User-Agent header in the request and based on that deciding (read: guessing) whether to redirect the client to a different site or not.

                            Bottom line is that it's easy to keep the spider on the page with PHP if that's what you want to do. OTOH, PHP makes it even easier to cloak the page. It's the developer's choice.

                              Roger Ramjet wrote:

                              You control spiders etc. with a robots.txt file. Have a read about it at robotstxt.org, and Google for more info. This use of robots.txt is an accepted standard and will NOT get you blacklisted by search engines.

                              Googlebot makes control even more precise because it also follows the directions contained in an XML sitemap.
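A minimal XML sitemap can be generated from PHP. This sketch follows the sitemaps.org format (a urlset of url/loc entries); build_sitemap() and the example URLs are hypothetical, and optional tags such as lastmod are left out:

```php
<?php
// Build a minimal sitemap document in the sitemaps.org 0.9 format.
// build_sitemap() is a hypothetical helper; URLs should be absolute.
function build_sitemap(array $urls): string
{
    $xml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $xml .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";
    foreach ($urls as $url) {
        // Escape &, <, > and quotes so the URL is valid XML text.
        $loc = htmlspecialchars($url, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $xml .= "  <url>\n    <loc>$loc</loc>\n  </url>\n";
    }
    return $xml . "</urlset>\n";
}

// Usage: echo build_sitemap(['http://www.example.com/', 'http://www.example.com/about']);
```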
