Can search engines parse PHP pages? E.g., Google, Yahoo

What I mean is: my index page is index.php.

This page then uses include("mypage"); to build the whole page.

If I submit my site to search engines, will they pick up the included material, so that they read the articles that are included?

I also read something about getting rid of the ? and the & in URLs.

Does this have to be done in order for pages to be indexed?

Thanks

    I had this problem, and I haven't solved it yet...

    It looks like search engines ignore dynamic pages; don't ask me why...

    If someone knows a way to make Google index
    site.com/page.php?id=1 .. id=1000,
    PLEASE LET ME KNOW!

    Someone gave me this idea:
    site.com/page.php/id/1
    or
    site.com/page.php/id/1.html

    It would solve the problem, but we would need to change some Apache/PHP configuration to map these URLs onto the real ones. I don't know how to do that, but I think it is quite simple.

    Since it is easy for PHP to turn a fake URL into a real one, the only problem is making Apache send those fake URLs to the right place.

    Users and other sites (including search engines) will see and use the fake URLs, and your system will keep working with the real ones.
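On the "easy for PHP" half of this: a minimal sketch, assuming the fake part after the script name always comes in name/value pairs and may end in a cosmetic ".html". The helper name fake_to_real_url() is made up for illustration; it only translates the string and does not cover the Apache side.

```php
<?php
// Hypothetical helper: translate a "fake" URL such as /page.php/id/1.html
// into the real one, /page.php?id=1.
// Assumes everything after ".php" is a list of name/value pairs.
function fake_to_real_url(string $fakeUrl): string
{
    $pos = strpos($fakeUrl, '.php');
    if ($pos === false) {
        return $fakeUrl; // not a PHP URL, nothing to translate
    }
    $script = substr($fakeUrl, 0, $pos + 4);          // e.g. "/page.php"
    $extra  = substr($fakeUrl, $pos + 4);             // e.g. "/id/1.html"
    $extra  = preg_replace('/\.html$/', '', $extra);  // drop the cosmetic suffix
    // Split on "/" and discard empty pieces from leading/double slashes
    $bits = array_values(array_filter(explode('/', $extra), 'strlen'));

    $params = [];
    for ($i = 0; $i < count($bits); $i += 2) {
        // A name without a value gets an empty string, like "?id="
        $params[rawurldecode($bits[$i])] = isset($bits[$i + 1]) ? rawurldecode($bits[$i + 1]) : '';
    }
    // http_build_query() encodes the pairs and joins them with "&"
    return $params ? $script . '?' . http_build_query($params) : $script;
}
```

With this sketch, both fake_to_real_url('/page.php/id/1.html') and fake_to_real_url('/page.php/id/1') give '/page.php?id=1'.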

    (I have trouble getting www.webinsider.com.br indexed on Google.)

      Did you guys try using meta tags?

      If you go to the Philips Lighting site at www.lighting.philips.com, you'll see it's all created in PHP; however, we use A LOT of meta tags, which helps search engines...

      arif

        The following code will redirect a "slashed" URL, such as http://blitzweb.org/show.php/id/1/s/2 into a "get" URL, such as http://blitzweb.org/index.php?id=1&s=2.

        Because this PHP code redirects TO the "index.php" file, this code CANNOT BE the "index.php" file. You can cut and paste this PHP code into a new PHP file (look carefully above: "show.php" is what it is called in this example), and then re-link your website so that all the links on your index page point to it, as follows:

        http://blitzweb.org/index.php?id=1&s=2 becomes http://blitzweb.org/show.php/id/1/s/2

        http://blitzweb.org/index.php?id=15&s=8 becomes http://blitzweb.org/show.php/id/15/s/8

        It takes a lot of work to re-link your website, but the only links you have to convert with this technique are the ones you want Googlebot to index.
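To take some of the drudgery out of re-linking, the links can be generated from the same name/value pairs you would have put after index.php?. This is a sketch; slashed_url() is a hypothetical helper name, and the show.php URL is the one from the example above.

```php
<?php
// Hypothetical re-linking helper: build the "slashed" URL
// (e.g. http://blitzweb.org/show.php/id/1/s/2) from the same parameters
// you would otherwise write as index.php?id=1&s=2.
function slashed_url(string $showUrl, array $params): string
{
    $path = '';
    foreach ($params as $key => $value) {
        // rawurlencode() turns "/" inside a value into %2F,
        // so a value cannot create extra path segments
        $path .= '/' . rawurlencode((string)$key) . '/' . rawurlencode((string)$value);
    }
    return $showUrl . $path;
}

// Usage: print a menu link; htmlspecialchars() escapes it for the href attribute.
$href = slashed_url('http://blitzweb.org/show.php', ['id' => 15, 's' => 8]);
echo '<a href="' . htmlspecialchars($href) . '">Article 15</a>', "\n";
```

One caution: Apache refuses encoded slashes (%2F) in the path by default (see its AllowEncodedSlashes directive), so it is simplest to keep slashes out of the values entirely.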

        Googlebot will 'see' the links in the show.php format, which looks like an ordinary deep directory tree. Your website, however, redirects each show.php request to index.php with the proper GET parameters in place.

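        <?php
        // show.php: turns a "slashed" URL such as show.php/id/1/s/2
        // into index.php?id=1&s=2 and redirects the visitor there.
        $base_url = 'http://blitzweb.org/index.php';
        $params = array();
        // PATH_INFO is only set when something follows "show.php" in the URL
        if ( isset($_SERVER['PATH_INFO']) && $_SERVER['PATH_INFO'] != "" )
        {
        	// Split on "/" and drop the empty pieces from leading/double slashes
        	$bits = array_values(array_filter(explode("/", $_SERVER['PATH_INFO']), 'strlen'));
        	// Even positions are names, odd positions are their values
        	for ($i = 0; $i < count($bits); $i += 2)
        	{
        		$params[$bits[$i]] = isset($bits[$i + 1]) ? $bits[$i + 1] : "";
        	}
        }
        // http_build_query() URL-encodes each pair and joins them with "&",
        // so there is no stray trailing "&" (or "?" when there are no pairs)
        $query = http_build_query($params);
        header("Location: $base_url" . ($query != "" ? "?$query" : ""));
        exit();
        ?>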
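A note on the snippet above: header("Location: ...") makes PHP answer with a 302 redirect, so the browser (or Googlebot) then requests index.php?id=1&s=2 itself and sees that URL. If the goal is for visitors and crawlers to keep seeing show.php/id/1/s/2, one alternative is to copy the pairs into $_GET and run index.php directly, with no redirect. A minimal sketch, assuming index.php sits next to show.php and reads its parameters only from $_GET; path_info_to_params() is a hypothetical name.

```php
<?php
// show.php, alternative sketch: no Location header. The /name/value pairs
// from PATH_INFO are copied into $_GET and index.php is run in place,
// so the address stays show.php/... for browsers and crawlers alike.
function path_info_to_params(string $pathInfo): array
{
    // Drop empty pieces caused by leading, trailing or double slashes
    $bits = array_values(array_filter(explode('/', $pathInfo), 'strlen'));
    $params = [];
    for ($i = 0; $i < count($bits); $i += 2) {
        $params[$bits[$i]] = isset($bits[$i + 1]) ? $bits[$i + 1] : '';
    }
    return $params;
}

// Only act when served by the web server (skipped when run from the CLI).
if (PHP_SAPI !== 'cli') {
    $pathInfo = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
    $_GET = array_merge($_GET, path_info_to_params($pathInfo));
    include __DIR__ . '/index.php'; // assumed location of index.php
    exit;
}
```

Because the browser now believes it is inside show.php/id/1/, relative paths in the output (style.css, images/logo.gif) would resolve under that fake directory; root-relative paths such as /style.css avoid that.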
        

        Britisch
        http://blitzweb.org

          OK, so it is best just to write all my URLs as:

          http://www.tripletproducts.com/this/that/that/this

          and then parse the URL into what I need, to produce the URL:

          http://www.tripletproducts.com?this=that&that=this

          Is this what you are saying?

          1) I'm still confused about how a search engine knows how to follow links.

          If the link is: http://www.tripletproducts.com/this/that/that/this, then how does the search engine follow?

          2) When I submit my site to search engines, which URL should I submit: http://www.tripletproducts or http://www.tripletproducts.com/this/that/that/this?

          3) Also, back to my first question: can my main page be named index.php, and can I submit that URL to the search engine? http://www.tripletproducts.com/index.php

          4) If my index.php uses the include("myfile.php"); function, will the search engine "build" the page before indexing it?

          For example, my main page includes a page called "article.php". This article page has tons of information that is relevant to the site and could help me get a better ranking if it were included... does that happen?

          What about database queries?

          My menus are built with a MySQL query. Will the titles of these menus be indexed to help me get a better ranking?

          Any suggestions on how to rank highly on search engines such as Yahoo with a PHP-built site?

            If the link is: http://www.tripletproducts.com/this/that/that/this, then how does the search engine follow?

            The search engine will simply assume that /this/that/that/this is a directory and request that file from your web server.

            A search engine WILL NOT submit "GET" data, such as this=that&that=this.

            When I submit my site to search engines...what should the url be that I submit?

            Basically, submit whatever pages you want indexed. It's best to have an 'index.php' or 'index.html' file with links to the REST of your site; that way, when you submit your 'index.*' page, Googlebot will simply follow the links you provided there and spider the rest of your site for you.

            BTW: When submitting to search engines, it's best to submit just your domain name (e.g., Blitzweb.org). Your web server will automatically serve up your index page, just as it would for any ordinary visitor. Googlebot will find your links and take it from there automatically. You don't have to submit all your links by hand.
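To make "Googlebot will find your links" concrete: a crawler fetches the HTML your server sends (after PHP has already run) and collects the href of each link, then requests those addresses in turn. The function below is only a toy sketch of that collecting step using PHP's DOMDocument; real crawlers do far more (robots.txt, duplicate detection, crawl limits), and the sample page is made up.

```php
<?php
// Toy sketch of a spider's first step: gather every <a href> from a page.
function extract_links(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href; // to the spider, a link is just this string
        }
    }
    return $links;
}

// Hypothetical rendered index page containing both link styles.
$page = '<html><body>'
      . '<a href="/show.php/link/about">About</a> '
      . '<a href="/index.php?link=news">News</a>'
      . '</body></html>';
print_r(extract_links($page));
```

Whether a given engine then chooses to fetch and index each collected address is up to that engine; the later replies in this thread discuss that for query-string links.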

            Also, back to my first question... Can my main page be named index.php? and can I submit that url to the search engine? http://www.tripletproducts.com/index.php

            Yes, and yes.

            Your main page MUST be index.php or index.html (or another file name listed in your server's DirectoryIndex setting); if you use anything else, your visitors won't see your page when they try to access your site using just your domain name!

            You can control which individual URLs crawlers may fetch using ROBOTS.TXT, but that's completely unrelated to the current issue; explore it at your leisure.

            If my index.php makes use of the include("myfile.php"); function, will the search engine "build" the page, before indexing it.

            Your current "index.php" can be left alone, so long as its default is to produce the main "home" page output (i.e., no "id=1" or other parameters). Then change all the LINKS GENERATED to point to "show.php"; e.g., http://domain.com/show.php/id/1.

            Show.php, which is the code snippet I included, will redirect the request BACK TO the index.php file, with the proper parameters to produce your desired output.

            1.) index.php :: "about" link points to "show.php/link/about"
            2.) show.php :: Receives a user's click (or Googlebot's click) for /link/about, sends server back to index.php?link=about
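Step 2 ends with index.php?link=about, and index.php then has to decide what to include(). Since that choice comes from the URL, a safe sketch is to map link names to files through a fixed list instead of including $_GET['link'] directly. The page names, file names and the pages/ directory below are all hypothetical.

```php
<?php
// Hypothetical index.php dispatcher for index.php?link=about.
// Only names in the whitelist can be included; anything else (including
// "../something") falls back to the home page.
function page_for_link($link): string
{
    $pages = [
        'about' => 'about.php',
        'news'  => 'news.php',
    ];
    return (is_string($link) && isset($pages[$link])) ? $pages[$link] : 'home.php';
}

// In index.php (skipped when run from the command line):
if (PHP_SAPI !== 'cli') {
    $link = isset($_GET['link']) ? $_GET['link'] : null;
    include __DIR__ . '/pages/' . page_for_link($link);
}
```

Whatever the included file prints becomes part of the page the server sends, so Googlebot receives the finished HTML either way.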

            The redirection which occurs in step #2 is COMPLETELY TRANSPARENT TO GOOGLEBOT! Your output will be generated by "index.php?link=about", but Googlebot (or any of your users, for that matter) will think they're seeing "show.php/link/about".

            I understand that this whole thread will be highly confusing. The best way to understand how it works is to create test files and see how it behaves. The key to understanding why you have to go through all this is to realize that variables passed as "GET" variables (ie, "link=news&view=summary" or whatever) WILL NOT BE USED BY GOOGLEBOT, so if your "index.php?link=news&view=summary" is your news link, Google will never see it, because it will never submit the {variable1}={value1}&{variable2}={value2} arguments to the web server.

            Converting these arguments to directory names tricks Google into thinking it's accessing "show.php/link/news/view/summary"; show.php translates this to index.php?link=news&view=summary on the SERVER side, so Google doesn't even know it's happening.

            Britisch
            Blitzweb.org

              WOW. OK, so it's making a little more sense.

              But with the "includes"... does Googlebot actually "run" the scripts (apart from the "GETs") to produce the complete page before indexing it? I wasn't sure of your answer.

              If it does run the scripts, then...

              If I had a link on index.php pointing to http://www.mysite.com/show.php/article/023659

              and my show.php file was something that would...

              ereg_replace("/", "?", $_SERVER['REQUEST_URI']);
              and then go to the correct page.

              Then Googlebot would index the article? Is this correct? Sorry for my ignorance. I'm reading the page you referenced in your "article" about web bots.

              Once I finish, I will have a better understanding of how they work.

              Thanks again

                From experience:

                I have an index.php page with meta tags pulled from a text file that is editable from a password-protected area. Google, acting like a basic browser, requests the directory; Apache serves up the rendered index page, and Google reads it perfectly.
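A sketch of what "meta tags pulled from a text file" might look like. The file name meta.txt and its format (one "name: content" pair per line) are assumptions for illustration, not the poster's actual setup. The point it shows is that the tags are written into the HTML on the server, so the crawler only ever sees the finished markup.

```php
<?php
// Build <meta> tags from lines like "description: Lamps and lighting".
// The "name: content" format is an assumption for this sketch.
function meta_tags_from_text(string $text): string
{
    $html = '';
    foreach (preg_split('/\R/', $text) as $line) {
        $pair = array_map('trim', explode(':', $line, 2));
        if (count($pair) < 2 || $pair[0] === '') {
            continue; // skip blank or malformed lines
        }
        list($name, $content) = $pair;
        // Escape both values so the editable file can't break the markup
        $html .= '<meta name="' . htmlspecialchars($name, ENT_QUOTES)
               . '" content="' . htmlspecialchars($content, ENT_QUOTES) . "\">\n";
    }
    return $html;
}

// In index.php, inside <head> (hypothetical meta.txt next to the script):
if (is_readable(__DIR__ . '/meta.txt')) {
    echo meta_tags_from_text(file_get_contents(__DIR__ . '/meta.txt'));
}
```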

                  Google seems to have no problems indexing or following dynamic pages or query strings. However, Google does limit the number of dynamic pages it indexes. I gather that many search engines don't index dynamic pages with query strings or session IDs because of the danger that the spider/bot gets trapped in a loop.

                  We are able to index dynamically generated pages. However, because our web crawler can easily overwhelm and crash sites serving dynamic content, we limit the amount of dynamic pages we index.
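On the session ID point: if a session ID ends up in your links, every visit produces a "new" URL for the same page. A minimal sketch of stripping it before printing a link, assuming PHP's default session name, PHPSESSID; strip_session_id() is a hypothetical helper.

```php
<?php
// Remove the session ID parameter from a URL so each page keeps one
// stable address. PHPSESSID is PHP's default session.name.
function strip_session_id(string $url, string $name = 'PHPSESSID'): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['query'])) {
        return $url; // unparsable, or no query string at all
    }
    parse_str($parts['query'], $query);
    unset($query[$name]);

    $base     = substr($url, 0, strpos($url, '?'));
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';
    return ($query ? $base . '?' . http_build_query($query) : $base) . $fragment;
}
```

Setting session.use_trans_sid to 0 (and session.use_only_cookies to 1) stops PHP from adding the ID to links automatically in the first place.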

                    if i had a link on index.php pointing to http://www.mysite.com/show.php/article/023659

                    and my show.php file was something that would...

                    ereg_replace("/", "?", $_SERVER['REQUEST_URI']);
                    and then go to the correct page.

                    CLOSE: In your example, your show.php file replaces ALL slashes with the ? character:

                    show.php/article/023659 becomes
                    index.php?article?023659

                    When what needs to happen is:

                    show.php/article/023659 becomes
                    index.php?article=023659

                    Basically, the first slash after show.php becomes ?, every even-numbered slash becomes =, and the remaining odd-numbered slashes become &. This way, you can replace every ?, = and & in your links with /, and they'll be translated back into ?'s, ='s and &'s in the appropriate places.
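The ereg_replace() call in the question would not run on current PHP anyway: it was deprecated in PHP 5.3 and removed in PHP 7.0. Here is a sketch of the slash rule using preg_replace_callback() instead: the slash right after show.php becomes ?, the slash between a name and its value becomes =, and the slash between pairs becomes &. slashed_to_query() is a hypothetical name; show.php and index.php are the script names used in this thread.

```php
<?php
// Rewrite /show.php/article/023659 into /index.php?article=023659.
function slashed_to_query(string $uri): string
{
    // $m[1] = any directory before show.php, $m[2] = the "/name/value..." part
    if (!preg_match('#^(.*?)/show\.php(/.*)?$#', $uri, $m)) {
        return $uri; // not a show.php URL: leave it alone
    }
    $rest = isset($m[2]) ? trim($m[2], '/') : '';
    if ($rest === '') {
        return $m[1] . '/index.php';
    }
    $n = 0;
    // Remaining slashes alternate: 1st '=', 2nd '&', 3rd '=', ...
    $query = preg_replace_callback('#/#', function ($match) use (&$n) {
        $n++;
        return ($n % 2 === 1) ? '=' : '&';
    }, $rest);
    // The slash right after show.php becomes the '?'
    return $m[1] . '/index.php?' . $query;
}
```

In show.php it could be used as header('Location: ' . slashed_to_query($_SERVER['REQUEST_URI'])); note that this sketch does not handle a query string already present in REQUEST_URI.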

                    Keep in mind, Googlebot doesn't see this transformation: Googlebot sees, "show.php/article/023659", but your web server, because of the header redirection, actually delivers "index.php?article=023659". The whole idea is to pipe dynamic content through an artificial directory structure. "show.php/article/023659" is an artificial directory structure, which all bots should spider correctly.

                    Britisch
                    http://blitzweb.org

                      4 days later

                      Um... I watch and track web crawlers on my site.
                      They often follow dynamic pages.
                      And when was the last time you clicked on a link to a forum from Google? All the time for me.

                        Originally posted by apepelis
                        um.. I watch and track web crawlers on my site.
                        they often follow dynamic pages.
                        and when was the last time you clicked on a link to a forum from google? all the time for me.

                        But why were some dynamic pages followed and indexed and not others?
                        I tried robots.txt and meta tags with no success...

                        Any clue?
