I'm working on an extensive site containing about 1000 PHP files. In addition to search pages and account pages and so on, a great many of these files accept numeric query string parameters:

http://example.com?id=1234

Which might load up a school named "Occidental College" or perhaps a Career such as "Computer Programmer - Applications".

We are considering trying to alter the structure of our site to present more user-friendly URLs like this instead:

http://example.com/careers/Computer_Programmer_-_Applications

I completely understand how we might use mod_rewrite and some database work to map that URL onto http://example.com?id=1234. However, I'm wondering a couple of things and would like some input.

1) If we don't change our internal links to a page (i.e., they still point to http://example.com?id=1234) then this doesn't really help our search engine ranking unless external sites somehow know to link to the new fancy URLs, right? The perhaps futile hope is that we don't have to go change all the links in our 1000 PHP files and that we might use some mod_rewrite trickery to gain some advantage here.

2) Suppose I type http://example.com?id=1234 into my browser. Is it possible to display the search-friendly URL (http://example.com/careers/Computer_Programmer-Applications) in the browser instead so that if someone copies it to an email they get the long fancy URL?

2a) Is it possible to efficiently redirect a search engine looking for http://example.com?id=1234 to http://example.com/careers/Computer_Programmer-Applications and thereby gain all the advantages of our supposedly search-engine-friendly URL? Would we need to send an HTTP response code (301? 303?). Will the search engine be smart enough to know that the search-friendly URL is the 'real' link when it encounters the old-style link in the HTML of our site or somebody else's site? Does it help our ranking at all? What is the best technique for this?

3) I've heard that Amazon went from the query string / id approach to search-friendly URLs by using mod_rewrite-type technology, ignoring the middle search-friendly part of a URL and putting the id part at the very end of the URL, as with this book called "Choke" by Chuck Palahniuk:
http://www.amazon.com/Choke-Chuck-Palahniuk/dp/0307388921/ref=sr_1_1?ie=UTF8&s=books&qid=1238107513&sr=1-1

This is what I think I should be shooting for. I understand how to accept the URL on our server and load up the right content, but I do not understand how to capitalize on it to improve our search ranking. Thoughts?

4) Will this really help that much?

Any and all discussion of this is welcome and encouraged.

    sneakyimp wrote:

    1) If we don't change our internal links to a page (i.e., they still point to http://example.com?id=1234) then this doesn't really help our search engine ranking unless external sites somehow know to link to the new fancy URLs, right? The perhaps futile hope is that we don't have to go change all the links in our 1000 PHP files and that we might use some mod_rewrite trickery to gain some advantage here.

    Correct, search engines can't index links that don't exist anywhere; a URL must be listed somewhere in order to be indexed. If you have thousands of hard-coded links (whether in a database or flat files), I certainly see your point. Personally, I don't think it's worth all the work to convert these links to SEO-friendly links, unless the links themselves are rendered dynamically, as in some sort of CMS. In that case you simply search on another database key to get your dynamic page (or on the same key, if you allow the number to be part of the URL). Please note that, even then, you will take a hit in ranking while your website is re-indexed, because Google's index for the old URL is blown away. I would recommend leaving your links as they are for now and redirecting to the new SEO-friendly pattern (say, over a few months). Only then would you change your links. Of course, both URLs should work to bring up the same page (important).

    Google and 301 Redirects.

    sneakyimp wrote:

    2) Suppose I type http://example.com?id=1234 into my browser. Is it possible to display the search-friendly URL (http://example.com/careers/Computer_Programmer-Applications) in the browser instead so that if someone copies it to an email they get the long fancy URL?

    Yeah, this involves the redirect (see link above). Other than with a redirect, no. In fact, with most mod_rewrite setups it's the opposite: the web server gets the SEO-friendly URL and converts it internally to the id=1234 version. This makes it easy on you because you still reference the id via $_GET['id']. However, this implies that the 1234 is part of the URL. I would do this after the one-to-two-month redirect transition I mentioned. In other words:

    http://example.com?id=1234 (301 redirect to...)

    http://example.com/1234 (or however you make the link)

    Apache rewrites (internally) http://example.com/1234 back to http://example.com?id=1234

    Later on, you just change all your links to this pattern (no redirects)

    http://example.com/1234 (iow, skip the first step.)
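A minimal PHP sketch of the first step in that sequence (the 301 from the old query-string form), assuming the new pattern is simply /1234. The function name legacy_redirect_target is hypothetical. It inspects the original request URI rather than $_GET because, under Apache, $_SERVER['REQUEST_URI'] still holds what the client asked for even after an internal rewrite, which keeps the redirect from looping once /1234 is rewritten back to ?id=1234.

```php
<?php
// Hypothetical helper: given the URI the client originally requested,
// return the new-style path to 301-redirect to, or null when the request
// is not an old-style ?id=NNNN request.
function legacy_redirect_target(string $requestUri): ?string
{
    $parts = parse_url($requestUri);
    if ($parts === false || !isset($parts['query'])) {
        return null; // no query string, so nothing to redirect
    }

    parse_str($parts['query'], $params);
    $id = $params['id'] ?? null;
    if (!is_string($id) || !preg_match('/^\d+$/', $id)) {
        return null; // missing or non-numeric id
    }

    // Assumed new pattern: http://example.com/1234
    return '/' . $id;
}

// Sketch of use at the top of the legacy entry script:
// $target = legacy_redirect_target($_SERVER['REQUEST_URI']);
// if ($target !== null) {
//     header('Location: ' . $target, true, 301);
//     exit;
// }
```

Keeping the decision in a pure function makes it easy to check against sample URLs before wiring it to header().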

    sneakyimp wrote:

    2a) Is it possible to efficiently redirect a search engine looking for http://example.com?id=1234 to http://example.com/careers/Computer_Programmer-Applications and thereby gain all the advantages of our supposedly search-engine-friendly URL? Would we need to send an HTTP response code (301? 303?). Will the search engine be smart enough to know that the search-friendly URL is the 'real' link when it encounters the old-style link in the HTML of our site or somebody else's site? Does it help our ranking at all? What is the best technique for this?

    Again, as you already read, Google is okay with this, but I don't know for how long.

    Our company's SEO guru claims it does help ranking (I know he has Matt Cutts's email address from SEO conventions and whatnot). Personally, I am not sure.

    sneakyimp wrote:

    3) I've heard that Amazon went from the query string / id approach to search-friendly URLs by using mod_rewrite-type technology, ignoring the middle search-friendly part of a URL and putting the id part at the very end of the URL, as with this book called "Choke" by Chuck Palahniuk:
    http://www.amazon.com/Choke-Chuck-Palahniuk/dp/0307388921/ref=sr_1_1?ie=UTF8&s=books&qid=1238107513&sr=1-1

    Personally, I don't think it matters much. That URL is ugly and I'm not sure what they gained by it. Part of the point is to make a more intelligible url. However, I do think it's fine to put a numeric key in the url (just as I think it's fine to use a key value pair if you don't mind the whole intelligible issue. Again, other folks think the friendly urls help.)

    sneakyimp wrote:

    This is what I think I should be shooting for. I understand how to accept the URL on our server and load up the right content, but I do not understand how to capitalize on it to improve our search ranking. Thoughts?

    4) Will this really help that much?

    Any and all discussion of this is welcome and encouraged.

      THANKS bretticus for reading my lengthy post and responding in kind.

      Our site is not a collection of hard-wired links. It's about 1000 php pages that generate lists of colleges/careers/majors, etc., resulting in hundreds of thousands of potential 'pages' that one might access. So it is a bit like a CMS or a forum.

      It would therefore be possible to write a few scripts to generate a new textual key from the title or description of various records and populate our db tables with them.
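A rough sketch of what such a key-generating script could look like. Both function names are hypothetical, and the choice to collapse every run of non-alphanumeric characters into one underscore is an assumption (it turns "Computer Programmer - Applications" into Computer_Programmer_Applications rather than keeping the hyphen). Since two records can share a title, the second helper appends a counter until the key is unused.

```php
<?php
// Hypothetical helper: derive a URL-safe textual key from a record title.
function make_text_key(string $title): string
{
    // Replace each run of characters other than letters and digits with "_".
    $key = preg_replace('/[^A-Za-z0-9]+/', '_', $title);
    // Drop any underscore left at either end.
    return trim($key, '_');
}

// Hypothetical helper: make $base unique against keys already stored.
// $taken stands in for the keys already present in the db table.
function make_unique_key(string $base, array $taken): string
{
    $key = $base;
    $n = 2;
    while (in_array($key, $taken, true)) {
        $key = $base . '_' . $n;
        $n++;
    }
    return $key;
}

echo make_text_key('Computer Programmer - Applications'), "\n";
echo make_unique_key('Occidental_College', ['Occidental_College']), "\n";
```

In a real run the list of taken keys would come from the table being populated, and a unique index on the key column would guard against races.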

      bretticus wrote:

      I would recommend leaving your links as they are for now and redirecting to the new SEO-friendly pattern (say, over a few months). Only then would you change your links. Of course, both URLs should work to bring up the same page (important).

      So what you are saying is that if one of my pages (e.g., http://example.com/foo.html) links to another one of these query string pages (e.g., http://example.com?id=1234) then I should not adjust that internal link, but I should do the following:
      1) adjust my site to handle not only the old URL style (http://example.com?id=1234) but also handle http://example.com/New_Friendly_URL where 'New_Friendly_URL' is the new text-based, SEO-friendly key in my db.
      2) Set up a 301 redirect so that requests for http://example.com?id=1234 are redirected to http://example.com/New_Friendly_URL. This would be useful for a slow adjustment of search engines, and visitors would also see the new URL in their browser, is that correct?
      3) After a few months, go in and start editing my PHP pages that generate these internal links so they don't reference id=1234 but rather use New_Friendly_URL style.

      bretticus wrote:

      That URL is ugly and I'm not sure what they gained by it. Part of the point is to make a more intelligible url. However, I do think it's fine to put a numeric key in the url (just as I think it's fine to use a key value pair if you don't mind the whole intelligible issue. Again, other folks think the friendly urls help.)

      Agreed that the Amazon URL is pretty ugly. I think the idea is that the numeric stuff at the end is what is actually meaningful to the servers (it is what they use to actually fetch the page) and the user-friendly part immediately following 'amazon.com' is for the benefit of visitors and search engines. It would make sense that the URL figures in PageRank -- especially if you are searching for a website by entering its domain. I'm just wondering how much. I have it from a friend at Amazon.com that the user-friendly part is not even used by any internal machinery.

      I too wonder what they gained by it. I can see when I use the site that if I search for that book then the fancy SEO-friendly URL shows up on the search results and I can make a link to it from my blog or whatever. This means that they at least adjusted their search page.
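To illustrate the idea that only the numeric part drives the lookup, here is a small PHP sketch. It assumes a URL shape of /Friendly_Text/1234 (one descriptive segment followed by the id), which is an assumption for this site and not Amazon's actual layout; the function name is hypothetical.

```php
<?php
// Hypothetical helper: pull the numeric id off the end of a friendly path
// and ignore whatever descriptive text comes before it.
function id_from_friendly_path(string $path): ?int
{
    // One descriptive segment, then the numeric id, optional trailing slash.
    if (preg_match('#^/[^/]+/(\d+)/?$#', $path, $m)) {
        return (int) $m[1];
    }
    return null; // not a friendly-style path
}

var_dump(id_from_friendly_path('/Some_SEO_Friendly_Name/1234'));
var_dump(id_from_friendly_path('/anything-at-all/1234'));
var_dump(id_from_friendly_path('/no-id-here'));
```

Because the descriptive segment is ignored, any text in that position loads the same record, so a page would normally also redirect to one canonical spelling to avoid duplicate URLs for the same content.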

      It will be quite a task to update all these pages if we go through with it. Does this sound like a decent roadmap to completion?
      1) Generate new textual keys in the various databases of interest
      2) Set up some mod_rewrite rules to internally map the textual keys from a path-style reference (e.g., http://example.com/textual_key) to query strings I can use (e.g., http://example.com/page.php?text_id=textual_key). MAKE SURE that the new key variable (e.g., 'text_id') is different from the old key variable (e.g., 'id') so we can tell which database column to search on.
      3) Go and alter all the relevant/affected pages so that they properly fetch data requested with the new textual key. Test them thoroughly.
      4) Go back and alter relevant/affected pages so that the old ugly keys result in a 301 redirect to their new textual key equivalent.
      5) Wait a few months
      6) Go and alter all pages with internal links using old-style key to use the new textual keys.

        Wow, sounds like you have this thought out well. I would definitely give my blessing. One thought, though: you can leave the redirect system in place indefinitely, because in a "few months" you'll change all your links to use the SEO-friendly URLs and the vast majority of your web server hits will be direct SEO-friendly URLs anyway. Google, in theory, is supposed to update its links, so the old-style links should become increasingly hard to find.

        The Amazon URLs make more sense with your explanation. It's certainly much easier to key off an auto-incremented primary key than a text key.

        Good luck!

          One thing occurred to me regarding the Amazon-style URLs. If you are changing your pages to use 301 redirects for OLD STYLE references, but you are still using the old numeric keys, then you will need to add a flag variable in your mod_rewrite rules to indicate that the reference is a new-style one. E.g., you would internally rewrite http://example.com/NEW_FRIENDLY_URL/1234 to http://example.com/page.php?id=1234&friendly=1.

          If $_GET['friendly'] is not 1, then you would need to redirect to the new-fangled style URL.
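The check described above could be sketched in PHP roughly as follows. The function name is hypothetical, $get stands in for $_GET, and $friendlyName is assumed to have been looked up in the database for the given id.

```php
<?php
// Hypothetical helper: decide whether a request must be redirected to the
// new-style URL. Returns the redirect target, or null to serve the page.
function redirect_for_old_style(array $get, string $friendlyName): ?string
{
    // The rewrite rule for friendly URLs adds friendly=1, so its presence
    // means the request already arrived in the new style.
    if (($get['friendly'] ?? '') === '1') {
        return null;
    }

    $id = $get['id'] ?? null;
    if (!is_string($id) || !preg_match('/^\d+$/', $id)) {
        return null; // nothing sensible to redirect to
    }

    // Assumed new-style shape: /NEW_FRIENDLY_URL/1234
    return '/' . rawurlencode($friendlyName) . '/' . $id;
}

// Sketch of use:
// $target = redirect_for_old_style($_GET, $nameFromDb);
// if ($target !== null) { header('Location: ' . $target, true, 301); exit; }
```

One caveat with a query-string flag: a client can add friendly=1 to an old-style URL by hand and skip the redirect. That is harmless for content but means the flag should not be trusted for anything else.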

            I'd just make a note about question 4)

            Will this really help that much?

            One of the reasons for preferring "SEO-friendly" URLs is that they go some way to hiding the internal machinery of the site (something that shouldn't be exposed) - otherwise, if you change the backend of the site and all the URLs change as a result, all the old ones get broken and (a) search engine indexes won't see you properly (b) people's bookmarks break, (c) URLs that ended up on some random page somewhere or scribbled on a bit of paper or in a magazine article or ad break. (This should be required reading for anyone planning their site's URL structure.)
            Another reason is that friendly URLs are much easier to write down and dictate to someone else.

            In short: "URLs: they're not just for search engines any more".

              12 days later

              OK gents (and any ladies lurking) I have some additional wonderings.

              Suppose I would like to try and implement a 302 redirect at the mod_rewrite phase? Is it possible to have a mod_rewrite that can access the database to determine what rewrite should happen?

              Consider this example. A user using the 'internets' enters a url:

              http://mydomain.com/some_file.php?id=1234

              Assuming I have a database somewhere that can associate 1234 with Some_SEO_Friendly_Name, can I configure apache to rewrite this to the following URL using mod_rewrite?

              http://mydomain.com/Some_SEO_Friendly_Name/1234

              Then a separate rewrite rule maps that 'friendly' url onto this:

              http://mydomain.com/app/some_file.php?id=1234

              The mod_rewrite setup would know not to do any rewriting on that last url.

              From the mod rewrite examples I've seen, I have never noticed any tricks that will query a database and return a rewrite rule from a database. Is this possible?

              EDIT: needed clarification.

                Well it does look like it's possible with the RewriteMap directive. This tip suggests writing a daemon process in PHP to parse requests ...something like this:

                #apache directive:
                RewriteEngine On
                RewriteMap tryme prg:/home/trainee/website/andy
                RewriteRule (.*\.htm) ${tryme:$1}
                

                and the script, a daemon? sounds dodgy if you ask me -- especially if all my web traffic is going through it.

                #!/usr/local/bin/php
                <?php
                
                /* This example shows how you can use a rewrite map
                written in PHP to provide a programmed response for
                mod_rewrite. If you want to rewrite URLs that are
                stored in a MySQL database, simply add in a
                mysql_connect before you enter the loop, and then
                do the appropriate query within the  loop! */
                
                /* Extra lines to add to httpd.conf:
                RewriteEngine On
                RewriteMap tryme prg:/home/trainee/website/andy
                RewriteRule (.*\.htm) ${tryme:$1}
                */
                
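                set_time_limit(0); # forever program!
                $keyboard = fopen("php://stdin","r");
                # read one lookup key per line; stop when stdin is closed
                # instead of looping forever on a failed read
                while (($line = fgets($keyboard)) !== false) {
                        $line = trim($line);
                        if (preg_match('/^(.*)\.htm$/',$line,$igot)) {
                                print "$igot[1].html\n";
                        } else {
                                print "$line\n";
                        }
                }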
                ?>
                
                  7 days later

                  Is it possible to send a 403 or 404 from a RewriteMap script? Or should I just send the URL of an error file?

                    OK, I hope somebody might still be listening here. I've cooked up a pretty elaborate Apache conf file that I call seo.conf that deals with only a portion of my site. It works with a PHP map script to turn boring ids into lovely SEO-friendly text phrases.

                    It's doing a fine job rewriting this:

                    http://192.168.1.2:8888/education/db/ug/ug_3.php?id=166027

                    to this:

                    http://192.168.1.2:8888/schools/undergraduate_colleges/Harvard-University/academics/166027/0

                    However, I'm feeling quite shaky about this whole thing. Some questions:
                    1) My rewrite directives result in EVERY SINGLE REQUEST ON MY SITE going through NUMEROUS rewrite directives. E.g., a single request for /index.php results in 9 RewriteCond directives getting processed. If I continue in this vein, I can reasonably expect this number to rise to 50 or more. Surely this will hurt the performance of my site dramatically? What is a reasonable number of RewriteCond directives to be processed for a single page request?

                    2) The last directive in my seo.conf file (see below) is there to remove a bit of query string that relates to a session id. Basically my site (some legacy code) will append a session id to nearly every url in an effort to propagate the sid should cookies be turned off. My redirect map program, in trying to honor and preserve that functionality, will append the sid after the last slash in the SEO-friendly url. If cookies are on, this results in sid=0 which could wipe a user's session so I must strip that off. If anyone sees a better way to handle this, I would appreciate knowing about it.

                    3) Can one send a 404 from the map program?

                    4) What's the story with RewriteLock? The documentation is rather tightlipped about the need for it. Am I to understand that if I use a PHP script (or any other program) as a rewrite map that I need a RewriteLock file? What are the required permissions on this file? Can I assume that apache needs to read/write it and will maintain anything it might contain?

                    5) Is PHP a poor choice for this application? If my map daemon crashes, won't part of my site go dark? Would I be better off writing this in C or something?
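On question 2, one possible alternative to stripping sid=0 with a rewrite rule is to discard it on the PHP side before the session code looks at it. This is only a sketch: the function name is hypothetical, and it assumes the legacy code reads the session id from $_GET['sid'] and that 0 never denotes a real session.

```php
<?php
// Hypothetical helper: remove a placeholder sid of 0 from request parameters
// so that it cannot be mistaken for a real session id.
function drop_empty_sid(array $params): array
{
    if (isset($params['sid']) && ($params['sid'] === '0' || $params['sid'] === 0)) {
        unset($params['sid']);
    }
    return $params;
}

// Sketch of use, before any session handling runs:
// $_GET = drop_empty_sid($_GET);

print_r(drop_empty_sid(['id' => '166027', 'sid' => '0']));
```

Handling it in PHP would remove one RewriteCond/RewriteRule pair from every request, at the cost of having to call the helper early in each entry script (or from a shared include).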

                    Any help would be MUCH appreciated.

                    Behold seo.conf:

                    RewriteEngine on
                    RewriteOptions MaxRedirects=5
                    RewriteLog /Applications/MAMP/htdocs/rewrite.log
                    RewriteLogLevel 9
                    
                    RewriteMap seo prg:/Applications/MAMP/htdocs/map.php
                    
                    #map requests for the original file to the new SEO friendly urls
                    RewriteCond %{REQUEST_FILENAME} ^/education/db/ug/ug
                    RewriteCond %{DOCUMENT_ROOT}%{REQUEST_FILENAME} -f
                    RewriteCond %{QUERY_STRING} ^(.+)$
                    RewriteRule ^/education/db/ug/(ug.*)$ ${seo:$1?%1}? [L,R=301]
                    
                    
                    # map general-info back to ug_1.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/general-info/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_1.php?id=%1&sid=%2
                    
                    # map campus-and-students back to ug_2.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/campus-and-students/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_2.php?id=%1&sid=%2
                    
                    # map academics back to ug_3.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/academics/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_3.php?id=%1&sid=%2
                    
                    # map cost-and-aid back to ug_4.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/cost-and-aid/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_4.php?id=%1&sid=%2
                    
                    # map admissions back to ug_5.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/admissions/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_5.php?id=%1&sid=%2
                    
                    # map articles back to ug_6.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/articles/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_6.php?id=%1&sid=%2
                    
                    # map community back to ug_7.php
                    RewriteCond %{REQUEST_FILENAME} ^/schools/undergraduate_colleges/[^/]+/community/([^/]+)/(.+)$
                    RewriteRule .+ /education/db/ug/ug_7.php?id=%1&sid=%2
                    
                    
                    
                    # remove empty SID from the query string
                    RewriteCond %{QUERY_STRING} ^(.+)&sid=0$
                    RewriteRule ^(.+)$ $1?%1
                    

                    You will note that it makes reference to a RewriteMap script called map.php. Here's that script:

                    #!/Applications/MAMP/bin/php5/bin/php
                    <?php
                    
                    error_reporting(E_ALL);
                    
                    set_time_limit(0); # forever program!
                    
                    define('THIS_DIR', dirname(realpath(__FILE__)));
                    define('LOG_FILE', THIS_DIR . DIRECTORY_SEPARATOR . 'rewrite_log.txt');
                    
                    file_put_contents(LOG_FILE, 'rewrite script starting ' . date('Y-m-d H:i:s') . "\n");
                    
                    
                    // database connect
                    #define('MYSQL_HOST', 'localhost');
                    define('MYSQL_HOST', ':/Applications/MAMP/tmp/mysql/mysql.sock');
                    define('MYSQL_DB', 'test');
                    define('MYSQL_USER', 'root');
                    define('MYSQL_PASSWORD', 'root');
                    
                    define('SCHOOL_TABLE', 'live_schools');
                    
                    // connect once; log the failure before exiting so it shows up in the log file
                    $db = mysql_connect(MYSQL_HOST, MYSQL_USER, MYSQL_PASSWORD);
                    if ($db === FALSE || !is_resource($db)) {
                    	write_log("Could not connect to the MySQL database");
                    	die('could not connect');
                    } else {
                    	write_log("Connection to mysqldb successful");
                    }
                    
                    if (mysql_select_db(MYSQL_DB, $db)) {
                    	write_log("Selection of database successful");
                    } else {
                    	write_log("Could not select db for college rewrites");
                    	die();
                    }
                    
                    // capture the sid value up to the next '&' (a greedy '.*' would also swallow later parameters)
                    $sid_pattern = '#(?:^|[?&])sid=([^&]*)#i';
                    $pattern = '#(.+)\.php\?id=(\d+)($|&)#';
                    
                    $keyboard = fopen("php://stdin","r");
                    while (1) {
                    	$line = fgets($keyboard);
                    	if ($line === FALSE) {
                    		// stdin was closed: exit instead of spinning forever
                    		write_log('stdin closed, rewrite script exiting');
                    		break;
                    	}
                    	$line = trim($line);
                    	write_log('line:' . $line);
                    
                    	// extract $sid if any
                    	$matches = NULL;
                    	if (preg_match($sid_pattern, $line, $matches) && $matches[1] !== '') {
                    		$sid = $matches[1];
                    	} else {
                    		$sid = 0;
                    	}
                    
                    	$found = FALSE;
                    
                    	// try to grab the id from it
                    	$matches = NULL;
                    	if (preg_match($pattern, $line, $matches)) {
                    		write_log(print_r($matches, true));
                    		$file = get_seo_friendly_file($matches[1]);
                    		write_log('returned ' . $file);
                    		$id = $matches[2];
                    		write_log('id=' . $id);
                    		if (($file != '') && ($id != '')) {
                    			$sql = "SELECT name FROM " . SCHOOL_TABLE . " WHERE unitid=" . strval($id);
                    			$res = @mysql_query($sql, $db);
                    			if ($res) {
                    				if (mysql_num_rows($res) >= 1) {
                    					if ($row = mysql_fetch_assoc($res)) {
                    						# output the new url	
                    						$url = '/schools/undergraduate_colleges/' . preg_replace('#[^a-z]+#i', '-', $row['name']) . '/' . $file . '/' . $id . '/' . $sid;
                    						$found = TRUE;
                    						print $url . "\n";
                    					} else {
                    						write_log('college rewrite query failed to fetch a row:' . print_r($row, true));
                    					}
                    				} else {
                    					write_log('college rewrite rows returned is less than 1');
                    				}
                    				mysql_free_result($res);
                    			} else {
                    				write_log('college rewrite query returned no resource:' . mysql_error(). "\n" . $sql);
                    			}
                    		} else {
                    			write_log('file or college id blank');
                    		}
                    	} else {
                    		write_log('no match for id pattern in college rewrite');
                    	}
                    	if (!$found) {
                    		return_404();
                    	}
                    }
                    
                    function write_log($msg) {
                    	file_put_contents(LOG_FILE, $msg . "\n", FILE_APPEND); 
                    }
                    
                    function return_404() {
                    	print "/404.php\n";
                    }
                    
                    function get_seo_friendly_file($file) {
                    	write_log('running seo friendly on ' . $file);
                    
                    	// legacy script name => SEO-friendly path segment
                    	$map = array(
                    		'ug_1' => 'general-info',
                    		'ug_2' => 'campus-and-students',
                    		'ug_3' => 'academics',
                    		'ug_4' => 'cost-and-aid',
                    		'ug_5' => 'admissions',
                    		'ug_6' => 'articles',
                    		'ug_7' => 'community',
                    	);
                    
                    	// unknown script names map to an empty string
                    	return isset($map[$file]) ? $map[$file] : '';
                    }
                    ?>
                    

                      Changing a large number of links in multiple files would be easy with the right IDE. I can replace code in multiple files at once.

                      replace
                      index.php?action=1&action=2

                      with
                      /action1/action2/

                      What's so hard about that?

                        That might work fine for hard-wired links but not so great when every single link is the result of a different query. For instance, one file looks like this:

                        <a href="ug_1.php?id=<?=append_sid($related_record['school_id']) ?>">

                        another may look like this:

                        <a href="ug_1.php?id=<?=append_sid($row['unit_id']) ?>">

                        and in other cases:

                        <a href="ug_1.php?id=<?=append_sid($_GET['id']) ?>">

                          Dreamweaver (and I'm sure others, but it's the IDE I'm referring to) allows you to do regular expression find/replace, so that still wouldn't be an issue.

                            Would you be so kind as to let me know what that regex might be? Keep in mind that append_sid will sometimes add a session id to the URL and other times will not.

                              6 days later

                              This has turned out to be a disaster. I went through all the trouble of writing an initial rewrite to route PHP requests through a map to their SEO-friendly equivalent, I changed my apache conf files so that the SEO-friendly URLs are routed to the appropriate handlers. I went through a great deal of trouble to make my php RewriteMap robust and stable and notify me when things go wrong.

                              Only to find out that all of the links, images, and css references are broken.

                              I've managed to concoct rewrite rules to preserve images, css, and javascript, but I'm at a total loss when it comes to links to other php pages. The original links in these pages have relative links to a variety of locations (in some cases links like ../../../index.php, in other cases file.php).

                              I believe I would now have to write a series of rewrite rules which may or may not be able to make all these work, and am wondering if I'm just wasting my time or if there is a good procedure or rule of thumb for fixing problems like this. This is kind of what I was worried about when I originally asked if SEO-friendly URLs are worth it.
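The breakage follows from how browsers resolve relative links: they resolve them against the URL they were given (the long friendly path), not against the directory of the PHP file that actually ran. One common way around this, sketched here under assumptions, is to emit root-relative links from the pages themselves so that the visible URL no longer matters. The function name is hypothetical, and $scriptDir is assumed to be the web path of the directory holding the script (for example /education/db/ug).

```php
<?php
// Hypothetical helper: turn a link written relative to the script's real
// directory into a root-relative link that works from any visible URL.
function root_relative(string $scriptDir, string $href): string
{
    // Leave absolute URLs, root-relative paths, and bare fragments untouched.
    if ($href === '' || $href[0] === '/' || $href[0] === '#'
        || preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    // Split off any query string or fragment so it is not treated as a path.
    $cut  = strcspn($href, '?#');
    $path = substr($href, 0, $cut);
    $tail = substr($href, $cut);

    // Walk the relative path on top of the script directory's segments.
    $segments = array_values(array_filter(explode('/', $scriptDir), 'strlen'));
    foreach (explode('/', $path) as $part) {
        if ($part === '..') {
            array_pop($segments);       // go up one directory
        } elseif ($part !== '.' && $part !== '') {
            $segments[] = $part;        // descend into a directory or file
        }
    }

    return '/' . implode('/', $segments) . $tail;
}

echo root_relative('/education/db/ug', '../../../index.php'), "\n";
echo root_relative('/education/db/ug', 'ug_1.php?id=5'), "\n";
```

This sketch drops a trailing slash on directory-style links and does not handle protocol-relative (//host/...) URLs specially, so it would need adjusting before use on real pages. It still means editing the link-generating code, but as one mechanical wrapper per link rather than a rewrite rule per directory depth.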
