We were talking about this at work today, and I believe it is a combination of domain, title, URL, text content, alt tags, etc. I just checked one of the sites: for a single-word search it beats Wikipedia and ranks number one because the word is in both the domain and the title.
Whether or not the URL carries as much weight as it once did, it still gets a score, and quite possibly a higher weighting than the same term appearing once in the body. Google tries to serve the best results, so it presumably weights the URL more heavily because it is a constrained space (you can't write a whole page into a URL or a domain name, etc.).
Even if that is not completely proven, the one definite benefit is that URLs are kept canonical, so multiple variations of the same page don't get indexed due to parameter misordering or useless variation.
e.g.
/index.php?mode=search&search_txt=mooo
/index.php?search_txt=mooo&mode=search&page=1
whereas with a rewrite rule there is only ever one form:
/search/mooo/1/
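For example, with Apache's mod_rewrite a single rule in .htaccess can map the clean form back onto the real script. This is just a sketch; the parameter names are taken from the example above:

RewriteEngine On
RewriteRule ^search/([^/]+)/([0-9]+)/(index\.htm)?$ /index.php?mode=search&search_txt=$1&page=$2 [L]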
Writing a class to handle all the URL generation also ends up reducing effort, or at worst has no impact on the effort of the process.
e.g.
<?php
/**
 * Mock of a data-access class that maps category ids to titles.
 * Results are cached in a static array so repeat lookups are free.
 */
class CategoryServer{
    function getTitle( $category_id ){
        static $titles = array();
        // Serve from the in-memory cache when we have seen this id before
        if( array_key_exists( $category_id, $titles ) ){
            $category_title = $titles[ $category_id ];
            echo 'Got from cache: ' . $category_title . ' - id=' . $category_id . "\n";
            return $category_title;
        }
        // Stand-in for a real database lookup
        $category_title = '';
        switch( $category_id ){
            case 1:
                $category_title = 'bananas';
                break;
            case 2:
                $category_title = 'apples';
                break;
        }
        $titles[ $category_id ] = $category_title;
        return $category_title;
    }
}

/**
 * The single place where every URL on the site gets built,
 * so each page only ever has one canonical form.
 */
final class SeoUrl{
    public function __construct(){
    }

    public function createSearch( $search_text, $page = 1 ){
        return '/search/' . $search_text . '/' . $page . '/index.htm';
    }

    public function createHome(){
        return '/';
    }

    public function createCategory( $category_id ){
        if( !is_numeric( $category_id ) ){
            throw new Exception( 'Invalid category id ' . $category_id );
        }
        $CategoryServer = new CategoryServer();
        $category_title = $CategoryServer->getTitle( $category_id );
        return '/category/' . $category_id . '/' . $category_title . '/index.htm';
    }
}

$SeoUrl = new SeoUrl();
echo "<pre>\n";
echo $SeoUrl->createHome() . "\n";
echo $SeoUrl->createSearch( 'muffins' ) . "\n";
echo $SeoUrl->createCategory( 1 ) . "\n";
echo $SeoUrl->createCategory( 1 ) . "\n";
echo $SeoUrl->createCategory( 2 ) . "\n";
echo "</pre>";
?>
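Run as-is, the second createCategory(1) call is the only one that prints the 'Got from cache' line, which shows the static cache doing its job: the title lookup only happens once per id per request.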
Every time a method is created to serve a URL, it makes life easier from that point on.
The benefit of building something that returns a title for a section (forced by this process, and with memory caching as a bonus) is that it becomes a doddle to use elsewhere (page title, top of section, keywords, alt tags), wherever it is deemed strategically beneficial, as in the sketch below. Either way, not abstracting URL generation always seems easy at first, but manually building each URL every time (remembering which params go in which order to keep things canonical) soon becomes tiresome and a waste of energy to me.
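A rough sketch of that reuse, assuming the CategoryServer mock from the example above is already loaded (the image path is made up):

<?php
// Reuse the same cached title source for the page title and an alt tag.
// Assumes CategoryServer from the example above is already defined.
$CategoryServer = new CategoryServer();
$category_id = 2;
$title = $CategoryServer->getTitle( $category_id );

echo '<title>' . htmlspecialchars( ucfirst( $title ) ) . "</title>\n";
echo '<img src="/img/category/' . $category_id . '.jpg" alt="'
    . htmlspecialchars( $title ) . '" />' . "\n";
?>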
I wonder if there's some kind of mathematical analysis of Google results we could whip up to statistically correlate a page's ranking with the appearance of the search term in its URL. Surely this is doable? It seems pretty easy to use cURL in PHP to fetch the first 100 results for a given search term and do some analysis.
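Something along these lines, as a rough first pass. The results URL and the href regex are assumptions (result-page markup changes, and scraping may breach the engine's terms of service), so treat it as a sketch rather than a working scraper:

<?php
// Fetch a results page and count how many result URLs contain the term.
$term = 'muffins';

$ch = curl_init( 'http://www.google.com/search?q=' . urlencode( $term ) . '&num=100' );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $ch, CURLOPT_USERAGENT, 'Mozilla/5.0' ); // some engines block blank agents
$html = curl_exec( $ch );
curl_close( $ch );

// Crude extraction of outbound links; the markup assumption will need tuning.
preg_match_all( '/href="(https?:\/\/[^"]+)"/i', $html, $matches );

$hits = 0;
foreach( $matches[1] as $url ){
    if( stripos( $url, $term ) !== false ){
        $hits++;
    }
}
echo $hits . ' of ' . count( $matches[1] ) . " result URLs contain the term\n";
?>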
You could do that and run it as a cron job to keep track of changes over time. Also keep an eye on who moves up and down.
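e.g. a crontab entry like this (script and log paths are made up) would rerun the check nightly and keep a history you can diff:

0 3 * * * /usr/bin/php /path/to/serp_check.php >> /var/log/serp_check.log 2>&1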