About spiders/web crawlers:
As far as I know, a spider doesn't grab dynamic pages; it only grabs static pages.
Static page:
www.xyz.com/what.html
Dynamic page:
www.xyz.com/what.php?what=value
(parameters are passed with the GET method)
or
www.xyz.com/what.php
(parameters are passed with the POST method: what=value)
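A spider only sees the URL, so a rough heuristic for the distinction above might look like the sketch below. It rests on assumptions: the list of script extensions is my own guess, and a POST-driven page can only be guessed from its extension, because POST parameters never appear in the URL.

```python
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as server-side scripts (an assumption).
SCRIPT_EXTENSIONS = (".php", ".cgi", ".asp", ".aspx", ".jsp", ".pl")

def looks_dynamic(url: str) -> bool:
    """Guess whether a URL points at a dynamic page, judging only by the URL."""
    # urlsplit needs a scheme to separate host from path; add one if missing.
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    if parts.query:  # GET parameters, e.g. ?what=value
        return True
    # POST parameters are invisible in the URL, so fall back to the extension.
    return parts.path.lower().endswith(SCRIPT_EXTENSIONS)

for u in ["www.xyz.com/what.html",
          "www.xyz.com/what.php?what=value",
          "www.xyz.com/what.php"]:
    print(u, "->", "dynamic" if looks_dynamic(u) else "static")
```

Note that a rewritten URL such as http://www.somewhere.com/Q_11427939.HTM passes as "static" with this check, which is exactly the trick described next.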
Some developers make dynamic pages look static so that spiders will grab them. Look at the URL of this topic; it shows:
http://www.somewhere.com/Q_11427939.HTM
Actually, that is a dynamic page, but Apache makes it look static.
In the Apache server configuration it is probably something like this:
RewriteEngine On
RewriteRule ^/Q_([0-9]+)\.HTM$ /question.cgi?q_id=$1 [L]
Here is what happens:
if a user requests
http://www.somewhere.com/Q_11427939.HTM
Apache internally rewrites it to
http://www.somewhere.com/question.cgi?q_id=11427939
So the spider thinks Q_11427939.HTM is a static page, but in fact 11427939 is a parameter.
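To make the mapping concrete, here is a small Python imitation of what the rule above does, restricted to numeric IDs. It is only a sketch: it mimics the regex substitution with Python's re module and is not how Apache's mod_rewrite works internally; the digits-only ID format is an assumption.

```python
import re

# Imitates the Q_... rewrite rule, assuming question IDs are purely numeric
# and matching the "." before HTM literally.
RULE = re.compile(r"^/Q_([0-9]+)\.HTM$")

def rewrite(path: str) -> str:
    """Return the internal URL the server would actually run, or the path unchanged."""
    match = RULE.match(path)
    if match is None:
        return path  # no rule applies; serve the path as-is
    return "/question.cgi?q_id=" + match.group(1)

print(rewrite("/Q_11427939.HTM"))  # /question.cgi?q_id=11427939
print(rewrite("/about.html"))      # /about.html
```

The spider only ever sees the left-hand side; the query string on the right stays on the server.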
Another example:
ARTICLE_MAR2000.HTM
is rewritten to:
article.php?season=MAR2000
so article.php looks for a directory named /ARTICLES/MAR2000, retrieves an HTML page from it, and sends it back to the user.
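Below is a hypothetical sketch of both halves of the ARTICLE_ example: the rewrite, and how article.php might turn the season value into a file path. The /ARTICLES root comes from the text; the page name index.html and the MAR2000-style format check are assumptions, since the post doesn't say which HTML page is read. Validating season matters because it comes straight from the URL.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

SEASON = r"[A-Z]{3}[0-9]{4}"  # e.g. MAR2000 (assumed format)
ARTICLE_RULE = re.compile(rf"^ARTICLE_({SEASON})\.HTM$")
ARTICLES_ROOT = PurePosixPath("/ARTICLES")
PAGE_NAME = "index.html"  # hypothetical: the post doesn't name the page

def rewrite_article(name: str) -> Optional[str]:
    """Map the static-looking name to the real script URL, or None if no match."""
    match = ARTICLE_RULE.match(name)
    return f"article.php?season={match.group(1)}" if match else None

def article_page(season: str) -> PurePosixPath:
    """Path of the HTML page that article.php would read for this season."""
    # Reject anything not shaped like MAR2000, so a crafted value such as
    # "../../etc" can't climb out of /ARTICLES.
    if not re.fullmatch(SEASON, season):
        raise ValueError(f"bad season: {season!r}")
    return ARTICLES_ROOT / season / PAGE_NAME

print(rewrite_article("ARTICLE_MAR2000.HTM"))  # article.php?season=MAR2000
print(article_page("MAR2000"))                 # /ARTICLES/MAR2000/index.html
```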
Usually, a spider looks at the META tags to determine what a site is about. This is what I got from www.experts-exchange.com:
<meta name="description" content=" Free answers, help, and advice from our experts. Ask your questions here ... ">
<meta name="keywords" content=" Experts Exchange, expert, technical, support, assistance, inquiry, experts-exchange, expertsexchange">
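Reading those tags is simple. Here is a minimal sketch using Python's standard html.parser, roughly what a spider might do with the two tags above; real search engines do far more, so treat it only as an illustration.

```python
from html.parser import HTMLParser

class MetaTagReader(HTMLParser):
    """Collects <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # attrs is a list of (name, value) pairs
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content.strip()

page = """<html><head>
<meta name="description" content=" Free answers, help, and advice from our experts. Ask your questions here ... ">
<meta name="keywords" content=" Experts Exchange, expert, technical, support, assistance, inquiry, experts-exchange, expertsexchange">
</head><body></body></html>"""

reader = MetaTagReader()
reader.feed(page)
print(reader.meta["description"])
keywords = [k.strip() for k in reader.meta["keywords"].split(",")]
print(keywords)
```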
So if you want your page to be picked up by spiders, make sure it isn't dynamic (or at least that its URL doesn't look dynamic), and put META tags in the code.
Some web crawlers can crawl dynamic pages. But most web crawlers are built in-house rather than bought from a third party, because writing one yourself isn't that hard, so there's no standard for how web crawlers behave. Crawling dynamic pages is also risky and wastes resources.
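One way crawling dynamic pages wastes resources: a single script can produce an endless number of URLs just by changing its parameters. A crawler that does visit dynamic pages might guard against that with a per-script cap like the hypothetical policy below; the class name, the limit, and the normalization are my own assumptions, not a standard.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qsl, urlencode

class QueryVariantLimiter:
    """Hypothetical crawl policy: fetch at most `limit` query variants per path."""

    def __init__(self, limit: int = 3):
        self.limit = limit
        self.seen = defaultdict(set)  # (host, path) -> normalized query strings

    def should_fetch(self, url: str) -> bool:
        parts = urlsplit(url)
        key = (parts.netloc.lower(), parts.path)
        # Sort parameters so ?a=1&b=2 and ?b=2&a=1 count as the same page.
        query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
        variants = self.seen[key]
        if query in variants:
            return False  # this exact page was already fetched
        if len(variants) >= self.limit:
            return False  # too many variants of one script; stop here
        variants.add(query)
        return True

limiter = QueryVariantLimiter(limit=2)
for n in range(1, 4):
    url = f"http://www.xyz.com/what.php?what={n}"
    print(url, limiter.should_fetch(url))
```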
Things to consider:
1. Dynamic pages may rely on cookies/sessions, which an ordinary web crawler doesn't support unless its developer adds that.
2. Dynamic pages may use SSL.
3. Dynamic pages may require mandatory param=value pairs (sent with GET or POST).
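If a crawler's developer does want to handle these points, Python's standard library covers the basics. The sketch below is an illustration, not a complete crawler: the URL and parameters are the placeholder ones from the examples above, and the function names are hypothetical. A real crawler would still have to know which parameters a page requires, which is the hard part.

```python
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode

def make_session_opener() -> urllib.request.OpenerDirector:
    """An opener that keeps cookies between requests, so a session-based
    page can recognize the crawler on later requests (point 1)."""
    jar = CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

def make_post_request(url: str, params: dict) -> urllib.request.Request:
    """Build a POST request carrying mandatory param=value pairs (point 3)."""
    body = urlencode(params).encode("ascii")
    return urllib.request.Request(url, data=body, method="POST")

opener = make_session_opener()
request = make_post_request("https://www.xyz.com/what.php", {"what": "value"})
# opener.open(request) would send it; for an https:// URL (point 2),
# urllib performs the SSL/TLS handshake through Python's ssl module.
```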
OK?
LexZEUS,
http://lexzeus.tripod.com