How's it going?
I'm trying to index (robot/spider) a site in PHP for our search engine.
However, when I extract the links, some of them aren't complete and therefore don't work.
e.g. at Yahoo, if I extract all the links, some are complete (and thus work immediately) but others are of the form:
<a href=r/gr>Greetings</a>
...which results in a dead link unless http://google.yahoo.com/ is added to the front.
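For reference, here's a simplified version of how I'm cutting the links out at the moment (the URL is just my example from above, and I realise a bare regex is cruder than a real HTML parser):

<?php
// Simplified version of my extraction step: fetch a page and pull out
// every href attribute, quoted or not (so unquoted ones like href=r/gr
// are caught too).
$html = file_get_contents('http://google.yahoo.com/');
preg_match_all('#<a\s[^>]*href\s*=\s*["\']?([^"\'\s>]+)#i', $html, $matches);
$links = $matches[1];  // mixture of "http://...", "/foo", "r/gr", ...
?>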
Is there a way around this, please, that ensures I capture the complete, working link every time?
I'm looking for a foolproof, generic way, so there are no problems following any link on any site dynamically (i.e. without having to configure the script for each site individually).
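To make that concrete, this hypothetical helper is the kind of resolution I imagine is needed (the function name is mine, and it only covers the common cases; "../" segments, query strings, <base> tags, etc. are ignored):

<?php
// Hypothetical sketch: turn an extracted href back into an absolute URL
// using the address of the page it was found on.
function resolve_link($href, $page_url)
{
    // Already absolute? Leave it alone.
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }

    $parts = parse_url($page_url);
    $base  = $parts['scheme'] . '://' . $parts['host'];

    // Root-relative link, e.g. href="/r/gr"
    if ($href[0] === '/') {
        return $base . $href;
    }

    // Relative to the directory of the current page, e.g. href="r/gr"
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $base . $dir . $href;
}

// resolve_link('r/gr', 'http://google.yahoo.com/')
//   => "http://google.yahoo.com/r/gr"
?>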
Maybe PHP isn't the language to use for a robot?
Thank you very much if you can assist...
Regards,
Jason