PHP SE bot...

Bannana97

Hi.
I was looking around for PHP Spider/web crawler tutorials and couldn't find anything that worked well for me.
I am creating my own search engine (the SE part is done) and I need a bot that will go to random websites and let me run MySQL code for each one.
Any clues?
Thanks.

dagon

Get the page, use preg_match to get the URLs, loop, etc. It shouldn't be hard.
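
A rough sketch of that fetch / extract / loop idea, not a finished crawler. The function names (extract_links, crawl), the $fetch callback and the $maxPages limit are made up for illustration. It finds new pages and domains only by following links from pages it has already fetched, and it follows only absolute http(s) links.

```php
<?php
// Hypothetical helper: pull href values out of raw HTML with a regex.
function extract_links(string $html): array
{
    preg_match_all('/<a\s[^>]*href\s*=\s*["\']([^"\']+)["\']/i', $html, $matches);
    return array_values(array_unique($matches[1]));
}

// Breadth-first crawl starting from $seeds.
// $fetch is any callable that takes a URL and returns the page body
// as a string, or false on failure.
function crawl(array $seeds, callable $fetch, int $maxPages = 50): array
{
    $queue   = $seeds;                       // URLs waiting to be fetched
    $seen    = array_fill_keys($seeds, true); // URLs already queued once
    $visited = [];                            // URLs fetched, in order

    while ($queue && count($visited) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        $visited[] = $url;                    // this is where a DB insert could go
        if ($html === false) {
            continue;
        }
        foreach (extract_links($html) as $link) {
            // Skip relative links and anything already queued.
            if (!preg_match('#^https?://#i', $link) || isset($seen[$link])) {
                continue;
            }
            $seen[$link] = true;
            $queue[] = $link;
        }
    }
    return $visited;
}
```

For a real run, $fetch could be something like function ($u) { return @file_get_contents($u); }, assuming allow_url_fopen is enabled on your host.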

Bannana97

Actually, I have no clue how to loop domain names.
That's what I am trying to find out, actually.
I need code that will loop and find every existing domain name, or something like that.

Bannana97

But I need the code for the loop that gets the URLs. I need something to loop over every existing URL.

dagon

I would use DOMXPath myself; it can also be done with regular expressions.
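
A minimal sketch of the DOMXPath route, assuming the dom extension is enabled. The function name extract_links_dom is made up for illustration. libxml_use_internal_errors(true) keeps parser warnings from sloppy markup out of the output; in practice loadHTML is fairly tolerant of that kind of markup.

```php
<?php
// Hypothetical helper: return the href of every <a> element in $html.
function extract_links_dom(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML rejects an empty string
    }

    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    $links = [];
    // Select only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $node) {
        $links[] = $node->getAttribute('href');
    }
    return $links;
}
```

The hrefs come back exactly as written in the page, so relative ones still need to be resolved against the page URL before they can be fetched.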

Imperialoutpost

This is a useful tutorial:

http://www.merchantos.com/makebeta/php/scraping-links-with-php/

mrbaseball34

Be aware that there is some very awful, ill-formed HTML out there, and DOM functions can fail or throw a pile of warnings when loading ill-formed documents.

It is much better to use regex to do this.

Bannana97

Imperialoutpost wrote:
This is a useful tutorial:

http://www.merchantos.com/makebeta/php/scraping-links-with-php/

I had a look at that yesterday. It doesn't really fit what I need. I am looking for something in PHP that loops and finds as many existing domains as possible.

mrbaseball34

Also, remember that most URLs in HTML are relative, not absolute.
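
A rough sketch of turning a relative link into an absolute one before fetching it. The function name resolve_url is made up for illustration, and this is a simplified resolver, not a full RFC 3986 implementation: it covers absolute, protocol-relative, root-relative and path-relative links and ignores things like a <base> tag.

```php
<?php
// Hypothetical helper: resolve $link against the URL of the page it was found on.
function resolve_url(string $base, string $link): string
{
    // Already absolute (has a scheme such as http:, https:, mailto:).
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $link)) {
        return $link;
    }

    $b        = parse_url($base);
    $scheme   = $b['scheme'] ?? 'http';
    $root     = $scheme . '://' . ($b['host'] ?? '')
              . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    // Protocol-relative: //host/path
    if (strpos($link, '//') === 0) {
        return $scheme . ':' . $link;
    }
    // Empty or fragment-only link: same page.
    if ($link === '' || $link[0] === '#') {
        return $root . $basePath . (isset($b['query']) ? '?' . $b['query'] : '') . $link;
    }
    // Query-only link: same path, new query string.
    if ($link[0] === '?') {
        return $root . $basePath . $link;
    }

    // Root-relative, or relative to the directory of the base path.
    $path = $link[0] === '/'
        ? $link
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $link;

    // Split off ?query / #fragment so only the path is normalized.
    $pos  = strcspn($path, '?#');
    $tail = (string) substr($path, $pos);
    $path = substr($path, 0, $pos);

    // Collapse "." and ".." segments.
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    return $root . '/' . ltrim(implode('/', $out), '/') . $tail;
}
```

For example, resolve_url('http://e.test/a/b.html', 'c.html') gives http://e.test/a/c.html, which can then go back into the crawl queue.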