I'm not really aware of any tutorials out there myself, but I know you will need to use regular expressions ([man]pcre[/man]) to extract URL information out of the pages. You will need some sort of array to store a stack of URLs still to spider, and another array of URLs already spidered so you can make sure you don't read the same page twice. [man]array_search[/man] will work for checking the "already spidered" list, but keep in mind it does a linear scan of the array, so on a big list you might want to keep the array sorted and implement a binary search yourself, which is much faster.
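Something like this minimal sketch, to give you the idea. The regex here is a deliberately simple pattern for double-quoted href attributes (a real spider would need something more robust), and the page fetch is stubbed out with a hard-coded string; in practice you would pull the HTML down with file_get_contents() or cURL.

```php
<?php
// Pull href="..." URLs out of a chunk of HTML with a PCRE.
function extract_urls(string $html): array {
    preg_match_all('/href\s*=\s*"([^"]+)"/i', $html, $matches);
    return $matches[1];
}

$to_spider = ['http://example.com/'];   // stack of URLs still to visit
$spidered  = [];                        // URLs already visited

while (!empty($to_spider)) {
    $url = array_pop($to_spider);

    // Skip pages we have already read (array_search is a linear scan).
    if (array_search($url, $spidered) !== false) {
        continue;
    }
    $spidered[] = $url;

    // Stubbed fetch -- in a real spider: $html = file_get_contents($url);
    $html = '<a href="http://example.com/a">A</a> <a href="http://example.com/">home</a>';

    // Push any newly discovered URLs onto the stack.
    foreach (extract_urls($html) as $found) {
        if (array_search($found, $spidered) === false) {
            $to_spider[] = $found;
        }
    }
}
```

With the stubbed HTML above, the loop visits the start page, discovers /a, visits it, and stops once everything found has been spidered.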
Good luck!