I'm not really aware of any tutorials out there myself, but I know you will need to use regular expressions ([man]pcre[/man]) to extract URL information out of the pages. You will need some sort of array to store a stack of URLs still to spider, and another array of URLs already spidered so you can make sure you don't read the same page twice. [man]array_search[/man] will work for checking the "already spidered" list, but keep in mind it does a linear scan of the array, so on a big list you might want to keep the array sorted and implement a binary search yourself, which is much faster.
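Something like this minimal sketch, to give you the idea. The regex here is a deliberately simple pattern for double-quoted href attributes (a real spider would need something more robust), and the page fetch is stubbed out with a hard-coded string; in practice you would pull the HTML down with file_get_contents() or cURL.

```php
<?php
// Pull href="..." URLs out of a chunk of HTML with a PCRE.
function extract_urls(string $html): array {
    preg_match_all('/href\s*=\s*"([^"]+)"/i', $html, $matches);
    return $matches[1];
}

$to_spider = ['http://example.com/'];   // stack of URLs still to visit
$spidered  = [];                        // URLs already visited

while (!empty($to_spider)) {
    $url = array_pop($to_spider);

    // Skip pages we have already read (array_search is a linear scan).
    if (array_search($url, $spidered) !== false) {
        continue;
    }
    $spidered[] = $url;

    // Stubbed fetch -- in a real spider: $html = file_get_contents($url);
    $html = '<a href="http://example.com/a">A</a> <a href="http://example.com/">home</a>';

    // Push any newly discovered URLs onto the stack.
    foreach (extract_urls($html) as $found) {
        if (array_search($found, $spidered) === false) {
            $to_spider[] = $found;
        }
    }
}
```

With the stubbed HTML above, the loop visits the start page, discovers /a, visits it, and stops once everything found has been spidered.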
Good luck!