There are no books that I'm aware of. I just sat down, worked out what I needed, and coded it a bit at a time. 99% of it is PHP and the rest is a little Perl script.
Here's how the spider works:
It gets the first unread URL from the database and requests the page source via cURL. The spider then searches the source for internal links only. Any it finds are checked against the database, and any that aren't already there are added as new URLs to retrieve. The page source is then cached locally, ready to be scraped, as long as it's a product page.
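To give a rough idea, the fetch-and-link step looks something like this. This is only a sketch: the table and column names are made up, and it assumes a PDO connection to a MySQL-style database.

    <?php
    // Sketch of the fetch step: grab the page source with cURL and queue up
    // any internal links we haven't seen before. Table/column names are
    // illustrative only.
    function fetchAndQueueLinks(PDO $db, string $url, string $siteHost): ?string
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        $html = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($html === false || $status >= 400) {
            return null; // caller marks the URL as broken
        }

        // Find every href and keep internal links only
        // (a real version would also resolve relative links against $url)
        preg_match_all('/href=["\']([^"\']+)["\']/i', $html, $matches);
        foreach ($matches[1] as $link) {
            $host = parse_url($link, PHP_URL_HOST);
            if ($host !== null && $host !== $siteHost) {
                continue; // external link, ignore
            }
            // Add the URL only if we haven't already got it
            $stmt = $db->prepare('INSERT IGNORE INTO urls (url, status) VALUES (?, "unread")');
            $stmt->execute([$link]);
        }

        return $html; // cached to disk by the caller if it's a product page
    }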
If the requested URL is not available, or displays some error text, e.g. 'Product not available', then the URL is marked as broken and removed from the database.
This repeats until all URLs have been read; once that happens, all URLs are reset and the process begins again.
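The loop that drives all of this is very simple. Again, this is just a sketch using the function above, with invented table names and a hypothetical isProductPage() helper standing in for the product-page check:

    <?php
    // Rough sketch of the crawl loop; assumes fetchAndQueueLinks() from above
    // and a PDO connection. Table/column names are invented for illustration.
    while (true) {
        $row = $db->query('SELECT id, url FROM urls WHERE status = "unread" LIMIT 1')->fetch();

        if ($row === false) {
            // Everything has been read: reset and start the next pass
            $db->exec('UPDATE urls SET status = "unread"');
            continue;
        }

        $html = fetchAndQueueLinks($db, $row['url'], 'www.example-merchant.com');

        if ($html === null || strpos($html, 'Product not available') !== false) {
            // Broken or withdrawn product: drop it from the queue
            $db->prepare('DELETE FROM urls WHERE id = ?')->execute([$row['id']]);
            continue;
        }

        if (isProductPage($row['url'])) { // hypothetical helper, e.g. a regex on the URL
            file_put_contents('cache/' . md5($row['url']) . '.html', $html);
        }
        $db->prepare('UPDATE urls SET status = "read" WHERE id = ?')->execute([$row['id']]);
    }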
Here's how the scraper works:
It gets the first unscraped page and loads it from the local cache, then pulls out the product title and price and checks them against the values stored in the database. If the price has changed, it updates the price in the database and marks the status as reduced or increased.
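In outline it's something like the following. The regex patterns and table names here are placeholders; the real patterns come from the per-merchant config described below.

    <?php
    // Sketch of the scrape step: pull title and price out of a cached page
    // and compare against the stored values. $cfg holds the merchant's
    // regex patterns; table names are invented.
    function scrapeProduct(PDO $db, int $productId, string $cacheFile, array $cfg): void
    {
        $html = file_get_contents($cacheFile);

        if (!preg_match($cfg['title_pattern'], $html, $t) ||
            !preg_match($cfg['price_pattern'], $html, $p)) {
            return; // layout changed or not a product page after all
        }

        $title = trim($t[1]);
        $price = (float) str_replace(',', '', $p[1]);

        $stmt = $db->prepare('SELECT price FROM products WHERE id = ?');
        $stmt->execute([$productId]);
        $oldPrice = (float) $stmt->fetchColumn();

        if ($price !== $oldPrice) {
            $status = $price < $oldPrice ? 'reduced' : 'increased';
            $db->prepare('UPDATE products SET title = ?, price = ?, status = ? WHERE id = ?')
               ->execute([$title, $price, $status, $productId]);
        }
    }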
Since each merchant is different, I create a merchant-specific config file which describes the product page name format, the product title format and the price format. This allows me to add a new merchant to the spidering process in less than 5 minutes!
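A merchant config doesn't need to be anything more than a few patterns. Something along these lines (the merchant, URL pattern and regexes are made up for illustration):

    <?php
    // merchants/example-merchant.php -- hypothetical config for one merchant.
    // The spider uses 'product_url' to decide which pages to cache; the
    // scraper uses the two regexes to pull out the title and price.
    return [
        'host'          => 'www.example-merchant.com',
        'product_url'   => '#/product/\d+\.html$#',                          // product page name format
        'title_pattern' => '#<h1 class="product-title">(.*?)</h1>#s',        // product title format
        'price_pattern' => '#<span class="price">&pound;([\d.,]+)</span>#',  // price format
    ];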
Every day I dump out all products that have increased, reduced or been removed and write these out as XML files. These are then uploaded via FTP to the main site, which reads them into its database.
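The export itself is straightforward. A rough sketch of the idea, with placeholder element names, query and FTP credentials:

    <?php
    // Sketch of the daily export: write changed products to an XML file and
    // push it to the main site over FTP. Names and credentials are placeholders.
    $rows = $db->query('SELECT id, title, price, status FROM products
                        WHERE status IN ("reduced", "increased", "removed")');

    $xml = new SimpleXMLElement('<products/>');
    foreach ($rows as $row) {
        $p = $xml->addChild('product');
        $p->addChild('id', $row['id']);
        $p->addChild('title', htmlspecialchars($row['title']));
        $p->addChild('price', $row['price']);
        $p->addChild('status', $row['status']);
    }
    $file = 'changes-' . date('Y-m-d') . '.xml';
    $xml->asXML($file);

    // Upload to the main site
    $ftp = ftp_connect('ftp.example.com');
    ftp_login($ftp, 'user', 'password');
    ftp_pasv($ftp, true);
    ftp_put($ftp, 'incoming/' . $file, $file, FTP_ASCII);
    ftp_close($ftp);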
It's all very modular and surprisingly easy to code. It's also proving very popular so far, with more than 1,350 unique visitors and some 3,000 searches a day, on very little promotion.