There are no books that I'm aware of. I just sat down, worked out what I needed, and coded it a bit at a time. 99% of it is PHP and the rest is a little Perl script.
Here's how the spider works:
It gets the first unread URL from the database and requests the page source via cURL. The spider then searches the source for internal links only. Any it finds are checked against the database, and any that aren't already there are added as new URLs to retrieve. The page source is then cached locally, ready to be scraped, as long as it's a product page.
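To give a rough idea, the fetch-and-link step looks something like this. This is only a sketch: the table and column names are made up, and it assumes a PDO connection to a MySQL-style database.

    <?php
    // Sketch of the fetch step: grab the page source with cURL and queue up
    // any internal links we haven't seen before. Table/column names are
    // illustrative only.
    function fetchAndQueueLinks(PDO $db, string $url, string $siteHost): ?string
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        $html = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);

        if ($html === false || $status >= 400) {
            return null; // caller marks the URL as broken
        }

        // Find every href and keep internal links only
        // (a real version would also resolve relative links against $url)
        preg_match_all('/href=["\']([^"\']+)["\']/i', $html, $matches);
        foreach ($matches[1] as $link) {
            $host = parse_url($link, PHP_URL_HOST);
            if ($host !== null && $host !== $siteHost) {
                continue; // external link, ignore
            }
            // Add the URL only if we haven't already got it
            $stmt = $db->prepare('INSERT IGNORE INTO urls (url, status) VALUES (?, "unread")');
            $stmt->execute([$link]);
        }

        return $html; // cached to disk by the caller if it's a product page
    }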
If the requested URL is not available, or displays some error text, e.g. 'Product not available', then the URL is marked as broken and removed from the database.
This repeats until all URLs have been read; once that happens, all URLs are reset and the process begins again.
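The loop that drives all of this is very simple. Again, this is just a sketch using the function above, with invented table names and a hypothetical isProductPage() helper standing in for the product-page check:

    <?php
    // Rough sketch of the crawl loop; assumes fetchAndQueueLinks() from above
    // and a PDO connection. Table/column names are invented for illustration.
    while (true) {
        $row = $db->query('SELECT id, url FROM urls WHERE status = "unread" LIMIT 1')->fetch();

        if ($row === false) {
            // Everything has been read: reset and start the next pass
            $db->exec('UPDATE urls SET status = "unread"');
            continue;
        }

        $html = fetchAndQueueLinks($db, $row['url'], 'www.example-merchant.com');

        if ($html === null || strpos($html, 'Product not available') !== false) {
            // Broken or withdrawn product: drop it from the queue
            $db->prepare('DELETE FROM urls WHERE id = ?')->execute([$row['id']]);
            continue;
        }

        if (isProductPage($row['url'])) { // hypothetical helper, e.g. a regex on the URL
            file_put_contents('cache/' . md5($row['url']) . '.html', $html);
        }
        $db->prepare('UPDATE urls SET status = "read" WHERE id = ?')->execute([$row['id']]);
    }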
Here's how the scraper works:
It gets the first unscraped page and loads it from the local cache, then pulls out the product title and price and checks them against the values stored in the database. If the price has changed, it updates the price in the database and marks the status as reduced or increased.
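In outline it's something like the following. The regex patterns and table names here are placeholders; the real patterns come from the per-merchant config described below.

    <?php
    // Sketch of the scrape step: pull title and price out of a cached page
    // and compare against the stored values. $cfg holds the merchant's
    // regex patterns; table names are invented.
    function scrapeProduct(PDO $db, int $productId, string $cacheFile, array $cfg): void
    {
        $html = file_get_contents($cacheFile);

        if (!preg_match($cfg['title_pattern'], $html, $t) ||
            !preg_match($cfg['price_pattern'], $html, $p)) {
            return; // layout changed or not a product page after all
        }

        $title = trim($t[1]);
        $price = (float) str_replace(',', '', $p[1]);

        $stmt = $db->prepare('SELECT price FROM products WHERE id = ?');
        $stmt->execute([$productId]);
        $oldPrice = (float) $stmt->fetchColumn();

        if ($price !== $oldPrice) {
            $status = $price < $oldPrice ? 'reduced' : 'increased';
            $db->prepare('UPDATE products SET title = ?, price = ?, status = ? WHERE id = ?')
               ->execute([$title, $price, $status, $productId]);
        }
    }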
Since each merchant is different, I create a merchant-specific config file which describes the product page name format, the product title format and the price format. This allows me to add a new merchant to the spidering process in less than 5 minutes!
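A merchant config doesn't need to be anything more than a few patterns. Something along these lines (the merchant, URL pattern and regexes are made up for illustration):

    <?php
    // merchants/example-merchant.php -- hypothetical config for one merchant.
    // The spider uses 'product_url' to decide which pages to cache; the
    // scraper uses the two regexes to pull out the title and price.
    return [
        'host'          => 'www.example-merchant.com',
        'product_url'   => '#/product/\d+\.html$#',                          // product page name format
        'title_pattern' => '#<h1 class="product-title">(.*?)</h1>#s',        // product title format
        'price_pattern' => '#<span class="price">&pound;([\d.,]+)</span>#',  // price format
    ];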
Every day I dump out all products that have increased, reduced or been removed and write these out as XML files. These are then uploaded via FTP to the main site, which reads them into its database.
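The export itself is straightforward. A rough sketch of the idea, with placeholder element names, query and FTP credentials:

    <?php
    // Sketch of the daily export: write changed products to an XML file and
    // push it to the main site over FTP. Names and credentials are placeholders.
    $rows = $db->query('SELECT id, title, price, status FROM products
                        WHERE status IN ("reduced", "increased", "removed")');

    $xml = new SimpleXMLElement('<products/>');
    foreach ($rows as $row) {
        $p = $xml->addChild('product');
        $p->addChild('id', $row['id']);
        $p->addChild('title', htmlspecialchars($row['title']));
        $p->addChild('price', $row['price']);
        $p->addChild('status', $row['status']);
    }
    $file = 'changes-' . date('Y-m-d') . '.xml';
    $xml->asXML($file);

    // Upload to the main site
    $ftp = ftp_connect('ftp.example.com');
    ftp_login($ftp, 'user', 'password');
    ftp_pasv($ftp, true);
    ftp_put($ftp, 'incoming/' . $file, $file, FTP_ASCII);
    ftp_close($ftp);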
It's all very modular and surprisingly easy to code. It's also proving very popular so far, with more than 1,350 unique visitors and some 3,000 searches a day, on very little promotion.