Hi all

I am about to embark on building a large e-commerce site which pulls its content from multiple sources. This post is not about SEO or duplicate content; it is about the best (or recommended) way to update that content on a regular basis.

In short, the data feeds contain categories and products. As you would expect, each product category from each data source is called something different so I am having to create my own web categories and the CMS administrator will have to manually assign the data feed products to the new website categories.

Each product in the data feeds has a Product Code, and that code will be matched to a website category ID. So when the data import runs again, any existing products will already be matched to my web categories, leaving (in theory) only those products which have yet to be assigned to a web category.
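To make that concrete, here is a rough sketch of the matching step in Python. The field names (`product_code`, `name`), the sample codes and the function name are all made up for illustration; the real feeds will look different.

```python
# Hypothetical sketch: split feed rows into those already mapped to a web
# category and those the CMS administrator still has to assign by hand.

def split_by_assignment(feed_rows, category_by_code):
    """Return (assigned, unassigned) for a list of feed rows.

    category_by_code maps a product code to a website category ID.
    """
    assigned, unassigned = [], []
    for row in feed_rows:
        category_id = category_by_code.get(row["product_code"])
        if category_id is None:
            # No mapping yet: leave it for manual assignment.
            unassigned.append(row)
        else:
            assigned.append({**row, "web_category_id": category_id})
    return assigned, unassigned


# Made-up sample data.
category_by_code = {"ABC-1": 10, "ABC-2": 12}
feed_rows = [
    {"product_code": "ABC-1", "name": "Kettle"},
    {"product_code": "XYZ-9", "name": "Toaster"},
]

assigned, unassigned = split_by_assignment(feed_rows, category_by_code)
```

After the first manual pass, `unassigned` should only ever contain products that are new to the feeds.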

Phew! OK, so my question is:

What would be the best way to run a daily import of data and update the existing products with the data within the feeds?

Ideally I do not want to FLUSH the products table, as this would mean that for a split second there would be no products on the website, and it's also too open to error.

Also, I don't really want to have to update EVERY record EVERY time, because we could be talking tens of thousands of records. If most of them haven't changed in price or description then there is no need to update them!
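One way I could imagine skipping unchanged records is to store a fingerprint (hash) of the fields I care about next to each product and compare it on every import. This is only a sketch under my own assumptions: the field names, the choice of SHA-256 and the helper names are illustrative, not something the feeds dictate.

```python
import hashlib

# Hypothetical field names; use whichever columns should trigger an update.
TRACKED_FIELDS = ("name", "price", "description")


def row_fingerprint(row, fields=TRACKED_FIELDS):
    """Hash the tracked fields of one feed row into a hex digest."""
    # A separator that is unlikely to appear in the data keeps
    # ("ab", "c") and ("a", "bc") from producing the same string.
    joined = "\x1f".join(str(row.get(field, "")) for field in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def rows_needing_update(feed_rows, stored_fingerprints):
    """Return feed rows that are new or whose fingerprint has changed."""
    changed = []
    for row in feed_rows:
        if stored_fingerprints.get(row["product_code"]) != row_fingerprint(row):
            changed.append(row)
    return changed


# Made-up sample data: fingerprints saved during the previous import.
kettle = {"product_code": "ABC-1", "name": "Kettle", "price": "19.99", "description": "1.7 l"}
toaster = {"product_code": "ABC-2", "name": "Toaster", "price": "24.99", "description": "2 slice"}
stored_fingerprints = {
    "ABC-1": row_fingerprint(kettle),
    "ABC-2": row_fingerprint(toaster),
}

# Today's feed: the toaster price changed, the kettle did not.
todays_feed = [kettle, {**toaster, "price": "22.99"}]
to_update = rows_needing_update(todays_feed, stored_fingerprints)
```

Only the rows in `to_update` would then need an UPDATE (plus a refreshed fingerprint), so an unchanged product costs one dictionary lookup instead of a write.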

Any thoughts would be greatly appreciated.

kbc

P.S. The data feeds are Excel documents. So '80s, I know!

    What I do for some of my sites is import every product into a buffer table every night, then submit the new products, plus any existing products with a newer "last modification date" value, to the site admin for validation. Without a modification date you're screwed, though. The buffer table also serves as a reference when the admin edits a product: he can see what description/attributes exist in the live database (often SEOed content) and compare them with the current feed content.
    The batch process runs every night via cron during off hours so no one is impacted (on the site I have in mind, the language is French, which dramatically limits the time zones people access it from).
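
For what it's worth, the buffer-table idea can be sketched roughly like this, using Python's built-in sqlite3 module so it runs on its own. The table names, column names and ISO-formatted date strings are assumptions for illustration, not the schema of any real site.

```python
import sqlite3


def load_buffer(conn, feed_rows):
    """Replace the buffer table contents with tonight's feed.

    Only the buffer is emptied; the live table is never touched here,
    so the site always has products to show.
    """
    conn.execute("DELETE FROM products_buffer")
    conn.executemany(
        "INSERT INTO products_buffer (product_code, description, last_modified) "
        "VALUES (?, ?, ?)",
        feed_rows,
    )
    conn.commit()


def rows_for_review(conn):
    """Return (product_code, status) for new or more recently modified products."""
    return conn.execute(
        """
        SELECT b.product_code,
               CASE WHEN l.product_code IS NULL THEN 'new' ELSE 'changed' END
        FROM products_buffer AS b
        LEFT JOIN products_live AS l ON l.product_code = b.product_code
        WHERE l.product_code IS NULL OR b.last_modified > l.last_modified
        ORDER BY b.product_code
        """
    ).fetchall()


# Made-up schema and data. ISO dates stored as text compare correctly as strings.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products_live (
        product_code TEXT PRIMARY KEY, description TEXT, last_modified TEXT);
    CREATE TABLE products_buffer (
        product_code TEXT PRIMARY KEY, description TEXT, last_modified TEXT);
    INSERT INTO products_live VALUES ('A1', 'Hand-written SEO text', '2024-01-01');
    INSERT INTO products_live VALUES ('A2', 'Another description', '2024-01-01');
    """
)

load_buffer(conn, [
    ("A1", "Feed text", "2024-02-01"),            # modified since last import
    ("A2", "Another description", "2024-01-01"),  # unchanged
    ("A3", "Brand new product", "2024-02-01"),    # not in the live table yet
])
review = rows_for_review(conn)
```

Here `review` lists only A1 and A3, which is the set the admin would be asked to validate; the live rows stay as they are until that happens.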

    Sorry if this is a bit incoherent, a pretty bad case of delayed hay fever is slowing my brain to a minimum...
