Hi guys,
I need some help designing this DB and constructing an SQL query for my problem. I feel like this may have already been answered, but I have searched the forums and didn't have any luck. So any help or pointers to previous posts would be greatly appreciated!
I'm trying to create a "filter by attributes" feature for an ecommerce site. A visitor first drills down into a particular category, then has the option to filter the product listings. Here's how my tables are setup:
Categories:
id, name
------------------
1, TVs
2, Cameras
UNIQUE INDEX ON id
Products:
id, category, name, bunch of other fields...
-------------------------------
10, 1, "Sharp LCD", ...
11, 1, "Sony LCD", ...
12, 1, "Samsung LCD", ...
13, 2, "Canon S500", ...
14, 2, "Nikon D60", ...
UNIQUE INDEX ON id
Attributes:
prod_id, cat_id, name, value
-------------------------------------
10, 1, "Size", "42\""
10, 1, "Color", "Black"
11, 1, "Size", "42\""
11, 1, "Color", "Silver"
12, 1, "Size", "50\""
12, 1, "Color", "Silver"
INDEX ON prod_id, INDEX ON cat_id
I have the first part figured out. If someone selects a 42", Silver TV, then I'm using inner joins on the attributes table to filter the products list. The speed of the query is fast so I'm relatively happy with that.
The problem I'm having is listing the unique attribute name/value pairs along with a count of all items. Something like this:
+ Size
- 42" (2)
- 50" (1)
+ Color
- Black (1)
- Silver (2)
Here's the query I'm using currently:
SELECT name, value, COUNT(*) AS num FROM attributes AS a1 WHERE a1.category = 1 GROUP BY name, value ORDER BY name ASC
For a small DB, this query runs fast, but my database has 40,000+ products, and on average 1000 products per category and 10 attributes per product. The speed of the above query is kinda slow (takes 3/10s of a second on a 2Ghz AMD Opetron processor if that gives you any reference... sorry I don't know how to properly benchmark or give a reference point).
The second problem I'm having is showing the above list once someone has already set one filter. Say they've selected a Size, now it should list color as an attribute along with the counts for the resulting subset:
+ Size: 42"
+ Color:
- Silver (1)
- Black (1)
I'm not quite sure how to do this, but I'm thinking of adding a "WHERE prod_id IN (...)" clause to the above query, the "..." being replaced with the query that selects all products based on the previous filters.
I'm wondering if there is a better query, a better way to setup the tables, or a better way to setup the indices. The filter by attributes is a common feature you see in most ecommerce sites big and small, so I imagine they must be doing something different for it to work fast and serve many users at the same time. Any thoughts?
Thanks in advance for any help on this!
Sridhar