I'm looking at our DB design for products. First off, let me apologize; we've probably discussed similar issues and concepts before. Has anyone noticed that "search" here seems to suck? 😮 I can hardly even look to see our older discussions.

Our product table looks something like this:

id | title | category | seller | description | price | featured_zone_1 | featured_zone_2 | featured_zone_3 | featured_zone_4 | color | speed | metercount | condition | engine

And, actually, that's not the whole thing. There are 138(!) columns. Now, I didn't design it, but I'm thinking to try and improve it. What sort of normalization would be optimal? I've been looking at making some tables like:

create table brands(id int primary key, brand varchar(80));
create table colors(id int primary key, color varchar(80));
create table prices(id int primary key, price float);

So I'm thinking that this points towards 6th Normal Form. But then there will be a huge amount of JOINs necessary to create a Product object, right?

select p.id, p.title, p.seller, p.category, br.brand, co.color, pr.price 
from products p 
left join
br.brands on p.id = br.id
left join
co.colors on p.id = co.id
left join
pr.prices on p.id = pr.id

But I wonder if all I'm doing is a] moving complexity from the design into the query logic, b] increasing the load on MySQL with all these JOINs ...

Any thoughts?

    A few random thoughts:

    I feel like price can remain, since it can be defined as the desired numeric type. Maybe the same for metercount and/or speed?

    The featured_zone_n columns certainly smell like a many-to-one relationship that might be better handled by a separate table.

    It may make sense to move things like color, condition, and engine to separate tables to ensure the ability to join to all items that have color.key, versus hoping they all spell it "blue" and not "bleu". 🙂

    NogDog The featured_zone_n columns certainly smell like a many-to-one relationship that might be handled better by a separate table.

    I had wondered if that would be better served by an integer value that tells what zone and/or combination of zones were 'featured'. But again that sounds like moving logic OUT of the DB. Of course, I'm not sure that logic in the DB is the best way...

    NogDog It may make sense to move things like color, condition, and engine to separate tables to ensure the ability to join to all items that have color.key, versus hoping they all spell it "blue" and not "bleu".

    Can you clarify? Do you mean have a table of colors, like this?

    mysql> describe color;
    | id | color |
    ==============
    | 1  | blue  |
    | 2  | red   |
    
    mysql> select id,title,color from products limit 2;
    | 1234 | Chevy Spark | 1 |
    | 1235 | Ford Focus  | 2 |

    Or more like this?

    mysql> select * from colors;
    |1234 | blue |
    |1235 | red  |

    Or do you mean something else entirely?

    For the color example, yeah, I was thinking of a table for color names. Then if you wanted all blue cars, it might be something like:

    select products.*
    from products
    inner join color on products.color = color.id
    where color.color = 'blue';
    

    dalecosp I had wondered if that would be better served by an integer value that tells what zone and/or combination of zones were 'featured'. But again that sounds like moving logic OUT of the DB...

    Similarly, you could have a zone table, but then also a zone_to_product table for the many-to-one relationship. Depending on what you need to do with them, it might be a query similar to above where you look for products associated with a specific zone, though you'd possibly join from products to the zone_to_products table and from there to the zone table.

    select products.*
    from products
    inner join zone_to_product on zone_to_products.product_id = products.id
    inner join zone on zone_to_product.zone_id = zone.id
    where zone.something = 'some value';
    

      It seems a fairly complex issue all-round. Certainly searching and filtering are considerations. What about the overhead of creating a Product (currently 36 attributes in 3 tables) or a miniProduct (13 attributes in 3 tables)?

                  "select c.live,c.title,c.seller,c.date,c.ends,c.description,
                  c.precurrency,c.postcurrency,c.price,c.sold_displayed,c.image,
                  ct.category_name, (Nineteen other fields omitted here) 
                  c.exposed_email,c.payment_option,c.is_import,m.make,
                  c.store_category from {$this->table} c inner join categories
                  ct on c.category = ct.category_id left join makes m on c.id = m.id
                  where c.id = $id and c.start_time <= UNIX_TIMESTAMP();";

      If I extend this concept to enough new tables I'd have 36 attributes and possibly a couple dozen other tables to JOIN with ...

        I don't know for sure how much it affects performance, but I work every day with an app for medical provider reviews, and there are generally around 5-7 tables joined in many of the queries to get review data, sometimes more. We regularly handle something like 1000 requests per minute in peak usage periods, many requiring multiple queries (though some may be "simple"). I think as long as tables are indexed appropriately, it shouldn't matter. EXPLAIN ANALYZE can be your helpful assistant. 🙂

          The separate tables for price, color etc... I'm not that keen on. You won't save any columns in the original table because you'll still need a column there for a foreign key. And then you'd need a join to find out what the otherwise meaningless numbers in that column mean.

          For constraining the legal values of a column (where "blue" is a colour and "bleu" is not) it would be better to create a domain to list them, likely an ENUM. However, that assumes you know all the possible values in advance: for prices it would be a nonnegative decimal amount (not float!), but for colours—"bleu" isn't in the list so they can't use that, but maybe "beige" or "maroon" aren't either.

          Are there any columns that have a tendency to have lots of NULLs or N/A values? Those could go out into their own tables that have records for only those products that actually have those properties. (If for any given row columns 67-73, say, are all NULL at the same time because they either all apply or none of them do, then create a table for just those columns.) It's a 1:{0,1} relationship: the new table's PK is itself an FK reference to the original product table, which could be reconstructed if need be with a LEFT JOIN.

          Like NogDog, I'm really looking at those featured_zone_n columns with deep suspicion. I don't know what types or functionality they have, so alternate representations are only speculative. But as well as what's already been questioned about them what I wonder is if those really properties of the product? I can look over a car and tell you what sort of engine it has, and what colour it is, but would I be able to figure out its "featured zone 3"? I'm only guessing of course but from their names they sound more like something to do with a product listing. Maybe there are other columns in those 138 that are similar (such asdescription).

            I've never been a big fan of enumeration column types, if only because it seems more annoying to deal with when you need to add/delete/modify options* -- and maybe that's just my lack of experience using them? (Maybe it's a pretty short, clearly defined set of colors, so no big deal?) On the other hand, I guess you could just leave it as plain text and put the onus on the app to control the options, possibly via some config file it uses. 🤷
            ______________
            * Not crazy about giving alter table permissions to an application admin UI, e.g.

              Yes, it's all about product listings. And I don't have separate objects for products and product listings ... that might take a bit to wrap my head around; it's certainly not the way people think here (we sell advertising, not products, so the Listing is actually the thing). It does have description, image, image_url, date, end_date, a boolean flag for on/off display (when it's end_date is past), etc. but only a few of the fields are required

              Plus, it's all already working, or mostly working, or kind of working okay ... so the reasons for making improvements would be for code maintainability, easier extendability/cleaner code in thefuture, and easier understanding by the next guy if I got hit by a bus or got a better job offer, etc.

              Since not many Listings get featured, that might actually be a place to zero in. Of course, since "features" appear in multiple places on the site (which is why there are multiple columns, more/less ... I didn't make that decision), there's quite a bit of code to alter to keep that working if we change the DB.

              Images would be another place, perhaps, with a ; there are plenty of Listings that have no image. But, from that standpoint, there are quite a few that have no particular color, either, I think. Not even every listing has a price, which is probably unusual from most standpoints (quite of the items are negotiable in price and there's not even a starting price).

              But to get back to the original questions:

              1. How worried should I be about many tables to replace One Big Table (and, therefore, many JOIN statements)?
              2. Generally speaking, how much logic should actually be done by the Query/DB engine and how much in PHP?
                dalecosp wrote:

                Not even every listing has a price

                That's what NULL is for though. I think the issue for price is whether there are listings with more than one price such that you're duplicating the same listing just to vary the price.

                dalecosp wrote:

                How worried should I be about many tables to replace One Big Table (and, therefore, many JOIN statements)?

                Not too worried thanks to indices on foreign keys.

                dalecosp wrote:

                Generally speaking, how much logic should actually be done by the Query/DB engine and how much in PHP?

                Pick the low hanging fruit: only select the columns you need (and hence only the joins you need); use indices to get (and enforce) unique entries, to filter stuff quickly, or to sort; do simple aggregate queries. For OLTP I think that when the queries start to become complex, it's good to take a step back and ask if you're doing too much in the database. (This is probably true for OLAP too, but then for those you might have queries directly linked to an analytical dashboard rather than passing through an application layer.)

                NogDog
                So MySQL implements enums on a per-column basis as sugar over integers instead of as a distinct type in their own right? Objections 6 and 7 say as much. That alone would be enough to rule them out.

                  laserlight For OLTP I think that when the queries start to become complex, it's good to take a step back and ask if you're doing too much in the database.

                  You don't want to be pulling the search/filter aspects of a task out of the database; those things are part of why you have a DBMS in the first place. If that's getting complex it might simply be because the model itself is either complex or not well-suited for the application.

                  If you have to go through massive contortions of string parsing just to implement an "ORDER BY" clause then it's an instance of the latter (I've seen dates stored as VARCHAR(16)—yes, as "5 August 89"—and no, it wasn't a format the dbms recognised).

                  In the former case, it might help to make simplified projections of the model for particular purposes. If your complex queries include the same bunches of joins over and over then it's a sign that (at least part of) the application has a particular viewpoint about what the model is about. So make some views for the application. If the "real" table records can be identified from their appearance in the view, it might even be updatable.

                  Ooh, views. I've actually used that approach for some things in the past. It might be of some value to an object or two ....

                    Write a Reply...