design choice: completely relational, or create new tables?

pdaoust

Hi there. I was wondering if I could get veteran database designers' opinions on a design choice I have to make. See, I'm writing a piece of software using PHP and a SQL database (MySQL for now, hopefully Firebird in the future) that lets museums, archivists, collectors, etc. catalogue their data. The user will be able to create their own tables any way they like, with whatever fields they want. I guess it's similar to FileMaker or some other visual database creation tool.

Anyway, the big question is, what would be the best way to design this program, out of the two options below?

Option 1
When the user creates a new 'collection' (it'll be all pretty and wizard-based), the program creates a new table in the SQL database, and adds records to the following tables:

Collection, which holds the name of the collection, along with its description, datestamp, and a reference to a few special fields: the name of the 'catalogue number' field, the name of the 'title' field (all collections are required to have at least those two fields, although their names and data types can be anything), and the name of the 'image' field, for those collections that involve image thumbnails.

Field, which has one record for each of the fields created in the new collection's table. Each record has the collection ID, the field name, its location on the form, and its data type (the data types I've created are text, longText, number, image, enum, set, link, boolean, and date). This way I can pull the collection's table structure directly from the Field table, instead of guessing at it by dumping the table structure. You can see I have some specialised data types that don't exist in MySQL as well, like 'image'.

This is the way the program is written now, and it seems to work okay.
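
For anyone trying to picture Option 1, here is a minimal sketch. It is written in Python with SQLite standing in for PHP and MySQL, and every table name, column name, and the `SQL_TYPES` mapping are assumptions made for illustration, not the program's actual schema. It shows the two steps described above: record the collection and its fields as metadata, then create one physical table per collection.

```python
import sqlite3

# Hypothetical mapping from the post's data types to SQLite column types.
SQL_TYPES = {
    "text": "TEXT", "longText": "TEXT", "number": "REAL", "image": "BLOB",
    "enum": "TEXT", "set": "TEXT", "link": "INTEGER", "boolean": "INTEGER",
    "date": "TEXT",
}

def open_catalogue():
    """Create an in-memory database holding only the two metadata tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE collection (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            description TEXT,
            created TEXT DEFAULT CURRENT_TIMESTAMP,
            catalogue_field TEXT NOT NULL,  -- name of the 'catalogue number' field
            title_field TEXT NOT NULL,      -- name of the 'title' field
            image_field TEXT                -- optional thumbnail field
        );
        CREATE TABLE field (
            id INTEGER PRIMARY KEY,
            collection_id INTEGER NOT NULL REFERENCES collection(id),
            name TEXT NOT NULL,
            position INTEGER NOT NULL,      -- location on the form
            data_type TEXT NOT NULL
        );
    """)
    return conn

def create_collection(conn, name, fields, catalogue_field, title_field,
                      image_field=None):
    """Register a collection, then create its own physical table."""
    for field_name, _ in fields:
        # Column names cannot be bound as parameters, so validate them.
        if not field_name.isidentifier() or field_name.lower() == "id":
            raise ValueError(f"unsafe field name: {field_name!r}")
    cur = conn.execute(
        "INSERT INTO collection (name, catalogue_field, title_field, image_field)"
        " VALUES (?, ?, ?, ?)",
        (name, catalogue_field, title_field, image_field),
    )
    collection_id = cur.lastrowid
    columns = []
    for position, (field_name, data_type) in enumerate(fields):
        conn.execute(
            "INSERT INTO field (collection_id, name, position, data_type)"
            " VALUES (?, ?, ?, ?)",
            (collection_id, field_name, position, data_type),
        )
        columns.append(f'"{field_name}" {SQL_TYPES[data_type]}')
    table = f"collection_{collection_id}"
    column_sql = ", ".join(columns)
    conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, {column_sql})')
    return collection_id, table

conn = open_catalogue()
coins_id, coins_table = create_collection(
    conn, "Coins",
    [("cat_no", "text"), ("title", "text"), ("minted", "date")],
    catalogue_field="cat_no", title_field="title",
)
```

Note that the number of tables grows with the number of collections, and any change to a collection's fields means altering a table as well as updating the Field rows.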

Option 2
Have the following tables:

Collection, with name, id, etc, as above
Field, again with the same as above
Record, just has its own id and collection ID
DataItem, which holds the actual data. It'd have the record ID, the field ID, and the actual data for the field.
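
A rough sketch of how these four tables could fit together, again in Python with SQLite as a stand-in for PHP and MySQL. The column names, the `add_record` and `load_record` helpers, and the choice to store every value as text are all assumptions for illustration; the post only says DataItem holds "the actual data".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE field (
        id INTEGER PRIMARY KEY,
        collection_id INTEGER NOT NULL REFERENCES collection(id),
        name TEXT NOT NULL,
        data_type TEXT NOT NULL
    );
    CREATE TABLE record (
        id INTEGER PRIMARY KEY,
        collection_id INTEGER NOT NULL REFERENCES collection(id)
    );
    CREATE TABLE data_item (
        record_id INTEGER NOT NULL REFERENCES record(id),
        field_id INTEGER NOT NULL REFERENCES field(id),
        value TEXT,                        -- simplification: all values as text
        PRIMARY KEY (record_id, field_id)  -- one value per field per record
    );
""")

def add_record(conn, collection_id, values):
    """Insert one record and one data_item row per supplied field value."""
    record_id = conn.execute(
        "INSERT INTO record (collection_id) VALUES (?)", (collection_id,)
    ).lastrowid
    for name, value in values.items():
        row = conn.execute(
            "SELECT id FROM field WHERE collection_id = ? AND name = ?",
            (collection_id, name),
        ).fetchone()
        if row is None:
            raise KeyError(f"collection {collection_id} has no field {name!r}")
        conn.execute(
            "INSERT INTO data_item (record_id, field_id, value) VALUES (?, ?, ?)",
            (record_id, row[0], str(value)),
        )
    return record_id

def load_record(conn, record_id):
    """Reassemble a record as a {field name: value} dictionary."""
    rows = conn.execute(
        "SELECT f.name, d.value FROM data_item d"
        " JOIN field f ON f.id = d.field_id"
        " WHERE d.record_id = ?",
        (record_id,),
    )
    return dict(rows.fetchall())

coins = conn.execute("INSERT INTO collection (name) VALUES ('Coins')").lastrowid
conn.executemany(
    "INSERT INTO field (collection_id, name, data_type) VALUES (?, ?, ?)",
    [(coins, "cat_no", "text"), (coins, "title", "text")],
)
coin = add_record(conn, coins, {"cat_no": "C-001", "title": "Silver denarius"})
```

Here no table is ever created at run time: a new collection or field is just another row. The cost is that reading one record back takes a join and a row per field instead of a single-row select.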

I hope all this makes sense!

The first option strikes me as being the fastest and most efficient, and has the added benefit of making the tables readable by humans. The second option has a mess of relations and foreign keys, but seems to be the purest option from a database design point of view.

Anyway, if anyone can understand this, and has some comments, I'd love to hear them.

Thanks!
Paul d'Aoust

superwormy

You should rarely, if ever, have to create a new table just for an individual user. I'd go with the second option. It'll save you trouble down the road when you don't have to maintain 30,000 tables.

travelbuff

I would definitely go with a fully normalized database design, as in option #2.

Though it seems more complicated (and slower), there are many benefits to this. For example, it makes your data much more scalable and much more flexible.

Let's say down the road you decide that you need to have more data fields, such as history or prices. With a normalized db, you can make relatively major overhauls to the way you are collecting data without breaking the existing application.

On my first major project, I decided to take a few "shortcuts" and not fully normalize (and they seemed like small shortcuts). The app worked great, but later on I decided to extend its functionality and the shortcuts really came back to hurt me... it took me hours to redesign the db and then write scripts to move everything to where it should have been to begin with. And if you have ever worked with what seems like a great app until you get in and find out it's very limited due to poor db design, you won't ever think twice about full normalization again!

BTW, there is a great free app called dbdesigner that would allow you to graphically model your db structure...I have found it to be really helpful: http://www.fabforce.net/dbdesigner4/

HTH

pdaoust

so I should go with a completely normalised database... hokay! sounds like fun..... yikes, I hafta redesign the whole program.

But thanks so much for the opinions. Actually, there is an added benefit to having it fully normalised that I didn't mention in my original post: one datatype is a 'dbLink', which simply references another record. Using dbLink to link to another record in the same collection would be easy; just reference its primary key. But linking to a record in a different collection would be hard, because you'd have to use the collection ID and the record ID. Well, with a fully normalised database, you wouldn't need the collection ID at all!
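
To make the dbLink point concrete, a small sketch in Python with SQLite standing in for PHP and MySQL. The `link_item` table, its columns, and the `follow_link` helper are hypothetical names invented for this example. It assumes all records share one Record table, so a record ID is unique across collections and a link only has to store that one number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collection (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE record (
        id INTEGER PRIMARY KEY,          -- unique across ALL collections
        collection_id INTEGER NOT NULL REFERENCES collection(id)
    );
    -- Hypothetical storage for dbLink values: just the target record's ID.
    CREATE TABLE link_item (
        record_id INTEGER NOT NULL REFERENCES record(id),
        field_name TEXT NOT NULL,
        target_record_id INTEGER NOT NULL REFERENCES record(id)
    );
""")

def new_record(conn, collection_id):
    """Create an empty record in the given collection and return its ID."""
    return conn.execute(
        "INSERT INTO record (collection_id) VALUES (?)", (collection_id,)
    ).lastrowid

def follow_link(conn, record_id, field_name):
    """Return (target record ID, target collection name) for a link field."""
    return conn.execute(
        "SELECT r.id, c.name FROM link_item l"
        " JOIN record r ON r.id = l.target_record_id"
        " JOIN collection c ON c.id = r.collection_id"
        " WHERE l.record_id = ? AND l.field_name = ?",
        (record_id, field_name),
    ).fetchone()

coins = conn.execute("INSERT INTO collection (name) VALUES ('Coins')").lastrowid
people = conn.execute("INSERT INTO collection (name) VALUES ('People')").lastrowid
coin = new_record(conn, coins)
donor = new_record(conn, people)

# The link stores only the donor's record ID; no collection ID is needed.
conn.execute(
    "INSERT INTO link_item (record_id, field_name, target_record_id)"
    " VALUES (?, ?, ?)",
    (coin, "donor", donor),
)
```

The target's collection is still recoverable when it is needed, because the Record row itself carries the collection ID.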

That DBDesigner program looks great. I see it also creates the database once you've finished designing it... very spiffy. And it's available for Linux too! (I haven't used Windows since my hard drive crashed a few weeks ago, and I hardly ever used it before that.)