Hi,
I've received a lot of great advice on normalization here at PHPB and Google, but I'm to the point where I've read so much that I'm afraid to budge without hearing from somebody with knowledge on the subject. (WeedP and Shrike, I'm certainly not referring to you two, you've been a great help and I feel guilty for asking one more minute of your time)
Here's the problem: For as many posts that I found saying how important it is to seperate your data and eliminate redundant info, I found just as many that warn of the perils of over-normalizing. For as many articles I've read saying how sleek it is to break a table down to only referenced numbers for speedy queries; it's backed with just as many warning that pulling from too many tables at once can be a disaster. I'm now gun-shy because the table that I am setting out to normalize has about 30 columns and it seems just as logical to leave it nearly intact as it does to break it into 30 tables plus.
I'm hoping that somebody can help direct me to a happy medium here. It's going to be a huge overhaul and will require the recoding of hundreds of pages so I am trying to get it as close to "standard" as possible. My similar post was not responded to twice in the db thread so this is my last desperate attempt. If there was anything left to study, I would. ANY advice is much appreciated!
Here is a shortened version of my table called songs with about 1/3 the actual columns. Helping me on this one table should give me all the ammo I need to take on the rest of the site.
songid auto_inc, pri key
bandid
song_title
primary_genre
secondary_genre
submit_date
exclusive_track
song_desc
lyrics
weekly_plays
monthly_plays
total_plays
daily_winner
weekly_winner
monthly_winner
I've taken Shrikes advice and created a new table called "genres" with two fields:
[B]genres[/B]
Genid | Genre
1 Acoustic
2 Alternative...
It was suggested that I also add another table called song_genres which stores each songid and corresponding genids. It looks like this:
[B]song_genres[/B]
songid | pri_gen | sec_gen
7761 3 5
1123 28 4
The songs table is queried for a variety of items throughout the site, and in most cases it will only display the song_title (as a link to a dynamic page with songid as reference), the primary_genre, the secondary_genre, and the submit_date. Once the link is clicked it takes you to the actual song page where 90% of the data is pulled at once. I realized that most columns were only called upon when on this page so I decided to create a table called "song_info" which stores songpage-specific items like:
[B]song_info[/B]
songid | lyrics | song_desc | mature_content | downloadable .....
It also makes sense to have seperate tables for each stat group (plays, downloads, and score; each having weekly, monthly, and total results)
[B]song_plays[/B]
songid | weekly_plays | monthly_plays | total_plays
[B]song_score[/B]
.....
[B]song_downloads[/B]
.....
Here comes the barrage; very sorry. If anybody could please shed some light on any of these crucial Q's for me I'd really appreciate it.
Am I already over-normalizing?
Considering that most queries only ask for songid, bandid, song_title, pri_gen, sec_gen, and song_date would you consolidate those items? I do like the idea of having a table full of only numbers so would it be a good idea to add a few fields to the song_genres table and use it as a song_ref table such as:
[B]song_ref[/B]
songid | bandid | pri_gen | sec_gen | submit_date
1234 555 3 9 2005-12-02
Would you have included submit_date to this table? Does it fall into the same "speed in numbers theme" or would you advise seperating it? How about the song_title field? Would it benefit me to include the VARCHAR(50) song_title field into this "master chart" and eliminate the need to query a seperate table? Often times that the song_ref table is queried it will require additional queries to retreive items like Band Name, monthly_plays, etc; so I may or may not be leaning towards over-normalization already. I saw samples of structures similar to mine that would use a table for song_titles/songids. I could easily do the same and apply that to lyrics and song descriptions,..but why? Someone pointed out that the number of columns in a table isn't a factor in queries because non-referenced fields are ignored. That being said; what have I to gain that justifies 10,000 repeated songid's and an extra table to query? I had assumed that "SELECT song_title FROM table WHERE songid=1234" would run faster on a table that only contained those two fields and wasn't cluttered up with thousands of TEXT fields; but if the extra fields aren't in the equation I'm looking at querying 10,000 rows regardless. The only reasoning I can see is that I could then create a table giving all song titles an ID so that on the rare occasion two bands have the same name for a song; I could then reference the songTitleID and save a byte or two of redundancy. It's safe to say that the redundant songid's would far surpass what I'd gain.
What should I do with tinyint columns that are used as booleans in occasional queries? (i.e. "WHERE pri_gen = Rock and exclusive_track=1") It seems logical to put them in the "master chart" since they are so tiny and would eliminate the need to query an extra table. They may only be used on 5% of the queries so is it worth storing them in the table that will be getting all the action?
Can anybody PLEASE tell me how they would go about breaking that table down and if possible; explain why?
Thank you very much for any guidance.