Need help with grouping SQL statements

lobobr

Greetings.

I'm in desperate help with this problem:

I have a mySQL table called USERS, wich have a column called STATE (geographical).

I need to group USERS by STATE and then put them into another table called GROUPS.

For an example:

USERS

ID-------------Name---------State
1--------------John-----------CA
2--------------Paul------------NY
3--------------Ringo----------NY
4--------------George--------CA

So the Groups table (obviously both fields will be PK) will have:

IDGROUP------IDUSER
1----------------1
1----------------4
2----------------2
2----------------3

Please HELP.

Thanks in advance

bastien

But you would be duplicating information, what is the purpose of this?

If you have another table called states with the STATEID = to the GROUPID from you example, then you could update the original table...

lobobr

Please I really need help on this.

It doesn't matter if I'll get duplicates, believe me. I know it's wrong, but for now I only need this. By knowing how to do this I can make it right later.

Please help.

bastien

do you have another table with the values sitting by themselves (ie the StateID and the StateName)?

And does the version of mySQL that you have support nested (or sub-) queries?

nemonoman

I agree that based on your question, I dont see a need for a groups table that relates userids to groups, per se.

You need a table that puts STATES into groups

StateGroup

State Group GroupName
CA 1
NY 2
NJ 2
CT 2
PA 2
etc.

SELECT COUNT(*), MAX(id), MIN(id)
FROM users, StateGroup
WHERE user.State=StateGroup.State
GROUP BY StateGroup.Group

tsinka

Hi,

if you want to group users by state you could also do the folowing:

You already have a table USER.

I would suggest to change the database layout to have a table called STATE that would reduce redundancies (like having the column state in USERS):

STATE
ID------------STATENAME
1-------------CA
2-------------NY
.....

USERS
ID-------------Name
1--------------John
2--------------Paul
3--------------Ringo
4--------------George

The third table would be a mapping or grouping table:

STATE2USERS
STATEID-------USERID
1----------------1
1----------------4
2----------------2
2----------------3

If you want to make sure that no STATEID/USERID tuple can be inserted ino STATE2USERS twice you can create a unique/primary index using both fields.

The query could be:

SELECT USERS.Name FROM STATE,STATE2USERS,USERS WHERE STATE.ID=1 AND STATE.ID=STATE2USERS.STATEID AND STATE2USERS.USERID=USERS.ID

That would return all users of CA

Thomas

nemonoman

I must disagree and somewhat vehemently: Your proposed table STATE is practically nuts, and your proposed table State2Users is entirely nuts.

About your proposed table STATE where you provide numeric ids for States.

Since all you are doing with this table is aliasing a relatively unchanging identifier (ie. NY will be New York FOREVER), it makes no sense to alias NY with a numeric identifier like 1.

There are occasions where an aliasing table makes sense: NOT for state abbreviations, however.

NOW THE REALLY NUTTY TABLE State2Users:

Each User has as an attribute always 1 and ONLY 1 state. Thus State is properly an attribute that belongs in the User Table, not in a separate table.

Why would you want the simplest query to require 2 joins?

State Abbreviation, a 2 char identifier, uses no more diskspace than a numeric identifier. Every User has as an attribute always 1 and ONLY 1 state.

Your reasoning is typical of the overkill from someone honeymooning with database normalization. You are over-normalizing your data, and must learn not to do this.

Consider the nuttiness of what you are proposing:

Based on original table:

SELECT name
FROM users
WHERE state='NY'

vs based on your Crazy tables

SELECT user.name
FROM user,
states,
state2user
WHERE user.id=state2user.stateid
AND state2user.stateid=state.id
AND state.statename='NY'

You could make a theoretically reasonable but practically insane suggestion, however by normalizing "Name" out of the user table

Name

ID Text
1 George
2 John
3 Paul
4 Ringo

And change USERS to

ID NameID StateID
1 2 1
2 3 2
3 4 2
4 1 1

Thus you would not replicate the data in "Name". You could reuse "John", "Paul", "George", etc.

This IS theoretically normal, but practically, nuts. The benefit of this sort of abstraction appears only when you want make one change that affects all records of the type. With one change, you could change all Georges to Georginas.

(But that sort of change is practically NOT going to happen. All the Georges are NOT going to change into Georginas.)

If we added THIS layer of normalization to your over-normalized states tables, we could change this simple query:

SELECT name
FROM users
WHERE state='NY'

To this jawbreaker:

SELECT name.text
FROM user,
states,
state2user,
name
WHERE user.id=state2user.stateid
AND state2user.stateid=state.id
AND name.id=user.nameid
AND state.statename='NY'

When you consider that instead of a 3 column table, we now have a 4-table JOIN on 4 tables with a total of 9 columns, you've got to wonder whether progress has actually been achieved.

If you really, REALLY wanted to get normalized, you could add a many-to-many resolving table or two, and handle first/middle/last name combinations, like:

NameOrderType

ID Type
1 First
2 Middle
3 Last

Then You would take the Name ID from the user table and place it in a resolving table like:

UserNameOrder

Userid NameID NameOrderTypeID
1 2 1
2 3 1

etc.

AND NOW instead of:

SELECT name
FROM users
WHERE state='NY'

We have achieved this MUCH, MUCH more interesting query:

SELECT name.text
FROM user,
states,
state2user,
name,
usernameorder,
nameordertype
WHERE user.id=state2user.stateid
AND state2user.stateid=state.id
AND name.id=user.nameid
AND usernameorder.userid=user.id
AND usernameordertypeid=nameordertype.id
and nameordertype.type='First'
AND state.statename='NY'

Although this last query APPEARS much more complicated, I think you will agree that the efficiency of simply updatng a single record in the event that 'NY' ever changes from 'NY' to 'XZ', or if all the people with the name 'George' as first or middle or last names decide to change their names to 'Georgina' en masse far outweighs the difficulites.

Also, if you are a professional database guru, you can spend many many billable hours optimizing the indexes on these tables, analyzing for errors (because the average programmer will have caretsian products out the wazoo), and generally getting rich by obfuscation.