l008com wrote: I have a very simple table of addresses:
name, address, city, state are the four columns
There are 120,000 rows in this table, and I want to select 30 of them at random. But because there are WAY more Cali and Texas entries than other states, I want a query that only selects one row per state. In other words, I want the state column to be distinct, but not the other columns. Here is my query as it stands:
It works great, but in any given group of 30 I might get 8 California addresses. What I'm trying to get is random groups of 30 addresses, from 30 different states, every time.
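To make the requirement concrete, here is a minimal sketch of "one random row per state, then 30 random states per draw", using Python's sqlite3 as a stand-in for MySQL. The table name addresses and the sample data are assumptions (the original query isn't shown); SQLite spells the function RANDOM() where MySQL spells it RAND(), and the window function needs SQLite 3.25+ or MySQL 8+:

```python
import sqlite3

# Stand-in data with the four columns from the post; the table name
# "addresses" is an assumption. 40 synthetic states, 10 rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (name TEXT, address TEXT, city TEXT, state TEXT)")
rows = [(f"name{i}", f"{i} Main St", "Town", f"ST{i % 40:02d}") for i in range(400)]
conn.executemany("INSERT INTO addresses VALUES (?,?,?,?)", rows)

# Pick one random row per state, then 30 random states from those winners.
# (In MySQL 8+, replace RANDOM() with RAND().)
sample = conn.execute("""
    SELECT name, address, city, state FROM (
        SELECT name, address, city, state,
               ROW_NUMBER() OVER (PARTITION BY state ORDER BY RANDOM()) AS rn
        FROM addresses
    ) AS t
    WHERE rn = 1
    ORDER BY RANDOM()
    LIMIT 30
""").fetchall()

print(len(sample), len({row[3] for row in sample}))  # 30 rows, 30 distinct states
```

Note this still sorts every row randomly within each state partition, so it has the same performance caveat discussed in the reply below a certain table size.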
Selecting rows at random is generally not a fast thing to do, and with 120,000 rows to go through, this will definitely NOT be fast. select * from table order by rand() is suitable for small sets only, say 100 to 1,000 rows max. Anything bigger than that and your database will not be able to give you a result quickly.
If you want good performance, you might want to look at pre-creating your results and then just walking through them with an autoinc field in a dummy table (or use a sequence if you're on pgsql or oracle...).
In this case I'd make one table for each heavily populated state, then start grouping the smaller states together. Insert the entries in random order, and give each table a column to order by. Using a PHP script or something like it, go round-robin from one table to the next. Then select all of that into a summary table, one table at a time, and cluster on the id field.
It's a lot of setup, but after that you can just pick a random starting id and grab the next 30 rows. E.g. select * from biggiantmastermutanttable where id between 23401 and 23430, and you're sure to get a random sample with no two states repeating.
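The round-robin setup described above can be sketched end to end with Python's sqlite3 (table and column names are assumptions; the per-state tables become in-memory buckets, which is the same idea at small scale). Any 30 consecutive ids then hold 30 distinct states, as long as at least 30 states are still contributing rows at that point in the interleaving:

```python
import sqlite3
import random

# Stand-in source table; 40 synthetic states, 10 rows each, so every
# bucket stays populated and the no-repeat guarantee holds throughout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (name TEXT, address TEXT, city TEXT, state TEXT)")
rows = [(f"name{i}", f"{i} Main St", "Town", f"ST{i % 40:02d}") for i in range(400)]
conn.executemany("INSERT INTO addresses VALUES (?,?,?,?)", rows)

# Bucket rows by state and shuffle each bucket (the "entries in random
# order" step), then interleave the buckets round-robin.
buckets = {}
for row in conn.execute("SELECT name, address, city, state FROM addresses"):
    buckets.setdefault(row[3], []).append(row)
for b in buckets.values():
    random.shuffle(b)

order = list(buckets.values())
random.shuffle(order)
interleaved = []
while any(order):
    for b in order:
        if b:
            interleaved.append(b.pop())

# Materialize the precomputed "master" table with a sequential id.
conn.execute("""CREATE TABLE master
    (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, state TEXT)""")
conn.executemany("INSERT INTO master (name, address, city, state) VALUES (?,?,?,?)",
                 interleaved)

# Sampling afterwards is just a cheap 30-row slice at a random offset.
total = conn.execute("SELECT COUNT(*) FROM master").fetchone()[0]
start = random.randint(1, total - 29)
sample = conn.execute("SELECT * FROM master WHERE id BETWEEN ? AND ?",
                      (start, start + 29)).fetchall()
print(len(sample), len({r[4] for r in sample}))  # 30 rows, 30 distinct states
```

With unequal state counts the guarantee weakens near the end of the interleaving, once fewer than 30 buckets still have rows; that's the point of grouping the smaller states together in the real setup.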
Recreate the master table every few hours / days / weeks / months as needed.