I was skeptical of johanafm's assertions as it runs contrary to my prior experiments. so I did some benchmarking.
I took a customers table with 37638 records from an oscommerce database. It has numerous fields in it so it represents realistic data. I added 5 flag fields (of type tinyint[1]) and randomly set 1 in 10 of these flags to "1".
I also created an assoc table:
CREATE TABLE `foo_assoc` (
`customers_id` int(11) unsigned NOT NULL,
`flag_number` tinyint(4) unsigned NOT NULL,
KEY `customers_id` (`customers_id`,`flag_number`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
I created two separate indexes -- one on each field. If one of the flags was set in the 'foo' table then I also added a record to this table so they essentially encode the same information. If flag3 is "1" in foo, then I insert a record with flag_number=3 in foo_assoc.
NOTE: I repeatedly ran this to clear the query cache.
RESET QUERY CACHE;
I ran these queries
SELECT FROM foo WHERE flag1=1 (0.0030 sec)
SELECT FROM foo WHERE flag2=1 (0.0030 sec)
SELECT FROM foo WHERE flag3=1 (0.0031 sec)
SELECT FROM foo WHERE flag4=1 (0.0029 sec)
SELECT * FROM foo WHERE flag5=1 (0.0030 sec)
When I added indexes on each individual flag field, the times improved. Indexes do apparently help.
SELECT FROM foo WHERE flag1=1 (0.0008 sec)
SELECT FROM foo WHERE flag2=1 (0.0008 sec)
SELECT FROM foo WHERE flag3=1 (0.0008 sec)
SELECT FROM foo WHERE flag4=1 (0.0008 sec)
SELECT * FROM foo WHERE flag5=1 (0.0008 sec)
I then tried queries using the assoc table:
SELECT f. FROM foo f INNER JOIN foo_assoc fa ON fa.customers_id=f.customers_id AND fa.flag_number=1; (0.0010 sec)
SELECT f. FROM foo f INNER JOIN foo_assoc fa ON fa.customers_id=f.customers_id AND fa.flag_number=2; (0.0010 sec)
SELECT f. FROM foo f INNER JOIN foo_assoc fa ON fa.customers_id=f.customers_id AND fa.flag_number=3; (0.0010 sec)
SELECT f. FROM foo f INNER JOIN foo_assoc fa ON fa.customers_id=f.customers_id AND fa.flag_number=4; (0.0010 sec)
SELECT f.* FROM foo f INNER JOIN foo_assoc fa ON fa.customers_id=f.customers_id AND fa.flag_number=5; (0.0010 sec)
I also tried these:
SELECT customers_id FROM foo WHERE flag1=1 (0.0007 sec)
SELECT customers_id FROM foo_assoc WHERE flag_number=1 (0.0003 sec)
SELECT f.customers_id FROM foo f INNER JOIN foo_assoc fa ON fa.customers_id=f.customers_id AND fa.flag_number=1; (0.0005 sec)
I think it's worth noting that the assoc table approach works a bit faster if you are only grabbing a teeny bit of data. I tried setting 50% of the flags to 1 and then 90% of the flags to 1 and the results are pretty much the same.
I tried the same thing with a bigger table that has more fields. It was the orders table with 308991 records and it takes up about 100 MB on disk including the indexes I've added.
SELECT FROM bar WHERE flag1=1 (0.0013 sec)
SELECT b. FROM bar b INNER JOIN bar_assoc ba ON ba.orders_id=b.orders_id AND ba.flag_number=1; (0.0016 sec)
SELECT orders_id FROM bar WHERE flag1=1 (0.0007 sec)
SELECT b.orders_id FROM bar b INNER JOIN bar_assoc ba ON ba.orders_id=b.orders_id AND ba.flag_number=1; (0.0006 sec)
These timings really don't change much if 10% of the flags are set or 90% of them.
SUMMARY: For the smallish table sizes we're talking about here, Johanafm is right. I believe (and could be totally wrong) that when we're talking about much bigger db tables (say 1GB or 10Gš, the flag approach will cause problems because a disk drive will not be able to cough up the entire table fast enough to allow inspection of a flag column which it must extract from all the other column values in a given database record.
It's also been my experience that any query which requires a function to be performed on each record kills performance because it precludes the use of an index. In other words, this query can't use the index for some_indexed_field because it's checking the results of a function and cannot predict the outcome based on the index contents.
SELECT field1, field2 FROM some_table WHERE SOME_FUNCTION(some_indexed_field) = 1;
There are other considerations as well. With an association table, you might have to perform extra JOINs or extra queries to maintain the extra table. With flag fields in the original table, you might have to continue changing your table definition as you add fields or you may run out of bits if you use the mask approach.
This is probably more info then PHPycho ever wanted.