many fields vs. association table?

sneakyimp

i am working on an event listing. the basic idea is that a user can enter some information about an event (like a rock show) and, among other information, a user can select up to 5 categories from among approximately 3 dozen categories for their event.

then i need for other users to be able to search the events.

for some data, it's pretty simple. each event has a date, time, and cost.

On the other hand, i'm wondering how to handle the connection of an event to as many as 5 different categories. for instance, if my punk polka band was playing a show, i'd want to check off the rock, punk, polka, nudist, and drug categories, but not any of the other 30 or so categories which did not apply.

it seems silly to me to make an event table with upwards of 50 data fields when most of the 3 dozen categories aren't checked. i'm thinking something like this:

EVENTS
 id
 headline
 subhead
 cost
 description
 city
 state
 zip

EVENT_TYPE
  id
  event_type (e.g., 'polka')

EVENT_TYPE_ASSOC
  id
  event_id
  event_type_id

Now. This is what's getting me worried. Suppose I have a million events, each classified with 5 event types? Performance worries me a bit because a query not only has to search each event description ( a TEXT type field), cost, headline, etc., but also has to
a) perform a great circle distance calculation on the zip code's latitude and longitude to determine which events are with X miles.
b) JOIN the event types to each event and include/exclude that event depending on what event types were selected in the search query

QUESTION 1:
Is there any way to calculate what kind of performance I might get with a million records? I'd hate to start down this path only to realize it isn't feasible when we start scaling.
QUESTION 2:
Which fields should be indexed to assure reasonable performance?

Any help would be much appreciated.

Roger_Ramjet

As a general rule all foriegn key columns should be indexed for performance. The only times this does not apply is when you have a very few records in the table in which case a table scan is quicker. Since your table is going to have lots of records indexing the PK:FK pairs is essential.

Don't worry about the performance with millions of records, that is what relational dbs are designed for. Search these forums and you'll find this Q asked and answered by people who deal in millions of records and terrabytes of data.

On the other hand, your great circle calculations are going to eat up memory cycles if you are doing the full calculation with PI etc, and give you a performance hit. You will need to think about that a bit and benchmark and optimise the queries accordingly. You may even find that it pays to run 2 queries with results from the first stored in a work table for the second query to use, only doing the distance calc once all other criteria have been matched.