i am working on an event listing. the basic idea is that a user can enter some information about an event (like a rock show) and, among other information, a user can select up to 5 categories from among approximately 3 dozen categories for their event.
then i need for other users to be able to search the events.
for some data, it's pretty simple. each event has a date, time, and cost.
On the other hand, i'm wondering how to handle the connection of an event to as many as 5 different categories. for instance, if my punk polka band was playing a show, i'd want to check off the rock, punk, polka, nudist, and drug categories, but not any of the other 30 or so categories which did not apply.
it seems silly to me to make an event table with upwards of 50 data fields when most of the 3 dozen categories aren't checked. i'm thinking something like this:
EVENTS
id
headline
subhead
cost
description
city
state
zip
EVENT_TYPE
id
event_type (e.g., 'polka')
EVENT_TYPE_ASSOC
id
event_id
event_type_id
Now. This is what's getting me worried. Suppose I have a million events, each classified with 5 event types? Performance worries me a bit because a query not only has to search each event description ( a TEXT type field), cost, headline, etc., but also has to
a) perform a great circle distance calculation on the zip code's latitude and longitude to determine which events are with X miles.
b) JOIN the event types to each event and include/exclude that event depending on what event types were selected in the search query
QUESTION 1:
Is there any way to calculate what kind of performance I might get with a million records? I'd hate to start down this path only to realize it isn't feasible when we start scaling.
QUESTION 2:
Which fields should be indexed to assure reasonable performance?
Any help would be much appreciated.