The GROUP BY clause was in there to cheat, more or less: an event might be associated with several different records in the eta table, and I only want to show each event once, regardless of how many occurrences it has.
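(For what it's worth, SELECT DISTINCT would express that one-row-per-event intent more directly than GROUP BY; a sketch against the same tables:)

```sql
-- one row per event, no matter how many eta rows match
SELECT DISTINCT e.id, e.title, e.subheading, e.zip, z.city, z.st
FROM demo_events e
INNER JOIN demo_event_time_assoc eta ON e.id = eta.event_id
INNER JOIN demo_zip_codes z ON e.zip = z.zip
WHERE e.active = 1
```

Note this only works if eta.start_timestamp and eta.end_timestamp are dropped from the select list; since those differ per occurrence, DISTINCT over them would still return multiple rows per event.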
I really appreciate your help, but I tried your query and it typically takes about three times as long as my original query. :( When I tried EXPLAIN on your query:
EXPLAIN SELECT e.id, e.title, e.subheading, e.zip, eta.start_timestamp, eta.end_timestamp, z.city, z.st
FROM demo_events e
INNER JOIN demo_event_time_assoc eta ON e.id = eta.event_id
INNER JOIN demo_zip_codes z ON e.zip = z.zip
WHERE e.active = 1
AND (
    (
        eta.start_timestamp >= 1162506800
        AND eta.start_timestamp <= 1163111600
    )
    OR (
        eta.start_timestamp < 1162506800
        AND eta.end_timestamp >= 1162510400
    )
)
ORDER BY RAND()
LIMIT 10
I get essentially the same results as with my query, only with more rows under consideration:
table type possible_keys key key_len ref rows Extra
eta range event_id,start_timestamp,end_timestamp start_timestamp 4 NULL 54522 Using where; Using temporary; Using filesort
e eq_ref PRIMARY,active PRIMARY 4 eta.event_id 1 Using where
z eq_ref PRIMARY PRIMARY 5 e.zip 1
As for the data, here is some info on the tables being queried:
demo_events: 32,354 records; indexed on id and zip.
demo_event_time_assoc: 75,325 records; indexed on id, event_id, start_timestamp, and end_timestamp.
demo_zip_codes: 43,104 records; indexed on zip.
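One thing I may still try: since the OR mixes range conditions on two different indexed columns (start_timestamp and end_timestamp), MySQL can generally use only one of those indexes for the scan. Rewriting the OR as a UNION sometimes lets each branch use its own index. An untested sketch against the same tables and constants (whether it actually helps depends on the optimizer):

```sql
(SELECT e.id, e.title, e.subheading, e.zip, eta.start_timestamp, eta.end_timestamp, z.city, z.st
 FROM demo_events e
 INNER JOIN demo_event_time_assoc eta ON e.id = eta.event_id
 INNER JOIN demo_zip_codes z ON e.zip = z.zip
 WHERE e.active = 1
   AND eta.start_timestamp >= 1162506800
   AND eta.start_timestamp <= 1163111600)
UNION
(SELECT e.id, e.title, e.subheading, e.zip, eta.start_timestamp, eta.end_timestamp, z.city, z.st
 FROM demo_events e
 INNER JOIN demo_event_time_assoc eta ON e.id = eta.event_id
 INNER JOIN demo_zip_codes z ON e.zip = z.zip
 WHERE e.active = 1
   AND eta.start_timestamp < 1162506800
   AND eta.end_timestamp >= 1162510400)
ORDER BY RAND()
LIMIT 10
```

UNION (without ALL) also removes duplicate rows, so it can't return the same occurrence from both branches.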
Interestingly, if I remove the ORDER BY RAND() bit, your query runs in 0.0014 seconds. If I remove the ORDER BY and GROUP BY bits of my query, it runs in 0.0071 seconds.
I tried an alternate method of choosing records at random from this set, which entailed:
1) Run a first query to fetch all event ids that match the time criteria.
2) Take the $count of records returned and choose ten random numbers between 1 and $count.
3) Loop through all the ids returned, incrementing $i. If $i is one of my random numbers, keep that record. Once we have 10 results, exit the loop.
4) Run another query to fetch the details for the 10 randomly chosen ids.
Here's the code, which still needs tweaking for situations where there are fewer than 10 eligible records:
$rows_to_fetch = 10;
// these variables are meaningful within the real page context but here
// we just initialize them to the current server time
$target_zip_timestamp = strtotime('-2 days', time());
$target_zip_timestamp_plus_week = strtotime('+1 week', $target_zip_timestamp);
$start = get_microtime();
// NEW APPROACH
// grab all the ids first
$sql = "SELECT DISTINCT e.id FROM
" . TABLE_EVENTS . " e,
" . TABLE_EVENT_TIME_ASSOC . " eta,
" . TABLE_ZIPS . " z
WHERE eta.event_id=e.id
AND z.zip=e.zip
AND e.active=1
AND (
((eta.start_timestamp >= " . $target_zip_timestamp . ") AND (eta.start_timestamp <= " . $target_zip_timestamp_plus_week . "))
OR
((eta.start_timestamp < " . $target_zip_timestamp . ") AND (eta.end_timestamp >= " . ($target_zip_timestamp + REQUIRED_REMAINING_EVENT_TIME) . "))
)
";
$query_start = get_microtime();
$result = $db->query($sql, 'testing_query_2')
or die('second approach query 1 failed');
echo 'New method Query 1 time:' . (get_microtime() - $query_start) . ' seconds<br>';
$total_ids = $db->numrows($result);
echo $total_ids . ' distinct event ids found.<br>';
// at this point we should know how many rows there are
// let's pick 10 numbers at random between 1 and the total
$r_start = get_microtime();
// clamp so the selection loop can't spin forever when fewer
// eligible records exist than we want to fetch
$rows_to_fetch = min($rows_to_fetch, $total_ids);
$rand_nums = array();
while (sizeof($rand_nums) < $rows_to_fetch) {
    $r = rand(1, $total_ids);
    if (!in_array($r, $rand_nums)) {
        $rand_nums[] = $r;
    }
}
echo 'Random number selection:' . (get_microtime() - $r_start) . ' seconds<br>';
print_r($rand_nums);
echo '<br>';
// loop through all the found ids, keeping only ones that match our random numbers
$loop_start = get_microtime();
$i=0;
$random_ids = array();
while ($row = $db->fetchrow($result)) {
    $i++;
    if (in_array($i, $rand_nums)) {
        $random_ids[] = $row['id'];
    }
    // if we have found enough, we can break out of the loop
    if (sizeof($random_ids) >= $rows_to_fetch) {
        break;
    }
}
echo 'Fetch loop time:' . (get_microtime() - $loop_start) . ' seconds<br>';
print_r($random_ids);
echo '<br>';
$db->freeresult($result);
// now fetch the important details that correspond to our random ids
$sql = "SELECT e.id, e.title, e.subheading, e.zip, eta.start_timestamp, eta.end_timestamp, z.city, z.st
FROM demo_events e, demo_event_time_assoc eta, demo_zip_codes z
WHERE eta.event_id=e.id
AND z.zip = e.zip
AND e.id IN (" . implode(',', $random_ids) . ")
AND (
((eta.start_timestamp >= " . $target_zip_timestamp . ") AND (eta.start_timestamp <= " . $target_zip_timestamp_plus_week . "))
OR
((eta.start_timestamp < " . $target_zip_timestamp . ") AND (eta.end_timestamp >= " . ($target_zip_timestamp + REQUIRED_REMAINING_EVENT_TIME) . "))
)
ORDER BY RAND()";
$query_start = get_microtime();
$result = $db->query($sql, 'testing_query_2b')
or die('final fetch failed');
echo 'New method Query 2 time:' . (get_microtime() - $query_start) . ' seconds<br>';
echo $db->numrows($result) . ' records found.<br>';
echo '<table width="100%">';
$prev_id = '';
while ($row = $db->fetchrow($result)) {
    if ($row['id'] != $prev_id) {
        echo '<tr>';
        echo '<td>' . $row['id'] . '</td>';
        echo '<td>' . $row['title'] . '</td>';
        echo '<td>' . $row['subheading'] . '</td>';
        echo '<td>' . $row['zip'] . '</td>';
        echo '<td>' . date('m-d-Y, H:i:s', $row['start_timestamp']) . '</td>';
        echo '<td>' . date('m-d-Y, H:i:s', $row['end_timestamp']) . '</td>';
        echo '<td>' . $row['city'] . '</td>';
        echo '<td>' . $row['st'] . '</td>';
        echo '</tr>';
    } // if id is new
    $prev_id = $row['id'];
}
echo '</table>';
$db->freeresult($result);
echo 'TIME ELAPSED NEW METHOD:' . (get_microtime() - $start) . ' seconds<br>';
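Incidentally, steps 2 and 3 above could be collapsed using PHP's built-in array_rand(), which picks distinct random keys in a single call. A self-contained sketch (the $all_ids data here is made up for illustration; in the real page it would be the ids fetched by query 1):

```php
<?php
// Hypothetical stand-in for the ids returned by query 1.
$all_ids = array(101, 102, 103, 104, 105);
$rows_to_fetch = 10;

// array_rand() errors out if asked for more keys than the array
// holds, so clamp first (this also covers the fewer-than-10 case).
$num = min($rows_to_fetch, count($all_ids));
$picks = array_rand($all_ids, $num);

$random_ids = array();
foreach ((array) $picks as $key) {   // (array) cast covers the $num == 1 case
    $random_ids[] = $all_ids[$key];
}
```

This replaces both the random-number do/while and the row-scanning loop with one pass over the picked keys.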
This approach was faster if the results of step 1 were already cached, but it was actually slower than my original approach whenever the query had new timestamp values, which it ALWAYS will, because the timestamps are derived from the current time and the current time is constantly changing.
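One mitigation I haven't tested yet: since the exact timestamp changes every second, the generated SQL string is different on every request, so the MySQL query cache never gets a hit. Rounding the target down to the nearest hour keeps the SQL text identical for up to an hour at a time:

```php
<?php
// Same setup as above, but snap the target to an hour boundary so the
// generated query string (and thus any cached result) stays stable
// for up to an hour instead of changing every second.
$target_zip_timestamp = strtotime('-2 days', time());
$target_zip_timestamp -= $target_zip_timestamp % 3600;  // snap down to the hour
$target_zip_timestamp_plus_week = strtotime('+1 week', $target_zip_timestamp);
```

The trade-off is that "now" is up to an hour stale, which may or may not be acceptable for the event-window logic.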
I'm really worried there isn't any way to make this query significantly faster. I was hoping to have this run in under a second when dealing with millions of event records. Sigh. Is this hopeless?