Delete Dupe emails in table

dmacman1962

Hi All,

I have a user table that I need to delete all records that are duplicates (based on the EmailAddress field). I have my site coded to not allow dupes (by query first then either update or insert), but I also have a page for employees to import records from a CSV with fgetscv that sometimes inserts records where that email address exists.

I tried about a dozen different approaches, and none worked.

Here is a sample of what I tried and end up with it timing out (The table has 1.8 million records).

$sql = "SELECT MIN(maxID) AS totalcount FROM max8 GROUP BY EmailAddress";
$result = mysql_query($sql) or die (mysql_error());
$num_rows = mysql_num_rows($result);
$unique_ids = array();
for($i=0;$i<$num_rows;$i++) {
    $unique_ids[] = mysql_result($result,$i,0);
}
mysql_free_result($result);
$id_list = implode(',',$unique_ids);
$sql = "DELETE FROM max8 WHERE maxID not in($id_list)";
$query = mysql_query($sql) or die (mysql_error()); 

mysql_close($conn);

echo 'All duplicates where deleted';

I also noticed some addressed this by making a temp table, then exporting a unique select query, then comparing and deleting, but that seems like a round-about way.

My end goal it to make a process page, that I can run and either delete all the dupes (or if it is a timeout issue) delete them state by state.

Thoughts?

Regards,
Don

NogDog

My first thought was to use a temporary table, copy the latest row for each email address to it, clear the original table, then copy the temp table contents into the original. However, I would not feel comfortable with this unless transactions can be used, so that you can wait to commit when everything has processed successfully.