Could anyone please suggest some options of checking for duplicates in MySQL tables?
The real table has about 20,000 records (each record is around 10000 in length). The other table is a "submissions" table.
The process goes: insert into submissions, approve into real table.
Which means, my duplicate check needs to check both tables.
I currently have something like this:
function checksongtitle_checkartist($artist, $songtitle) {
// this function is for testing purposes to make sure duplicates don't exist
// incoming: artist, songtitle
global $DB_CONF;
$tartist1 = str_replace("'","",$artist);
$tartist2 = str_replace("?","",$artist);
$tartist3 = substr($artist,0, (int)strlen($artist)/2);
$sql = "SELECT artist, songtitle from {$DB_CONF["lyrics"]}
where songtitle = '".addslashes($songtitle)."'
AND
(artist = '".addslashes($artist)."'
OR artist = '".addslashes($tartist1)."'
OR artist = '".addslashes($tartist2)."'
OR artist like '".addslashes($tartist3)."%') limit 1";
$sql_result = mysql_query($sql) or die ("Couldn't execute query.
(songartist()) ".mysql_error());
if ($sql_result) {
if ($row = mysql_fetch_array($sql_result)) {
$tempsongtitle = $row['songtitle'];
$tempartist = $row['artist'];
}
}
// if songtitle and artist exist, let's return FALSE(0), else, TRUE(1)
if ( ($tempsongtitle != "") && ($tempartist != "") ) {
// mysql_free_result($sql_result);
return 0;
} else {
// mysql_free_result($sql_result);
// now let's check the submissions database as well
$sql = "SELECT artist, songtitle from {$DB_CONF["lyrics_submissions"]}
where songtitle = '".addslashes($songtitle)."'
AND
(artist = '".addslashes($artist)."'
OR artist = '".addslashes($tartist1)."'
OR artist = '".addslashes($tartist2)."'
OR artist = '".addslashes($tartist3)."') limit 1";
$sql_result = mysql_query($sql) or die ("Couldn't execute query.
(songartist()) ".mysql_error());
if ($sql_result) {
if ($row = mysql_fetch_array($sql_result)) {
$tempsongtitle = $row['songtitle'];
$tempartist = $row['artist'];
}
}
// if songtitle and artist exist, let's return FALSE(0), else, TRUE(1)
if ( ($tempsongtitle != "") && ($tempartist != "") ) {
// mysql_free_result($sql_result);
return 0;
} else {
// mysql_free_result($sql_result);
return 1;
}
} // end first else
}
please excuse how messy it may look. i've been trying various things lately and haven't had a chance to clean it up.
Essentially it is checking both the real and temporary tables based on artist and songtitle. If I need to insert 10,000 records, this process takes many hours. Can anyone recommend a faster way of checking for duplicates?
-k