Here is the case.
1) First, I would read in about 30,000 records from a xml's url and create a database table to hold on these 30,000 records.
2) Then every day, I would read in these about 30,000 records from the same xml's url. Save them in a new temp table, the new temp table's structure is the same as the original table's strucuture.
3) Then I will compare the original table with the new temp table.
3a) if a record_id is in the new temp table but not in the original table, add it to the original table, record the timestamp.
3b) if a record_id is in the original table but not in the new temp table, set up the record in original table to be "not_active", record the timestamp.
3c) if a record_id is in both new temp and original table, but the content is different, update the content of the record in the original table, record the timestamp.
3d) if a record_id is in both new temp and original table, and the content is the same, nothing will be changed.
I used php and mysql. Here is my query script
//photo
/*table fields:
photo_id, file_name, caption, keywords, aspect_ratio,
*/
//update
//this will update the original table content with the new content from the new temp table if the content is different.
$sql_update_photo = "UPDATE photo, temp_photo
SET
photo.file_name = temp_photo.file_name,
photo.caption = temp_photo.caption,
photo.keywords = temp_photo.keywords,
photo.aspect_ratio = temp_photo.aspect_ratio
WHERE photo.photo_id = temp_photo.photo_id";
mysql_query($sql_update_photo);
//INSERT INTO photo
//this will insert the new records
$sql_select_from_temp_photo = "SELECT photo_id, file_name, caption, keywords, aspect_ratio
FROM temp_photo
WHERE photo_id NOT IN (SELECT photo_id FROM photo)";
$sql_insert_into_photo = "INSERT INTO photo (photo_id, file_name, caption, keywords, aspect_ratio)
$sql_select_from_temp_photo";
mysql_query($sql_insert_into_photo);
//UPDATE photo
//de-activate the origianl records if not in the new.
$sql_update_photo = "UPDATE photo
SET active = 0
WHERE photo_id NOT IN (SELECT photo_id FROM temp_photo)";
mysql_query($sql_update_photo);
//re-activate the origianl records if in the new.
$sql_update_photo = "UPDATE photo
SET qm_active = 1
WHERE photo_id IN (SELECT photo_id FROM temp_photo)";
mysql_query($sql_update_photo);
It is working fine with the records like up to 2,000 - 3,000. And it is running as a cron one time a day. So everything is ok.
But with the records up to 20,000 or more. This approach is very slow.
I think this query is slow
//update
//this will update the original table content with the new content from the new temp table if the content is different.
$sql_update_photo = "UPDATE photo, temp_photo
SET
photo.file_name = temp_photo.file_name,
photo.caption = temp_photo.caption,
photo.keywords = temp_photo.keywords,
photo.aspect_ratio = temp_photo.aspect_ratio
WHERE photo.photo_id = temp_photo.photo_id";
Any advices that I can improve the performance?
Instead of this query script.
//update
//this will update the original table content with the new content from the new temp table if the content is different.
$sql_update_photo = "UPDATE photo, temp_photo
SET
photo.file_name = temp_photo.file_name,
photo.caption = temp_photo.caption,
photo.keywords = temp_photo.keywords,
photo.aspect_ratio = temp_photo.aspect_ratio
WHERE photo.photo_id = temp_photo.photo_id";
Another approach I used is use left join and is null (use <=> for allow null value)
http://phpbuilder.com/board/showthread.php?t=10367932&highlight=left+join
Some friends said left join and is null will be faster. (I will give a try now).
Is this the solution I should go to? Any other solutions?
Thanks!