hadoob024 wrote: Hmm. That is true! Yeah, I guess I should play it on the safe side. Quick question, with the suggestion you made, "where timestamp > now()- interval '30' days". Is this any safer than what I was planning on originally? I guess instead of just deleting the records, maybe I should just dump them into another table? How hard would this be to setup?
Like Sxooter, I believe that all data must be recoverable, ALWAYS.
I archive data out this way.
Have an archive table that is identical in structure to the master table, EXCEPT that any identity/autoinc columns are retyped to plain int (or whatever fits), so they keep the values they arrive with instead of generating new ones.
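Just to illustrate (main_table, main_id, timestamp and payload here are made-up placeholders, so swap in your real schema), the pair might look something like:

CREATE TABLE main_table (
    main_id   INT NOT NULL AUTO_INCREMENT PRIMARY KEY,  -- identity column
    timestamp DATETIME NOT NULL,
    payload   VARCHAR(255)
);

CREATE TABLE archive_table (
    main_id   INT NOT NULL PRIMARY KEY,  -- plain int, no AUTO_INCREMENT
    timestamp DATETIME NOT NULL,
    payload   VARCHAR(255)
);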
Copy all the data you want to archive into the archive table using a query like this:
$insert_query = "INSERT INTO archive_table
                 SELECT main_table.* FROM main_table
                 LEFT JOIN archive_table USING (main_id)
                 WHERE archive_table.main_id IS NULL
                 AND DATEDIFF(NOW(), main_table.timestamp) > 30";
The LEFT JOIN combined with the IS NULL check means that only those records not already in the archive table will be added. You could instead just let the unique indexes bounce off any duplicates (see the sketch below), but that means you will be processing many more records every time.
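If you did want to rely on the index bouncing duplicates, it would presumably just be something like this (same placeholder columns as above; INSERT IGNORE tells MySQL to skip any row that collides with the primary/unique key on main_id):

$insert_ignore_query = "INSERT IGNORE INTO archive_table
                        SELECT main_table.* FROM main_table
                        WHERE DATEDIFF(NOW(), main_table.timestamp) > 30";

Simpler, but every run re-reads and re-rejects everything already archived, which is the extra processing I mentioned.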
You then use an inner join to delete records in the main table that have been copied to the archive table.
$delete_query = "DELETE main_table.* FROM main_table
                 INNER JOIN archive_table USING (main_id)";
That will delete any records in the main table that are in the archive table.
For full-on data integrity checking you can match every column in the main table to its corresponding column in the archive table and only delete records that are identical. You would then have to check for any records that were only partial matches. That would be the boilerplate method, as it will trap any errors or corruption that occurred when the data was copied to the archive table.
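A rough sketch of that stricter delete, again using the placeholder columns from above (the <=> null-safe comparison is there so NULLs still count as a match; you'd list every non-key column of your real table):

$strict_delete_query = "DELETE main_table.* FROM main_table
                        INNER JOIN archive_table USING (main_id)
                        WHERE main_table.timestamp <=> archive_table.timestamp
                        AND main_table.payload <=> archive_table.payload";

Anything old enough to have been archived but still sitting in the main table after that run would be a partial match to investigate by hand.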