I have generally used PHP for manipulating variables, and that worked fine up to about 10,000 records. Now, though, I have to deal with imports of 1 million entries per batch (I'm doing some serious data mining!), so I need to use MySQL a bit more cleverly and become a hardcore MySQL power user. So help!!
What I was doing was this:
LOAD DATA LOCAL INFILE 'file.csv' INTO TABLE dump_data FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r';
This part is quite fast!
Then, once the data was in the table, I'd do a "SELECT * FROM dump_data" and walk through it entry by entry with PHP's mysql_fetch_row. For each row I'd convert the date from a VARCHAR field into a MySQL-ready DATETIME value, then insert it into another table.
So for each start date I'd run a PHP function that did this:
// start_date format in the CSV: 04/12/2005 18:36:00
// MySQL DATETIME format wanted: 2005-12-04 18:36:00
$year  = substr($the_date, 6, 4);            // "2005"
$month = substr($the_date, 3, 2);            // "12"
$date  = substr($the_date, 0, 2);            // "04"
$time  = substr($the_date, 11, 5) . ":00";   // "18:36" + ":00"
$new_date = "$year-$month-$date $time";      // "2005-12-04 18:36:00"
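Each row then goes back to MySQL as its own little INSERT, something like the sketch below (final_table and the column names are just placeholders, not my real schema):

INSERT INTO final_table (start_date, some_other_col)
VALUES ('2005-12-04 18:36:00', '...');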
However, with 1 million entries this takes PHP and MySQL about 7 minutes to walk through every entry and insert each one individually.
I have increased my timeout to 30 minutes and given PHP 250 MB in php.ini, but that's not really the answer, methinks!?
There must be a way of either:
a) using MySQL's LOAD DATA to restructure the date automatically as it is imported, ready for MySQL (see the first sketch after this list)
b) copying the data from one table to another, restructuring the date with a MySQL function on the way (second sketch below)
c) loading the data as it is, then altering the column from VARCHAR to DATETIME DEFAULT '0000-00-00 00:00:00' and handling the format difference somehow (third sketch below)
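For (a), I'm imagining something like this, if LOAD DATA can take a SET clause (I believe that needs MySQL 5.0.3+, and the column names here are just guesses at my own schema):

LOAD DATA LOCAL INFILE 'file.csv'
INTO TABLE dump_data
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r'
(id, name, @raw_start_date)
SET start_date = STR_TO_DATE(@raw_start_date, '%d/%m/%Y %H:%i:%s');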
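For (b), maybe a single INSERT ... SELECT with STR_TO_DATE doing the conversion, so MySQL handles all million rows in one statement instead of PHP doing them one by one (final_table and its columns are made up):

INSERT INTO final_table (id, name, start_date)
SELECT id, name, STR_TO_DATE(start_date, '%d/%m/%Y %H:%i:%s')
FROM dump_data;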
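And for (c), I guess I'd load everything as VARCHAR, rewrite the values in place, then change the column type (the column name is an assumption again, and this rewrites the whole table twice):

UPDATE dump_data
   SET start_date = STR_TO_DATE(start_date, '%d/%m/%Y %H:%i:%s');

ALTER TABLE dump_data
  MODIFY start_date DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00';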
Any ideas, MySQL geniuses!? Thanks!