Hello,
I have about 25 CSV files, with a total of about 500,000 records, that I have to do a couple of things with.
I have currently been reading the files into an array with fgetcsv. After doing this, I set up a while loop where I select certain parts of each CSV file (since each of the 25 files includes different fields) and group them together one by one by assigning them to variables within the loop. Then I do an insert with those variables, which essentially writes each of these modified rows of data into the MySQL database one by one.
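The insert inside the loop looks roughly like this (simplified; the table and column names here are just placeholders, and I'm on the old mysql_* functions):

// simplified per-row insert (placeholder table/column names, assumes an open connection)
$name   = mysql_real_escape_string($record[0]);
$amount = mysql_real_escape_string($record[2]);
mysql_query("INSERT INTO records (name, amount) VALUES ('$name', '$amount')");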
The huge problem with this is the time it takes. For 500,000 rows it is taking over 24 hours using Apache 2 and PHP 4.3.8. It obviously takes the longest when it is reading the larger files. For example, there is one 20 MB CSV file and it literally takes "all day".
My question is this: is my approach the fastest way to accomplish this task? It definitely works, but it is SLOW. Would LOAD DATA INFILE be any faster? The problem is that I would have to normalize the data after it is in the database, which could be a bit tricky. Also, from what I have read, I am not even sure it is faster than row-by-row inserts.
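For reference, I imagine the LOAD DATA approach would look roughly like this, loading each file into a staging table and normalizing it afterwards (table and column names are just placeholders; the pipe delimiter matches my files):

// rough sketch: load one file into a staging table, normalize with SQL afterwards
// (placeholder table/column names)
mysql_query("LOAD DATA LOCAL INFILE 'filename.txt'
             INTO TABLE staging_records
             FIELDS TERMINATED BY '|'
             LINES TERMINATED BY '\\n'
             (field1, field2, field3)");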
Does anyone else have any thoughts on how to read in the different CSV files, use PHP to make them similar so they "fit" into the same table/database, and then insert the contents of these CSVs (or modified CSVs, more accurately) into the table at the fastest rate?
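For example, would batching rows into multi-row INSERT statements be significantly faster than one INSERT per row? Something like this (just a sketch; table/column names are placeholders and it assumes an open connection):

// sketch: batch rows into multi-row INSERTs instead of one query per row
$values = array();
foreach ($bcs as $record) {
    $name   = mysql_real_escape_string($record[0]);
    $amount = mysql_real_escape_string($record[2]);
    $values[] = "('$name', '$amount')";

    if (count($values) >= 500) {   // flush every 500 rows
        mysql_query("INSERT INTO records (name, amount) VALUES " . implode(",", $values));
        $values = array();
    }
}
if (count($values) > 0) {          // flush the remainder
    mysql_query("INSERT INTO records (name, amount) VALUES " . implode(",", $values));
}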
Further, I've done some tests.
I have used code to disable the indexes and then re-enable them at the end of the script to speed up the inserts. The inserts aren't really the problem now; I believe the slowdown is in the actual reading of the file into the array with fgetcsv.
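In case it matters, this is roughly what I used around the inserts (assuming a MyISAM table; 'records' is a placeholder name):

// disable non-unique index updates before the bulk insert, rebuild them after
mysql_query("ALTER TABLE records DISABLE KEYS");
// ... do all the inserts ...
mysql_query("ALTER TABLE records ENABLE KEYS");

Here is the read loop itself: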
$fn = "filename.txt";
$bcs = array();
$id = fopen($fn, "r"); //open the file
$size = filesize($fn) +1;
while (($record = fgetcsv($id, $size, "|")) !== FALSE) { //start a loop
$bcs[] = $record;
}
$count1 = (count($bcs));
// read file into main array
// set amount of times to go through the loop
$print = $count1;
echo $print . "\n";
$row = 0;
$a = 0;
do { assign parts of the array into variables & do inserts here } while (++$a < $print);
For some reason, especially with files around 50 MB, this goes very slowly. The slowdown appears to be somewhere in the code I've cited. Is it because the files are not broken into chunks and are taking up a lot of memory, or something to that effect? If so, how would I go about reading a file in chunks, and would that do the trick? This method does work; it's just VERY slow.
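Would something like this be the right idea, processing each row as it is read instead of building the whole array first? (Again, just a sketch, with the insert part left out and an assumed 8192-byte maximum line length:)

// sketch: handle each row as it is read, instead of loading the whole
// file into $bcs first (keeps memory use roughly constant per file)
$id = fopen($fn, "r");
while (($record = fgetcsv($id, 8192, "|")) !== FALSE) {
    // assign parts of $record to variables and do the insert here
}
fclose($id);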
Also, I've tried eliminating the do-while and timing that loop. I did that successfully, which eliminated the need for the count() function, but it didn't do much to speed things up. Again, I believe the slowdown is somewhere in reading the file. I'm sure the experts here know more about this, however; this was just some very light testing I did to try to speed it up. I am fairly certain the database structure is not the problem.
Thank you VERY much in advance!