Don't listen to all these PHP programmers who are frightened of SQL, just use the tool that is provided, which is LOAD DATA: it will import your file directly into your database table, and very fast as well. If you use any of the other solutions outlined here, your 25MB file is going to take you 25 minutes to upload. LOAD DATA will import any type of text file, you just specify the field delimiter etc. As for the file size limit, that is set on the server by your host and will apply however you process the file.
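
For example, something along these lines (just a sketch; the connection details, table name, file path and tab delimiter are all assumptions you would replace with your own):

<?php
// Assumed connection details - replace with your own.
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// LOAD DATA pulls the whole file straight into the table;
// the FIELDS / LINES clauses describe the file layout.
// (The file must be readable by the MySQL server; use LOAD DATA LOCAL INFILE
// if the file sits on the client side instead.)
$sql = "LOAD DATA INFILE '/tmp/import.txt'
        INTO TABLE my_table
        FIELDS TERMINATED BY '\\t'
        LINES TERMINATED BY '\\n'";
$db->query($sql) or die($db->error);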

If you need to manipulate or transform your data in any way, then import it into a temp table and use a query to convert it into your permanent table. Messing around with arrays etc. is just overkill and will take you twice as long to code.
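
For the transform step, one set-based query does the whole job (the table and column names here are invented for the example):

<?php
$db = new mysqli('localhost', 'user', 'password', 'mydb');

// Move and transform everything in one statement -
// no row-by-row looping in PHP.
$db->query("
    INSERT INTO permanent_table (col_a, col_b, col_c)
    SELECT TRIM(col_a), UPPER(col_b), col_c
    FROM   temp_import
") or die($db->error);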

    Well, I can't use LOAD DATA. My problem is that the rows sometimes have 4 columns and sometimes 5 columns, i.e. one column is sometimes missing in the middle. In that case loading it into the database will not work; I have to work with the data before I put it into the MySQL database. I know that if the column before it (column 2) has the value 0, then the next one is missing.

    I will take a closer look at fgetcsv!!
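
    Something like this is roughly what I have in mind after a first look (untested; the tab delimiter and the column positions are guesses on my part):

    <?php
    // Rough sketch - assumes tab-delimited rows and that a 0 in column 2
    // means it is the third column that is missing.
    $fh = fopen('/path/to/datafile.txt', 'r') or die('cannot open file');

    while (($row = fgetcsv($fh, 4096, "\t")) !== false) {
        if (count($row) == 4 && $row[1] == '0') {
            // Put the missing middle column back as an empty value
            // so every row ends up with 5 fields.
            array_splice($row, 2, 0, array(''));
        }
        // ... insert $row into the database here ...
    }
    fclose($fh);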

      bubblenut: This works great! It is a CSV-style file!

      Many Thanks for all your help!

        Ah, well, TheBeginner, you didn't say that columns would be missing. However, you should be aware of this:
        ---- quote -----
        When loading a table from a text file, use LOAD DATA INFILE. This is usually 20 times faster than using a lot of INSERT statements. See section 14.1.5 LOAD DATA INFILE Syntax.
        (from: http://dev.mysql.com/doc/mysql/en/Insert_speed.html)
        ----- end quote -----

        Now, I come from a mainstream database background, i.e. Oracle, Access and SQL Server rather than CGI and web programming, and I have noticed that the preference around here is for code solutions to what I see as simple database and SQL problems. I'm also ingrained with the Need for Speed, which is always achieved through SQL queries rather than script code - that is what the quote above is all about.

        In this case, missing columns are no problem. A CSV should have field delimiters and record delimiters: your missing field should just be two commas with nothing in between:

        var1,var2,var3,var4,var5
        var1,var2,,var4,var5
        var1,var2,var3,var4,var5

        So an import into a table is only a problem if the missing field's column is declared NOT NULL.
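
        And if you want that empty field stored as a real NULL rather than an empty string, LOAD DATA can remap it for you. A sketch (table and column names are invented, and the @variable / SET form needs a reasonably recent MySQL):

        <?php
        $db = new mysqli('localhost', 'user', 'password', 'mydb');

        // Read the third field into a user variable, then store NULL when it is empty.
        $db->query("
            LOAD DATA INFILE '/tmp/import.csv'
            INTO TABLE my_table
            FIELDS TERMINATED BY ','
            (col1, col2, @c3, col4, col5)
            SET col3 = NULLIF(@c3, '')
        ") or die($db->error);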

        So, if time does not matter to you, and you prefer to write complex scripts, then by all means use fgetcsv.

        Personally, I believe that the correct tool should always be used.

        I also believe that the best contribution I can make to this forum is my extensive back-end DB experience, in exchange for all the PHP help that I am getting out of it.

          Yes, I understand your point and I agree that this will be no problem. I already tried it with a smaller file. Actually, the missing column will be a space (" ") with TABs on both sides; that is how fgetcsv recognizes it.
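
          So on my side the read will look something like this (just my rough idea; the file name is made up):

          <?php
          // Tab-delimited file; a lone space between two tabs marks the
          // missing column, so trim it down to an empty string.
          $fh = fopen('/path/to/datafile.txt', 'r');

          while (($row = fgetcsv($fh, 4096, "\t")) !== false) {
              $row = array_map('trim', $row);   // the " " placeholder becomes ""
              // ... split the fields off to the different tables here ...
          }
          fclose($fh);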

          I will see which method to use:

          • LOAD DATA; but then the file has to be downloaded first from the other website.

          • fgetcsv; I can open the file directly on the other server and maybe save some work by splitting the data into the different tables right from the beginning.

          I will try both solutions and see how much time I save!

          Thanks!

            Hey, sorry. Did not mean to rant. Too much time on my hands right now.

            Just a general rule when working with databases: Direct queries are always much, much quicker than iterating through recordsets in code.

            They can, however, be much harder to design. What I often do is develop my solution in code first, so that the logic is more obvious, and then reverse-engineer the solution, looking for opportunities to substitute queries for functions and loops.
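
            A trivial illustration of the kind of substitution I mean (table and column names are made up):

            <?php
            $db = new mysqli('localhost', 'user', 'password', 'mydb');

            // Loop version: fetch every row, test it in PHP, write it back one by one.
            $res = $db->query("SELECT id, price FROM products");
            while ($r = $res->fetch_assoc()) {
                if ($r['price'] < 0) {
                    $db->query("UPDATE products SET price = 0 WHERE id = " . (int)$r['id']);
                }
            }

            // Query version: the same result in a single set-based statement.
            $db->query("UPDATE products SET price = 0 WHERE price < 0");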

              LOAD DATA INFILE is dandy, but it doesn't do any error checking, and if you need to munge any of the data items before insertion, you pretty much have to write your own loader. It's not a big deal, though, and as suggested, fgetcsv will work nicely. The original concern was with the size of the file. Since fgetcsv reads one line at a time, there is no memory usage issue. You will probably have to tell PHP to extend the maximum script execution time, though.
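
              For the execution time, a line like this at the top of the import script is usually enough (just a sketch; adjust to taste):

              <?php
              // A 25MB import can easily outrun PHP's default 30-second limit,
              // so lift the limit for this one script (0 = no limit).
              set_time_limit(0);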

                Would it be quicker to run through the data, write it to a well-formatted file and then LOAD DATA INFILE on that?

                  Yes. Again, as a general rule: clean your data at the source if you can.
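
                  For instance (all the names here are invented, and the fix-up rule is just the 4-versus-5-column guess from earlier in the thread):

                  <?php
                  // Sketch of the "rewrite, then bulk load" approach: copy the messy
                  // file to a clean tab-delimited one, then point LOAD DATA at the copy.
                  $in  = fopen('/path/to/messy.txt', 'r');
                  $out = fopen('/tmp/clean.txt', 'w');

                  while (($row = fgetcsv($in, 4096, "\t")) !== false) {
                      if (count($row) == 4) {
                          array_splice($row, 2, 0, array(''));   // pad the missing column
                      }
                      fputcsv($out, $row, "\t");
                  }
                  fclose($in);
                  fclose($out);

                  // ... then run LOAD DATA INFILE '/tmp/clean.txt' on the clean copy ...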

                  Now, if this whole thing is going to be done repeatedly and your source data is suspect, then you may want to try this procedure:

                  (NB: it is designed taking into account that a 25MB file probably contains hundreds of thousands of records.)

                  1. Use LOAD DATA LOCAL INFILE to load the data into a temporary table. Design this table to accept just about anything, and use no table constraints whatsoever. (A rough sketch of steps 1 to 4 follows at the end of this post.)

                  2. Design a set of queries that check for known problems.

                  3. Use whatever method is best to fix this data. If you can use PHP scripts to do this, great. But isn't that the same as using fgetcsv? Yes, except that now your error-handling routine does not have to check every record; only those with errors are processed. This is much quicker when you are dealing with this volume.

                  4. Copy the good data from the temp table to your permanent one.

                  5. Manually review the data you fix with your script.

                  This solution is one I use all the time in Access and VB/SQL Server when I am receiving input from clients. They usually send it in a spreadsheet and it can be all over the place. At a minimum, you can get the bulk of the data loaded at once and add the rubbish once it is fixed.
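
                  A rough sketch of steps 1 to 4 (every name in it is invented, and the "known problem" check is only an example):

                  <?php
                  $db = new mysqli('localhost', 'user', 'password', 'mydb');

                  // 1. Staging table with no constraints - everything comes in as text.
                  $db->query("CREATE TABLE temp_import
                              (c1 TEXT, c2 TEXT, c3 TEXT, c4 TEXT, c5 TEXT)");
                  $db->query("LOAD DATA LOCAL INFILE '/tmp/import.txt'
                              INTO TABLE temp_import
                              FIELDS TERMINATED BY '\\t'");
                  // (LOCAL needs local_infile enabled on both client and server.)

                  // 2. Check for a known problem, e.g. rows with a blank middle column.
                  $bad = $db->query("SELECT COUNT(*) AS n FROM temp_import
                                     WHERE c3 = '' OR c3 = ' '")->fetch_assoc();
                  echo $bad['n'] . " rows need fixing\n";

                  // 3. Fix only the bad rows - one query here, but a PHP loop over
                  //    just these rows would work too.
                  $db->query("UPDATE temp_import SET c3 = NULL WHERE c3 = '' OR c3 = ' '");

                  // 4. Copy the good data across to the permanent table.
                  $db->query("INSERT INTO real_table (col1, col2, col3, col4, col5)
                              SELECT c1, c2, c3, c4, c5 FROM temp_import");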

                    Hi,

                    Maybe one off-topic thing I would like to ask about the database.

                    I read in the 25MB file with fgetcsv just to try it out (and learn), and was extremely surprised that the stored data resulted in a database of 120MB. Is it normal that the database will be 4 times bigger than a text file with the same data?

                      A database will always be bigger than the data it contains. First you have your system tables, your indexes and your table definitions; these all use up space. How much extra storage the database uses depends upon your schema.

                      Then there is the data itself. If you are using CHAR instead of VARCHAR, for instance, then you could be storing a lot of empty space. Numbers may only take up 3 or 4 ASCII characters (3 or 4 bytes) in your text file, but numeric columns are fixed width: an INT is always 4 bytes and a BIGINT 8, no matter how few digits the value has.

                        Thanks,

                        Stupid me! That is what happens when you work at 3 in the morning. I forgot to choose VARCHAR. Well, mistakes happen! The database went down to almost 24 MB. Great!
