Is there a utility out there for benchmarking PHP/MySQL applications (i.e., for tracking where the time is going)? I am working on Windows and have both the Apache server and the MySQL server on the same machine.

I am trying to insert/update records in a MySQL database from a series of PHP arrays that have been populated by parsing and distributing information from a flat text file, and it takes so long that things seem to time out!

I have been working on this in my spare time for a while, and can get around well enough, but still consider myself a coding newbie.

Anyway, I have 11,000 files that I want to parse and upload into about 22 different tables, a couple of which will have about 1,000,000 records of about 25 fields each. Several others are about half that size, and the rest are fairly small.

The way this is structured, I parse one file into the different arrays, and then I search the database for duplicates before inserting/updating. If a dupe is found, I run through the fields looking for modified data. When all the arrays are processed, I move on to the next file.
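
Roughly, the per-row logic looks something like this (I'm simplifying, and the table and column names here are made up, but it's the same pattern):

<?php
// $mysqli is an open connection; $rows is one of the arrays parsed from the flat file
foreach ($rows as $row) {
    // Look for an existing record with the same key
    $sel = $mysqli->prepare("SELECT id, field_a, field_b FROM some_table WHERE key_field = ?");
    $sel->bind_param('s', $row['key_field']);
    $sel->execute();
    $existing = $sel->get_result()->fetch_assoc();

    if ($existing === null) {
        // No dupe found: insert a new record
        $ins = $mysqli->prepare("INSERT INTO some_table (key_field, field_a, field_b) VALUES (?, ?, ?)");
        $ins->bind_param('sss', $row['key_field'], $row['field_a'], $row['field_b']);
        $ins->execute();
    } elseif ($existing['field_a'] !== $row['field_a'] || $existing['field_b'] !== $row['field_b']) {
        // Dupe found with modified data: update it
        $upd = $mysqli->prepare("UPDATE some_table SET field_a = ?, field_b = ? WHERE id = ?");
        $upd->bind_param('ssi', $row['field_a'], $row['field_b'], $existing['id']);
        $upd->execute();
    }
}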

In the early stages of updating, things crank along fine. But as the database gets larger, it takes longer and longer, until it is taking more than 3 minutes per file, and some records seem to take so long that they time out. And that's after going through only about 1,000-1,500 of the files I need to deal with (at this point the largest table has 100,000 records).

I have upped the script timeout in php.ini to 3 minutes and increased the maximum memory allowed for a PHP script to 16 MB, but nothing helps. Is it my machine? Is it my code? Is it MySQL?

Any suggestions?

Thanks.

    What do your queries look like?

    Do you have any indexes created?

    Why don't you just set the timeout for PHP to infinite?

    It's a hell of a lot of records... if I were you, I'd expect it to take a LONNNNNGGGGGG time.

      If a field is used in a JOIN, ORDER BY, or WHERE clause, it should be indexed. You can also avoid the search for dupes altogether by creating a unique index across the columns that define a duplicate, so that duplicate entries simply aren't allowed.
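
      For example, with a made-up table and columns, a one-time unique key plus MySQL's INSERT ... ON DUPLICATE KEY UPDATE (available in 4.1 and later) replaces the whole SELECT-then-INSERT/UPDATE dance:

      <?php
      // $mysqli is an open connection; $row is one parsed record (all names invented)
      // One-time setup: a unique key across the columns that identify a record
      $mysqli->query("ALTER TABLE some_table ADD UNIQUE KEY uniq_record (key_field_1, key_field_2)");

      // Per row: one statement, and MySQL decides whether it is an insert or an update
      $stmt = $mysqli->prepare(
          "INSERT INTO some_table (key_field_1, key_field_2, field_a)
           VALUES (?, ?, ?)
           ON DUPLICATE KEY UPDATE field_a = VALUES(field_a)");
      $stmt->bind_param('sss', $row['key_field_1'], $row['key_field_2'], $row['field_a']);
      $stmt->execute();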

        What sort of data manipulation are you doing? Can you identify the bottleneck: the parsing, the PHP arrays (ugh), the deduping (most likely), or the inserts?

        Personally, I find it easier to dump all the data via bulk inserts (MySQL's LOAD DATA) into temp tables and use scripts (Perl or Python) to massage it (there's a rough sketch of that approach at the end of this post).

        My guess is that the deduping code is not scaling: lack of indexes, a poor implementation, etc. In most cases you can use SQL to exclude the duplicates.

        Post the code and I can be more helpful.
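
        Here's a rough sketch of the LOAD DATA approach, driven from PHP since that's the language in play here. It assumes you write the parsed arrays out to a CSV file first, that some_table has a unique key as discussed above, and that LOAD DATA LOCAL INFILE is enabled on your server; all file, table, and column names are made up:

        <?php
        $mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

        // 1. Bulk-load the raw parsed file into an index-free staging table (fast).
        $mysqli->query("CREATE TEMPORARY TABLE staging (
                            key_field VARCHAR(50),
                            field_a   VARCHAR(255),
                            field_b   VARCHAR(255))");
        $mysqli->query("LOAD DATA LOCAL INFILE 'parsed_output.csv'
                        INTO TABLE staging
                        FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
                        LINES TERMINATED BY '\\n'");

        // 2. Move it into the real table in one set-based statement; INSERT IGNORE
        //    skips rows that collide with the unique key on some_table.
        $mysqli->query("INSERT IGNORE INTO some_table (key_field, field_a, field_b)
                        SELECT key_field, field_a, field_b FROM staging");

        $mysqli->query("DROP TEMPORARY TABLE staging");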

          You can also avoid the search for dupes altogether by creating a unique index across the columns that define a duplicate, so that duplicate entries simply aren't allowed.

          That is the point.
          If you need to check for duplicate rows yourself, the design isn't right and you should create the indexes instead.

            Thanks, all, for your suggestions. I've been out of touch for a couple of days and am glad to see some responses.

            I will try a variety of these suggestions. As I say, I'm new to this whole world. If I can figure out a way to post a portion of the code, I'll let you know. It's pretty ugly and convoluted right now, and one of my current efforts is commenting and cleanup; I'm having trouble tracking it myself.

            I'll keep working and post any improvement.

            As far as the issue of dupes goes, I will re-examine my design and my queries and check things out. I would love to use LOAD DATA INFILE and do my cleanup afterwards, but I don't know Perl or Python and am not sure whether PHP can do the same sort of cleanup efficiently. Any thoughts?

            Also, as part of my original question... is there any utility out there that can help me trace my program from start to finish to see how long the different segments are taking? It might help me track down this bottleneck.

            Anyway, thanks again to all who replied and I'll post back any improvements I can muster.

              I would love to use LOAD DATA INFILE and do my cleanup afterwards, but I don't know Perl or Python and am not sure whether PHP can do the same sort of cleanup efficiently. Any thoughts?

              You can use PHP for this, since that's where your comfort zone is.
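
              For instance, once the raw data is sitting in a staging table, the "massaging" is mostly just set-based SQL that you fire off from PHP (made-up table and column names again):

              <?php
              $mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

              // Each cleanup pass is one statement over the whole staging table,
              // rather than a row-by-row loop in PHP.
              $mysqli->query("UPDATE staging SET key_field = TRIM(key_field)");
              $mysqli->query("UPDATE staging SET field_a = NULL WHERE field_a = ''");
              $mysqli->query("DELETE FROM staging WHERE key_field = ''");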

              Also, as part of my original question... is there any utility out there that can help me trace my program from start to finish to see how long the different segments are taking? It might help me track down this bottleneck.

              Use logging to find the bottleneck.

                6 months later

                I have the same problem.

                Was there any good solution to this?

                  Time your queries and scripts to find out where the bottleneck is, then post the queries / database design and we'll see if we can help.

                  You can use query logging in MySQL, time it in PHP with microtime(), etc., etc...
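
                  For example, something as simple as this around each stage will show you where the time goes (the log file name here is just an example):

                  <?php
                  $start = microtime(true);
                  // ... parse the file ...
                  error_log(sprintf("parse:  %.3f sec\n", microtime(true) - $start), 3, 'timing.log');

                  $start = microtime(true);
                  // ... run the dedupe queries ...
                  error_log(sprintf("dedupe: %.3f sec\n", microtime(true) - $start), 3, 'timing.log');

                  $start = microtime(true);
                  // ... do the inserts/updates ...
                  error_log(sprintf("insert: %.3f sec\n", microtime(true) - $start), 3, 'timing.log');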

                    I executed the same script on the webserver, and it worked fine. Quick as h***.

                    But when I try to do it on my own test server, the insertions just stop! Sometimes after just 300 inserts... other times after 500.

                    Hmm...
                    It might be my MySQL DB config or my php.ini config.

                      Nope.

                      I cut the CSV file from 1000 lines down to only 100 lines.
                      It executed perfectly on my server at home.

                      But I did find out one thing, though:
                      it executes for 29 seconds whether I have 100 lines or 300 lines.

                      The whole thing simply STOPS! (argh!)
                      And rows ARE inserted into my tables.

                      I have posted another thread about some of the same problems here.

                      If you could reply in that thread, I would appreciate it!
