I have a load process that takes a CSV and loads it into a MySQL database. The database side takes about 15 seconds to complete, with roughly 6,000 rows being inserted, but the PHP script hangs for a long time after I can see the database work is done. I am using PHP 5 with a PDO prepared statement calling a MySQL stored procedure. Do you think PDO is the bottleneck? Does anyone have an idea what it might be?

    Without seeing the actual code it is impossible to know what the problem is. Please show the code and I'm sure that someone can help you out.

      Here is the meat of it.

      I managed to speed it up a bit, as I originally had the prepare() call inside the loop.

      Looking at it now, I am thinking bindParam might make it faster. Thoughts?

      $db_stmnt = $this->db->prepare('CALL SP_AddCTCourse(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)');
      foreach ($data as $line) {
          $course_data = explode($tab, $line);
          $db_stmnt->bindValue(1, $course_data[0], PDO::PARAM_STR);
          $db_stmnt->bindValue(2, $course_data[1], PDO::PARAM_STR);
          $db_stmnt->bindValue(3, $course_data[2], PDO::PARAM_STR);
          $db_stmnt->bindValue(4, $course_data[3], PDO::PARAM_STR);
          $db_stmnt->bindValue(5, $course_data[4], PDO::PARAM_STR);
          $db_stmnt->bindValue(6, $course_data[5], PDO::PARAM_STR);
          $db_stmnt->bindValue(7, $course_data[6], PDO::PARAM_STR);
          $db_stmnt->bindValue(8, $course_data[7], PDO::PARAM_STR);
          $db_stmnt->bindValue(9, $course_data[8], PDO::PARAM_STR);
          $db_stmnt->bindValue(10, $this->formatTermYear($course_data[9], '00000'), PDO::PARAM_STR);
          $db_stmnt->bindValue(11, $course_data[10], PDO::PARAM_STR);
          $db_stmnt->bindValue(12, $this->formatTermYear($course_data[11], '99999'), PDO::PARAM_STR);
          $db_stmnt->bindValue(13, $course_data[12], PDO::PARAM_STR);
          $db_stmnt->bindValue(14, $course_data[13], PDO::PARAM_STR);
          $db_stmnt->bindValue(15, $course_data[14], PDO::PARAM_STR);
          $db_stmnt->bindValue(16, $course_data[15], PDO::PARAM_STR);
          $db_stmnt->execute();
      }
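
      Something like this is what I had in mind for bindParam, binding each slot by reference once before the loop (just a sketch, not tested yet):

      $db_stmnt = $this->db->prepare('CALL SP_AddCTCourse(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)');
      $params = array_fill(0, 16, null);
      for ($i = 0; $i < 16; $i++) {
          // bindParam() binds by reference, so each slot only has to be bound once.
          $db_stmnt->bindParam($i + 1, $params[$i], PDO::PARAM_STR);
      }

      foreach ($data as $line) {
          $course_data = explode($tab, $line);
          for ($i = 0; $i < 16; $i++) {
              $params[$i] = $course_data[$i];
          }
          // Same formatting as before for the two term/year columns.
          $params[9]  = $this->formatTermYear($course_data[9], '00000');
          $params[11] = $this->formatTermYear($course_data[11], '99999');
          $db_stmnt->execute();
      }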
      

        Don't know if it will really make any difference, but you could try just passing an array of values to execute() instead of all the binding calls.
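
        Roughly like this (untested; it assumes every line still has exactly 16 tab-separated fields):

        $db_stmnt = $this->db->prepare('CALL SP_AddCTCourse(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)');
        foreach ($data as $line) {
            $course_data = explode($tab, $line);
            // Apply the same formatting as before, then hand the whole array to execute().
            $course_data[9]  = $this->formatTermYear($course_data[9], '00000');
            $course_data[11] = $this->formatTermYear($course_data[11], '99999');
            $db_stmnt->execute($course_data);
        }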

        The PHP delay sort of sounds like garbage collection is being done and lots of little bits of memory are being reclaimed. You might also try just building the SQL statement directly and seeing what happens.
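
        If you want to try the direct route, maybe something like this, purely as a timing comparison (untested):

        foreach ($data as $line) {
            $course_data = explode($tab, $line);
            $course_data[9]  = $this->formatTermYear($course_data[9], '00000');
            $course_data[11] = $this->formatTermYear($course_data[11], '99999');
            // Quote every value and build the CALL by hand, just to see if the binding is the slow part.
            $quoted = array_map(array($this->db, 'quote'), $course_data);
            $this->db->exec('CALL SP_AddCTCourse(' . implode(',', $quoted) . ')');
        }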

          I haven't worked with prepared statements, so this might be totally wrong, but it seems to me that the statement should be inside the foreach loop as well.

            Piranha wrote:

            I haven't worked with prepared statements, so this might be totally wrong, but it seems to me that the statement should be inside the foreach loop as well.

            As you say, totally wrong, but I mean that in a nice way. The idea behind a prepared statement is that it can be prepared (parsed) once and then executed multiple times with different data sets.

              I would use the LOAD DATA statement, but I am not going to be the one uploading the file, and the data also needs to be formatted. Good suggestion though.

                HP400 wrote:

                I would use the LOAD DATA statement, but I am not going to be the one uploading the file, and the data also needs to be formatted. Good suggestion though.

                Who does it should not matter; just make a page with a file upload control so they can browse to and upload the file.
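
                A bare-bones version of that page could be as simple as this (the path and field names are just placeholders):

                <?php
                // If a CSV was posted, stash it somewhere the import job can read it.
                if (isset($_FILES['csv_file']) && $_FILES['csv_file']['error'] === UPLOAD_ERR_OK) {
                    move_uploaded_file($_FILES['csv_file']['tmp_name'], '/path/to/uploads/courses.csv');
                    echo 'File received.';
                }
                ?>
                <form method="post" enctype="multipart/form-data">
                    <input type="file" name="csv_file">
                    <input type="submit" value="Upload">
                </form>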

                Personally, in every DB I use, I load CSV data into a work table first, validate the contents, then INSERT ... SELECT into the main table. This method also allows me to CAST data types, reformat data, etc. Just make all the columns in the work table BINARY data types so that they will accept anything; that way your code deals with errors rather than the user.
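
                A rough sketch of the shape of it (table and column names invented, SQL untested):

                // Work table: every column is a binary type so any raw CSV value is accepted.
                $this->db->exec("
                    CREATE TABLE course_load_work (
                        course_code  VARBINARY(255),
                        course_title VARBINARY(255),
                        term_year    VARBINARY(255)
                    )
                ");

                // After loading the CSV into it, validate in SQL, e.g. report rows with a non-numeric term year.
                $bad = $this->db->query("
                    SELECT course_code, term_year
                    FROM course_load_work
                    WHERE term_year NOT REGEXP '^[0-9]+$'
                ")->fetchAll();

                // Then move the clean rows across, casting to the real types as you go.
                $this->db->exec("
                    INSERT INTO ct_course (code, title, term_year)
                    SELECT CAST(course_code AS CHAR), CAST(course_title AS CHAR), CAST(term_year AS UNSIGNED)
                    FROM course_load_work
                    WHERE term_year REGEXP '^[0-9]+$'
                ");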

                  Some of the columns are actually broken out into other tables, so the SP I run looks up the id value and substitutes it into the data. The INSERT ... SELECT wouldn't work for me then.

                    You are not understanding me.

                    Be it a csv file or a work table, the data is the same. A work table gives you more control, full sql functionality, and greater speed.

                    So, instead of running your SP on data from the csv, you run it on exactly the same data from the work table, only now you can cleanse the data before doing so.
                    At some point that csv is going to get corrupted; they ALWAYS do. Some prat opens it in Excel first and whoops: all numbers are now floats, including string phone numbers, dates have been converted, etc. etc.
                    So, in the work table you can run the data verification that you should be running and produce meaningful reports, instead of just having data rejected because a required field is missing, or the wrong data type, or a duplicate key, etc.
                    Then, once the data is verified, you can run it through your SP.

                    OK, you may have to re-write the SP to take advantage of the greater flexibility that a work table gives: it's called optimisation. You were, after all, trying to speed things up. Using a work table will give you that speed-up by orders of magnitude, as well as all the other benefits I have mentioned.

                    Basic rule with DBs: never import unverified data into a live table; always verify and cleanse it first. The other experienced DBAs around here will concur.

                      PS, what I mean is that you are going to process 6,000 records one at a time in an SP that maybe has 5 queries in it = 30,000 queries??? No wonder it is slow.

                      From a work table you can insert the selected columns for all 6,000 rows into 1 table = 1 query
                      Select and update all 6,000 ids in the work table = 2 queries
                      Repeat for the other columns = 4 queries
                      Insert the remaining columns into the final table with the updated FKs = 5 queries

                      I call that a no-brainer. It is the logic of SQL and relational databases as opposed to the logic of PHP and sequential record processing.
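
                      For what it's worth, those queries could look roughly like this (all table and column names invented, untested):

                      // Resolve one of the broken-out columns: insert any new lookup values,
                      // then stamp the matching ids back onto the work table rows.
                      $this->db->exec("
                          INSERT INTO department (name)
                          SELECT DISTINCT w.department_name
                          FROM course_load_work w
                          LEFT JOIN department d ON d.name = w.department_name
                          WHERE d.id IS NULL
                      ");
                      $this->db->exec("
                          UPDATE course_load_work w
                          JOIN department d ON d.name = w.department_name
                          SET w.department_id = d.id
                      ");

                      // Repeat that pair for each other lookup column, then load the live table in one go.
                      $this->db->exec("
                          INSERT INTO ct_course (department_id, code, title)
                          SELECT department_id, CAST(course_code AS CHAR), CAST(course_title AS CHAR)
                          FROM course_load_work
                      ");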
