I've been reading about this for the last hour and a half, but I still feel unsure, so I thought I'd ask the experts.

I began work on a new project over the weekend. The database currently sits at about 2.66 million records spread across 16 tables, with plans to grow substantially. The tables are not related to each other and thus do not require foreign key constraints. I've been using InnoDB, as that's the default engine with my (rather old) MySQL installation.

I started considering my engine choice when my PHP script to calculate the total rows in the database started to choke and time out, even when I increased the timeout value to 60 seconds. I looked into other ways of getting the total rows and found SHOW TABLE STATUS, which simply pulls the tables' metadata, but InnoDB doesn't store exact table row counts and only approximates them (and the approximation can be grossly inaccurate).

My database will be extremely write-intensive but 99.99% of all the writing will be done by me via PHP scripts. I have scripts that generate values (via loops) that get inserted into the database. There is the ability on the front end to do an insert, but it is done manually on an individual basis. The majority of front-end interaction with the database will be reading. There are no delete or update queries; any updating or deleting will be done by me via phpMyAdmin, and I don't really see myself needing to. The SELECT queries are very simple (SELECT blah FROM blah WHERE WhatImLookingFor = WhatIsHopefullyThere) and do not require any table joins. My PHP script determines, based on what is being searched, which table to perform the query on, so I'm only searching through a fraction of the data at a time. That's all the database interaction for my project.

My concern is with the table-locking of MyISAM. I understand that the entire table is locked when in use. What I am wondering is if this will really be a problem if inserts are "coming from one place" (AKA, me) instead of multiple sources such as other users. Will I notice a significant performance hit on my scripts if I move to MyISAM? Is it able to lock, insert the row, unlock, then lock, insert the row, unlock, etc. incredibly quickly? Suffering a performance hit on inserts isn't the end of the world, but I'd like to avoid it if possible since some of my scripts can run for hours. What I am more concerned with is maintaining reading performance, which seems to be the benefit of MyISAM. With 2.6+ million rows and counting, I need to maintain incredibly fast searching.

If you have made it this far, thank you for taking the time to read.

Any advice would be greatly appreciated. Thank you!

    Bonesnap wrote:

    I started considering my engine choice when my PHP script to calculate the total rows in the database started to choke and time out, even when I increased the timeout value to 60 seconds. I looked into other ways of getting the total rows and found SHOW TABLE STATUS, which simply pulls the tables' metadata, but InnoDB doesn't store exact table row counts and only approximates them (and the approximation can be grossly inaccurate).

    Maybe the solution is to have a summary table store the row count. This row count could be updated with a trigger.
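
    A minimal sketch of that idea, assuming a data table named my_table (all names here are placeholders, not your actual schema):

    ```sql
    -- Hypothetical summary table: one counter row per data table
    CREATE TABLE row_counts (
        table_name VARCHAR(64) NOT NULL PRIMARY KEY,
        row_count  BIGINT UNSIGNED NOT NULL DEFAULT 0
    );

    -- Seed the counter once with the current count
    INSERT INTO row_counts (table_name, row_count)
    SELECT 'my_table', COUNT(*) FROM my_table;

    -- Bump the counter on every insert into the data table
    CREATE TRIGGER my_table_after_insert AFTER INSERT ON my_table
    FOR EACH ROW
        UPDATE row_counts SET row_count = row_count + 1
        WHERE table_name = 'my_table';
    ```

    Reading the total then becomes a primary-key lookup on row_counts instead of a full table scan.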

    Bonesnap wrote:

    My database will be extremely write-intensive but 99.99% of all the writing will be done by me via PHP scripts. I have scripts that generate values (via loops) that get inserted into the database.

    With InnoDB, you can wrap the inserts in a transaction.
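
    In SQL terms the sketch looks like this (table and values are placeholders); InnoDB then flushes to disk once at COMMIT instead of after every single row:

    ```sql
    SET autocommit = 0;
    START TRANSACTION;
    INSERT INTO my_table (id, val) VALUES (1, 'a');
    INSERT INTO my_table (id, val) VALUES (2, 'b');
    -- ... many more inserts generated by the loop ...
    COMMIT;
    ```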

    Bonesnap wrote:

    With 2.6+ million rows and counting, I need to maintain incredibly fast searching.

    Then you should have appropriate indices.
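
    For example, if a column used in a WHERE clause isn't already a primary key (names here are just placeholders):

    ```sql
    CREATE INDEX idx_what_im_looking_for ON my_table (WhatImLookingFor);

    -- EXPLAIN shows whether a query actually uses the index
    EXPLAIN SELECT blah FROM my_table WHERE WhatImLookingFor = 'value';
    ```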

      Bonesnap wrote:

      I started considering my engine choice when my PHP script to calculate the total rows in the database started to choke and time out, even when I increased the timeout value to 60 seconds.

      You only have sixteen tables - how are you managing to get sixteen "SELECT COUNT(*) FROM tablename" queries to take more than a minute?!

      Bonesnap wrote:

      My concern is with the table-locking of MyISAM.

      MyISAM is a poor choice if you're doing much writing: it lacks transactional support, concurrency support (writes lock the whole table), and referential integrity protection.

        So I did some preliminary testing with a single table. I didn't realize phpMyAdmin could copy a table so quickly; otherwise I would have tested before posting. I'm not sure if I am doing something wrong, but it seems that MyISAM is much, much, much faster at inserting rows. I'm waiting for the "gotcha!" to appear and bring me back to reality, because from everything I have read and gleaned, this should be the opposite, no?

        So, I have a number of different scripts that range in running time from a few minutes to several hours. I tried my fastest script, which usually takes about five minutes to complete. I ran it on the MyISAM test table and it completed almost instantly - like five seconds. I double-checked the table and sure enough the rows were added. I thought it was a fluke so I ran it two more times with similar results. I then tried a script that normally takes ~25 minutes to complete, and it was done in 12 seconds flat.

        Searching also seems to be just as fast. It would appear MyISAM is much better suited to my situation, but I am still skeptical. What is going on? I feel like there's a huge blind spot and I'm missing something big here. This shouldn't be... should it? My knee-jerk reaction to my testing results is to copy everything over to MyISAM tables and move on, but something is holding me back.

        laserlight;11006640 wrote:

        Maybe the solution is to have a summary table store the row count. This row count could be updated with a trigger.

        This was one solution I read, but the article I read didn't mention a trigger. I'll have to look into that.

        laserlight;11006640 wrote:

        With InnoDB, you can wrap the inserts in a transaction.

        I'll look into this, too.

        laserlight;11006640 wrote:

        Then you should have appropriate indices.

        I do. The columns that are being searched are primary keys, so they have an index.

        Weedpacket;11006646 wrote:

        You only have sixteen tables - how are you managing to get sixteen "SELECT COUNT(*) FROM tablename" queries to take more than a minute?!

        When each table has 165,000+ rows I guess, though phpMyAdmin seems to have no problem producing the results in a timely manner.

        Weedpacket;11006646 wrote:

        MyISAM is a poor choice if you're doing much writing: it lacks transactional support, concurrency support (writes lock the whole table), and referential integrity protection.

        I keep hearing the term "transaction" but haven't really been able to find a definition in the context of databases. Could you elaborate on it? Do I actually need transactional support for my situation? And I don't really need concurrency, since I am essentially the only person writing to the database, and even if two people tried to insert a row at the same time, there's only a 1/16 chance they'd need the same table. Lastly, none of the tables relate to each other, so I have no need for referential integrity.

        Thank you both for your replies.

          Maybe looking into 'views' and 'memcached' would be a good idea, especially given your concerns.

            Bonesnap wrote:

            I'm not sure if I am doing something wrong, but it seems that MyISAM is much, much, much faster at inserting rows. I'm waiting for the "gotcha!" to appear and bring me back to reality, because from everything I have read and gleaned, this should be the opposite, no?

            Bonesnap wrote:

            I keep hearing the term "transaction" but haven't really been able to find a definition in the context of databases. Could you elaborate on it? Do I actually need transactional support for my situation?

            That's probably a gotcha right there. Read the MySQL docs on Optimizing InnoDB Transaction Management.

              laserlight;11006654 wrote:

              That's probably a gotcha right there. Read the MySQL docs on Optimizing InnoDB Transaction Management.

              Excellent, thank you for this. Between that link and Wikipedia I believe I have an understanding of what a transaction is. I also feel foolish because I've been doing inserts individually even though I know they can be done in bulk; it just never occurred to me. :o

              Two questions:

              1. Where can I set the max_allowed_packet property? It apparently controls the max size of the query string, and since I will be (at times) inserting hundreds of thousands of rows in a single transaction I'll probably need to modify it.

              2. Will duplicate rows be handled any differently in a bulk transaction? The primary key column of course has to be unique, and up to this point I've just let MySQL deal with it, which I assume produces an error; I don't care and just let it move on to the next insert. I assume duplicate entries are handled similarly in this situation?

              I'm going to start optimizing my queries and see what improvements I can get. Thanks again!

                Well I have answered both my questions, but now I am getting a syntax error even though when I print out my query and copy and paste it into phpMyAdmin it executes without an issue. 😕

                More investigating I shall do...

                  Okay I guess I simply don't understand MySQLi (admittedly I haven't used it a lot) because whenever I try to execute my script my MySQL server crashes to the point that I can't even restart it and I have to reboot my web server. This is basically my code:

                  $query = "SET AUTOCOMMIT=0; ";
                  $query .= "START TRANSACTION; ";

                  set_time_limit(0);

                  foreach ($values as $value)
                  {
                      //$query .= insert statements;
                  }

                  $query .= " COMMIT;";

                  if (mysqli_multi_query($conn, $query))
                  {
                      while (mysqli_more_results($conn) && mysqli_next_result($conn))
                      {
                          mysqli_use_result($conn);
                      }

                      echo 'Done!<br /><br />';
                  }
                  else
                  {
                      echo mysqli_error($conn).'<br /><br />';
                  }
                  

                  This causes my MySQL server to crash and burn. The fact that I am essentially new to MySQLi and I've never done multiple queries before compounds my confusion and frustration. I feel like I am missing a small piece of the puzzle to making this all work.

                  In reality I don't really care what the results are from the queries. I have INSERT IGNORE on them so I don't have to worry about duplicates. I just want them to insert and move on. Am I approaching this the wrong way?
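
                  For what it's worth, the duplicate handling I'm relying on looks like this (names are placeholders); INSERT IGNORE silently skips any row that would violate the primary key instead of aborting with a duplicate-key error:

                  ```sql
                  INSERT IGNORE INTO my_table (id, val)
                  VALUES (1, 'a'), (2, 'b'), (2, 'duplicate - skipped');
                  ```

                  The multi-row VALUES form also cuts down the number of statements in the transaction.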

                  Any pointers would be greatly appreciated.

                    Bonesnap wrote:

                    When each table has 165,000+ rows I guess, though phpMyAdmin seems to have no problem producing the results in a timely manner.

                    Well, my system is totally different from yours (since I'm not even using MySQL), so YMMV, but...

                    I found a suitably-sized table in my DB to run a test. There's a note in my DBMS's documentation that count() may seem unusually slow compared to other systems' because it needs to scan the entire table, so again YMMV, but be aware that the engine being used is fully ACID-compliant, so it's more like InnoDB than MyISAM. The result:

                    Select count(*) from organism_classification
                    
                    202395
                    
                    Total query runtime: 78 ms.
                    

                    EDIT: I was curious about what "slow" looked like, so I dumped some more rows into the table (this part took over a minute; I suspect it would have gone faster if the file I was reading from and the database had been on separate drives, because disk activity was where all the time went) and prevented the vacuuming that had run just after my initial load.

                    It took 23 seconds to count 2,074,118 rows. Sixteen of those would take over six minutes.

                    One full vacuum later (taking 24 seconds; a full vacuum is not recommended for routine use because it's much slower and locks the table for the entire duration, since it works by making a whole fresh copy of the table), the time taken to run the count was down to half a second.

                    Load another 1.8M records and the count time (3,945,841 rows) was back up to 20 seconds. Another full vacuum (37 seconds this time) and it was back down to one second.

                    No indices were kept in the execution of these tests. Now I really must get off this sidetrack.


                    As for your MySQLi problem: obviously I wouldn't know why MySQL crashes (if the DBMS is crashing then that's a MySQL bug), but have you cut things down to see whether the problem is the nature of the commands being issued or the sheer size of the multi-query? If the former, then sending individual queries for the problematic bits might be necessary; if the latter, then batching the insertions to keep the size down would help. Have you considered - for example - inserting records using a prepared statement, or using a bulk import mechanism?

                      Okay I think I solved all my problems. You guys won't like it. I ditched MySQLi (inb4allyourhate) and just used individual MySQL query statements to execute the AUTOCOMMIT command, the START TRANSACTION, etc. and then of course the COMMIT at the end.

                      I created a trigger to update a secondary table that keeps track of the total, and it works beautifully. The speed increases are astounding, and I know I can optimize it even more, but for now I am content. A script that normally took ~25 minutes to run (inserts 27,607 rows) now executes in 16 seconds. I'm a happy camper.

                      Thanks to everyone for your advice and time. I really appreciate it (and yes, I will eventually change my MySQL statements over to MySQLi :p).

                        Bonesnap wrote:

                        and yes, I will eventually change my MySQL statements over to MySQLi

                        Obviously, the SQL remains the same (unless you use prepared statements, since mysql doesn't know how to bind with them).

                        I don't use MySQLi but I would expect that [man]mysqli_query[/man] would work the same way as [man]mysql_query[/man]; apart, I see, from the order of the arguments (if you stay with the procedural interface) and the fact that one returns a resource and the other returns an object.

                        Or maybe I'm wrong, and the reason for the hassles is that the apparently similar procedural mysqli and mysql functions aren't that similar after all. In which case: why bother with the procedural mysqli functions at all?

                        $mysqli->autocommit(false);
                        
                        $insert_statement = $mysqli->prepare("INSERT INTO .... VALUES (?, ?, ...?)");
                        foreach(... as $rowdata)
                        {
                        	$insert_statement->bind_param('iis....', $rowdata['integer'], $rowdata['anotherinteger'], $rowdata['string'], ...);
                        	$insert_statement->execute();
                        }
                        $insert_statement->close();
                        
                        $mysqli->commit();
                        

                        Modulo things like error checking and subsequent rollback, and the fact that I'm guessing at all of that based only on the documentation (there may be a way to do multiple sets of bindings and execute them all at once).
