I need to know if there is any limit (or at least a practical limit) to the size that a PHP array can be.

This is the reason I am asking...

I wrote some PHP code and set up a mysql database to help a company that needed to take data from their inventory and merge it with inventory data from two suppliers (match similar part numbers in each file to add inventory quantities together, and then eliminate zero quantity items to create data files for distribution).

To make a long story short... I created PHP programs to parse and upload the raw data into tables in mysql... then another program that does the match, merge, pare-down and output of the data to another final table... which will be their online inventory and a source to output CSV files from.

It works great... it's a little slow... but works great.

HOWEVER... this is going to be hosted on an off-site server account (not an internal company server)... and I just found out that the web hosting company allows a maximum of 90,000 mysql transactions per hour, so my script will not work. (The total rows of all three files I am importing comes to 102,834 records... and with all the INSERT, SELECT, UPDATE, and DELETE mysql commands that must take place for each record during the parsing, I am nowhere close to fitting under that limit... as in, over it by a factor of 10 or more!)

When the file finishes processing, it eliminates most of these 102,834 records (due to duplicates and zero quantity items being removed), leaving a final result file of only about 19,000 items.

SO... I have decided that the only way to make this work with the limitations on the number of mysql transactions I can have is to load the data into PHP arrays and process it there, before sending it to the database.

My only question is... if I invest the time to code such a beast... will I end up being limited by the size an array can be or the time it takes to process the PHP file before it times out... or any other limitations?

I'd hate to code the whole blasted thing and find out it is impossible anyway!

How else might I do this?
Suggestions?

    The size of a PHP array is limited by the amount of memory your system can supply. Right now my machine is working on an array that contains >2,000,000 elements.

      Great! Thanks... I think I will be re-writing my code then!

        There is no limit per se on the size of an array - I've run arrays in size over 100 MB without issues. However, since you're working in a shared environment, you'll likely run into issues with memory usage...

PHP caps memory usage at a predefined value (the memory_limit setting in php.ini). The default is 8 MB, which means that your code and data cannot occupy more than that amount of memory or your script barfs.

        This is where algorithms come into play... what you want to do can most certainly be done. But, you have to process your information in chunks (typically, line by line) and write immediately to disk to keep memory usage to an absolute minimum.

        For example, you could reduce memory usage by eliminating empty elements before you even consider putting them into an array.

        For example:

        $fp = fopen('outputfile', 'w');
        $i = 0;
        $sql = "select * from bignastytable where (some condition)";
        $res = pg_exec($dbconn, $sql);
        // fetch one row at a time so only a single row is ever held in memory
        while ($row = @pg_fetch_array($res, $i++, PGSQL_ASSOC)) {
            // this weeds out rows where importantfield is empty
            if ($row['importantfield']) {
                fwrite($fp, $row['importantfield']);
            }
        }
        fclose($fp);

        Now, without ever using any more memory than required for a single row, we've weeded out all rows where importantfield is empty. Of course, you could perform this trick handily by simply modifying your sql statement, but I think this proves the point.

        Another issue to consider is what counts as a "transaction". 90,000 transactions could mean 90,000 database connections, 90,000 SQL transactions, or 90,000 statements.

        They are all closely related, but aren't the same.

        You can use a single connection as many times as you want in a single web page hit to execute as many SQL statements as you like. You are already familiar with SQL statements; see the example above.

        But the last option, an "SQL transaction", is a feature of any ACID-compliant database (such as PostgreSQL, of which I am fond, and which MySQL is gradually approaching in feature set), wherein you can wrap any number of SQL statements into a single transaction.

        Consider:

        insert into tablea (f1, f2, f3) values ($x, $y, $z);
        insert into tableb (x1, x2, x3) values ($a, $b, $c);
        insert into tabler (k1, k2, k3) values ($k, $l, $m);

        What if all these inserts belong to a single action (updating a user database, for example), but an error doesn't occur until the insert into tabler, and you need that insert to succeed or the user database is now invalid?

        What then? SQL transactions to the rescue!

        begin transaction;
        insert into tablea (f1, f2, f3) values ($x, $y, $z);
        insert into tableb (x1, x2, x3) values ($a, $b, $c);
        insert into tabler (k1, k2, k3) values ($k, $l, $m);
        rollback;

        When you call 'rollback', everything done since 'begin transaction' is undone. Completely and cleanly. If you want to keep everything in the transaction, you execute "commit;" instead of "rollback;".

        Depending on how your service provider interprets a "transaction", you may find that all 102,000 inserts fit into a single "transaction", leaving you with 89,999 more to go.
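
        To make the single-connection, single-transaction idea concrete, here is a rough sketch in PHP using the old mysql_* functions (untested; the connection details and the table/column names are made up, and BEGIN/ROLLBACK only actually do anything on a transaction-capable table type such as InnoDB):

        <?php
        // Rough sketch only: one connection, one SQL transaction, many INSERTs.
        // 'final_inventory', 'part_number', and 'quantity' are made-up names.
        $db = mysql_connect('localhost', 'user', 'password');
        mysql_select_db('inventory', $db);

        mysql_query('BEGIN', $db);                 // start the transaction

        foreach ($items as $item) {
            $part = mysql_real_escape_string($item['part'], $db);
            $qty  = (int) $item['qty'];

            $ok = mysql_query("INSERT INTO final_inventory (part_number, quantity)
                               VALUES ('$part', $qty)", $db);
            if (!$ok) {
                mysql_query('ROLLBACK', $db);      // undo every insert so far
                die('Insert failed: ' . mysql_error($db));
            }
        }

        mysql_query('COMMIT', $db);                // keep everything
        mysql_close($db);
        ?>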

        Lastly, you're doing this to generate a CSV. Why not generate the CSV off-site and then upload it periodically via a cron job or a similarly automatic method?

          I don't think I will exceed the 8 MB limit... and I think that I have to be able to load the whole data file into the array before I can properly parse it.

          (I can't eliminate zero quantity items until after I have placed all items into the array, sorted it, and checked whether any two consecutive lines have the same part number... because it is possible for one file to have zero quantity and another to have some quantity. Also, only the client's inventory file has all columns of data for the item; the other, from the vendor, has only the item number and quantity.)

          If I can load everything into the array, sort it, parse it to merge quantities of matching part numbers and eliminate duplicate rows and zero quantity items... then write it into a database table... it should only require about 17,000 to 19,000 INSERT queries, which would be well within the 90,000 limit.
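
          In rough terms, I'm picturing something like this (just a sketch with made-up field names; keying the working array by part number instead of sorting and comparing consecutive rows should amount to the same merge):

          <?php
          // Sketch only: 'part' and 'qty' are placeholder field names, and the real
          // rows carry more columns from the client's inventory file.
          $merged = array();

          foreach ($all_rows as $row) {       // $all_rows = parsed rows from all 3 files
              $part = $row['part'];
              if (isset($merged[$part])) {
                  // Same part number seen again: add the quantities together,
                  // keeping the row that already has the full column data.
                  $merged[$part]['qty'] += $row['qty'];
              } else {
                  $merged[$part] = $row;
              }
          }

          // Only now is it safe to drop zero quantity items, since a zero in one
          // file may be offset by stock in another.
          foreach ($merged as $part => $row) {
              if ($row['qty'] <= 0) {
                  unset($merged[$part]);
              }
          }

          ksort($merged);   // sorted by part number, ready for the final INSERTs
          ?>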

          You ask why I don't just generate the CSV offline... well, because the company I am doing this for needs "SIMPLE"... they have no clue how to do anything with data files, and it will be a challenge simply to show them how to take the vendors' files and the auto-generated export of their QuickBooks inventory and upload them into this script!

          If I had to train them how to generate the CSV offline... well... let's just say no level of training would be able to do that for these people.

          Eventually I might set up some kind of automated cron job transfer... but for now... I just want to get it working.

          The data I am inserting into the mysql database will be used to generate several CSV output files in different formats... and also be used as the online inventory search for their website.... so it has to go online like that anyway.

          Thanks for your advice!

            Ok... here I am again...

            I rewrote the code to handle all the data manipulation in a PHP array rather than using multiple queries to do it. It works great with small test files, but when I try to use my actual files...

            the time it takes to run the script times out the browser after about 1 minute 30 seconds and I get a "the attempt to load: Accessing (url) failed" error dialog box.

            What I don't understand is that when I was processing a form to this page and it was using the database access to process... it would run for 8 minutes or more waiting for the page to come back. But now, when I process the form to this page and it is doing all the work in a php array... it craps out after a minute and a half!

            is there something I can do to force the browser to be patient and wait for the server to send the page?

              I'm assuming that you've gotten past set_time_limit() issues.

              I've seen this as a problem - the browser will time out if the webserver doesn't respond with something.

              Compounding this is the fact that the webserver buffers output, and if the buffer isn't flushed within 30 seconds or so, the browser may time out even quicker.

              what I've found that works is to do something like this:

              <h3>Processing form data</h3>
              <?php
              set_time_limit(0);   // no PHP execution time limit
              while ($row = get_row_from_database()) {
                  // keep sending something to the browser so it doesn't give up
                  echo "Processing " . $row['title'] . "<br>";
                  // do whatever...
              }
              ?>
              <h3>Processing complete.</h3>
              <body bgcolor="#AAEEAA">

              Just tell them that when the background turns green, it's done. They can then scroll down the list and see what the server is up to as it processes, and then the bg color change is a direct and obvious notification that it's "done".
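
              If output still doesn't reach the browser promptly (some hosts buffer output aggressively), an explicit flush after each echo may help; a small sketch, assuming nothing unusual about the host's output-buffering setup:

              <?php
              while ($row = get_row_from_database()) {
                  echo "Processing " . $row['title'] . "<br>\n";
                  if (ob_get_level() > 0) {
                      ob_flush();   // flush PHP's own output buffer, if one is active
                  }
                  flush();          // ask the webserver to send what it has so far
              }
              ?>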

                Thanks,

                I think that will do it. I am a newbie and was unaware of set_time_limit(0).

                Now, if I place a set_time_limit(0) at the beginning of the page, will it run indefinitely?

                ...or should I put a set_time_limit(0); at the beginning of each major section of code in the page that could potentially take a long time to run?

                or is that overkill?

                ;-)

                  Originally posted by mcrbids
                  what I've found that works is to do something like this:

                  If you want to get clever with DHTML (possible cross-browser issues), have - er - DHTML...
                  (Warning; I'm making this up without testing it.)

                  <script>
                  function uc(i)
                  {
                      // Assumes >= IE 5.5 or Gecko
                      document.getElementById('recordcount').innerHTML = i;
                  }
                  </script>
                  Processing record: <span id="recordcount"></span>
                  

                  and for each row spit out

                  <script>uc('<?php echo addslashes($row['title']); ?>');</script>
                  

                  And at the end have

                  <script>uc('<b>Finished processing</b>');</script>

                  and carry on.

                  For set_time_limit(0): you don't need to set it more than once, so put it at the top of the page.

                  Alternatively, reset it at the start of each loop iteration to something that would make sense for that one iteration; something like

                  foreach ($rows as $row)   // or whatever your loop is
                  {
                      set_time_limit(5);    // each iteration gets a fresh 5-second limit
                      // ...
                  }
