I am writing a daemon which must keep track of some data in an array. As the daemon runs (possibly indefinitely), items must be removed from and added to this array. When I remove them, I use unset($arr[$key]);

I was originally just assigning values with $arr[] = new Thing(), but I realized that, although unlikely, it was distinctly possible that the key generated by that assignment might some day exceed PHP_INT_MAX: that syntax doesn't look for holes left in the key sequence by unset() calls; it just takes the highest numeric index used so far and increments it by one. I therefore wrote the following routine so I could be sure of always grabbing the lowest available integer key. I now worry that it might be SUPER slow should my array become large:

function get_lowest_available_key($arr) {
	$key = 0;
	while (array_key_exists($key, $arr)) {
		$key++;
		if (defined('PHP_INT_MAX') && ($key > PHP_INT_MAX)) {
			throw new Exception('Function get_lowest_available_key failed.  Key value exceeded PHP_INT_MAX (' . PHP_INT_MAX . ')');
		}
		if (!is_int($key)) {
			throw new Exception('Function get_lowest_available_key failed.  Key value is not an integer');
		}
		if ($key < 0) {
			throw new Exception('Function get_lowest_available_key failed.  Key value is negative');
		}
	}
	return $key;
}
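For context, here's a minimal demonstration of the key-assignment behavior I'm describing (throwaway values):

```php
<?php
// $arr[] never reuses a key freed by unset(); it always takes max index + 1.
$arr = array();
$arr[] = 'a';   // key 0
$arr[] = 'b';   // key 1
$arr[] = 'c';   // key 2
unset($arr[1]); // leaves a hole at key 1
$arr[] = 'd';   // gets key 3, not 1
var_export(array_keys($arr)); // keys are 0, 2, 3
```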

Can anyone recommend a better solution?

    I need to preserve the array keys for a given array element. I.e., $array[6] must always point to the same client object until that client disconnects.

      I don't know of any built-in way, so all I can think of is something clunky like:

      for($ix = 0; $ix <= PHP_INT_MAX; $ix++)
      {
         if(!isset($array[$ix]))
         {
            $array[$ix] = $value;
            break;
         }
      }
      

        That's pretty much what my function is doing, except I'm using [man]array_key_exists[/man] instead of [man]isset[/man], because isset() returns false if an element exists but its value is null.

        Thanks also for your suggestion. I guess I was imagining that there might be some tricky way to improve performance for large arrays. Maybe I should check the array [man]count[/man] first? Of course, the array could contain only index zero and index PHP_INT_MAX - 1, which would yield a count of 2. 🙁
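        For instance, count() would tell me nothing about where the hole is (made-up values):

        ```php
        <?php
        // A sparse array: count() is 2, but every key from 1 to 999 is free.
        $arr = array(0 => 'a', 1000 => 'b');
        echo count($arr), "\n";   // 2
        var_dump(isset($arr[1])); // false -- key 1 is available
        ```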

          $range = range(0, max(array_keys($array)));
          if (count($range) != count($array))
              $hole = min(array_diff($range, array_keys($array)));
          else
              $hole = count($array);
          if ($hole < 0) trigger_error('Array full! Bork bork bork!', E_USER_ERROR);
          

          But a couple of billion elements? How often are new ones created? If it's a hundred per second and none are ever eliminated (making this problem irrelevant, though eliminations would allow it to run longer still), you'd blow through a 32-bit PHP_INT_MAX in about eight months.

          And it does nothing to handle the situation when the array is full, which has to be considered a plausible situation if this whole thing is an issue at all.

          You might want to partition these things a bit more finely than just one whacking big array for them all.
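          Just as a hypothetical sketch of what that partitioning might look like (bucketing by connection hour is an arbitrary choice, not anything from this thread):

          ```php
          <?php
          // Bucket clients by the hour they connected, so no single array grows huge
          // and each client is addressed by a (bucket, index) pair that never changes.
          function add_client(array &$clients, $client)
          {
          	$bucket = (int)(time() / 3600); // one bucket per hour (arbitrary choice)
          	$clients[$bucket][] = $client;  // inner key only has to be unique within its bucket
          	end($clients[$bucket]);
          	return array($bucket, key($clients[$bucket]));
          }

          $clients = array();
          $id = add_client($clients, 'client-object-here'); // $id is the (bucket, index) pair
          ```

          Old, fully disconnected buckets could then be unset() wholesale.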

            Is a null value something you still have to worry about? If not, I believe isset() will be a bit faster than array_key_exists() (I think I read that somewhere, but I've not tested it myself).

            Maybe another approach would be to use an associative array instead of numeric, and use microtime() or uniqid() to create the key.

            $key = uniqid('', true);
            $array[$key] = 'foo bar';
            

              isset would be faster, since array_key_exists is a function call rather than a language construct.

              NogDog wrote:

              Maybe another approach would be to use an associative array instead of numeric, and use microtime() or uniqid() to create the key.

              I'd plump for microtime(), there. microtime(false) in fact, and use two nested arrays: $big_array[$microtime[0]][$microtime[1]]. Two integer keys would be physically smaller than one 21-character string, and the lookup would likely be faster since it wouldn't require the string comparisons (though I haven't tested any of this).

              I'm going to see what happens if a multi-billion array is created. It might blow up anyway however it's keyed.

                  Well, I tried it; it would appear that even with "unlimited" memory available to it, PHP's memory management will crap out at 2GB.

                    Thanks for the great feedback. I think associative arrays might work. Off the top of my head, I think we always iterate through these arrays using foreach. I don't think I need to worry about any null values -- I just used array_key_exists because it seemed more correct.

                    It's highly unlikely that these arrays will ever have large numbers of values in them. The problem is rather that as clients connect and disconnect, we gradually chew up the key space unless we recycle keys somehow -- or come up with some scheme like you guys are talking about.

                    E.g., clients connect and disconnect for six months while this daemon runs (I have had a server up for 425 days before--should have been longer, but some dope restarted it), and when we take a snapshot you have 100 connected clients (i.e., count=100), but their indices span a range from 1 to 100000000.

                    I do consider the likelihood of burning through all of the key space remote. If we ate one integer per second, it would be the average human lifespan before we got to PHP_INT_MAX. I just didn't want to be responsible for some code that would mysteriously die one day because I didn't take this into account. Likewise, I want it to be fast. As a practical matter, I think it is nearly impossible that I would ever have more than 1,000 elements at a time in this array. It's almost certainly going to be less -- say 100 or so.

                    Weedpacket, do I understand your suggestion correctly in thinking that a given array element could obtain a key by calling microtime(TRUE), splitting on the dot, and then using the two halves to index into a 2-D array? Something like this:

                    $keys = explode('.', microtime(TRUE));
                    $arr[$keys[0]][$keys[1]] = $myObj;
                    

                      Almost, but multiply the first part by 1000000 and cast both to integers.
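                      I.e., using microtime()'s default "msec sec" string form (a sketch; $big_array and $my_obj are placeholders):

                      ```php
                      <?php
                      // microtime() without arguments returns e.g. "0.12345678 1234567890"
                      $big_array = array();      // placeholder
                      $my_obj = 'client-object'; // placeholder
                      list($usec, $sec) = explode(' ', microtime());
                      $outer = (int)$sec;                   // whole seconds
                      $inner = (int)($usec * 1000000);      // fractional part scaled to microseconds
                      $big_array[$outer][$inner] = $my_obj; // two integer keys, no 21-character string
                      ```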

                      sneakyimp wrote:

                      I think it is nearly impossible that I would ever have more than 1,000 elements at a time in this array. It's almost certainly going to be less -- say 100 or so

                      Okay, so the long running times and the very remote possibility of a huge array aren't really an issue - only the need for the key to remain constant for the lifetime of its value, and for all currently-used keys to be unique (when I write that last requirement I feel silly).

                        Yes, the possibility of a huge array is basically zero...unless there's some kind of DoS attack...it would be nice to handle that contingency gracefully.

                        Smallish number of elements (1000 max under normal conditions)
                        array keys must be preserved for the lifetime of a connected client
                        and yes of course they must be unique.

                          This doesn't solve any problem; I was just curious to see what happens:

                          <pre><?php
                          $test = array();
                          $test[0] = 'one';
                          $test[1] = 'two';
                          $test[PHP_INT_MAX] = 'many';
                          print_r($test);
                          $test[] = 'many-one';
                          $test[] = 'many-two';
                          print_r($test);
                          $test[-1] = 'lots';
                          $test[] = 'fubar';
                          ?></pre>
                          

                          Result:

                          Array
                          (
                              [0] => one
                              [1] => two
                              [2147483647] => many
                          )
                          Array
                          (
                              [0] => one
                              [1] => two
                              [2147483647] => many
                              [-2147483648] => many-one
                              [-2147483647] => many-two
                          )
                          
                          
                          Warning:  Cannot add element to the array as the next element is already occupied in C:\wamp\www\test\test.php on line 11
                          
                            11 days later

                            Alrighty, I broke down and did some performance comparisons using the attached script, which mimics the operation of my application's main loop for 10,000 iterations, assuming 100 clients are consistently connected. Here are all the key-getting methods:

                            function method1_get_key($arr) {
                            	$key = 0;
                            	while (array_key_exists($key, $arr)) {
                            		$key++;
                            		if (defined('PHP_INT_MAX') && ($key > PHP_INT_MAX)) {
                            			throw new Exception('Function get_lowest_available_key failed.  Key value exceeded PHP_INT_MAX (' . PHP_INT_MAX . ')');
                            		}
                            		if (!is_int($key)) {
                            			throw new Exception('Function get_lowest_available_key failed.  Key value is not an integer');
                            		}
                            		if ($key < 0) {
                            			throw new Exception('Function get_lowest_available_key failed.  Key value is negative');
                            		}
                            	}
                            	return $key;
                            }
                            
                            function method2_get_key($arr) {
                            	$key = 0;
                            	while (isset($arr[$key])) {
                            		$key++;
                            		if (defined('PHP_INT_MAX') && ($key > PHP_INT_MAX)) {
                            			throw new Exception('Function get_lowest_available_key failed.  Key value exceeded PHP_INT_MAX (' . PHP_INT_MAX . ')');
                            		}
                            		if (!is_int($key)) {
                            			throw new Exception('Function get_lowest_available_key failed.  Key value is not an integer');
                            		}
                            		if ($key < 0) {
                            			throw new Exception('Function get_lowest_available_key failed.  Key value is negative');
                            		}
                            	}
                            	return $key;
                            }
                            
                            function method3_get_key($arr) {
                            	if (count($arr) == 0) return 0;
                            	$range = range(0, max(array_keys($arr)));
                            	if (count($range) != count($arr))
                            		$hole = min(array_diff($range, array_keys($arr)));
                            	else
                            		$hole = count($arr);
                            	if ($hole < 0) trigger_error('Array full! Bork bork bork!', E_USER_ERROR);
                            	return $hole;
                            }
                            function method4_get_key($arr) {
                            	// this is kind of cheating because we cannot guarantee uniqueness
                            	// interestingly, it's not the fastest
                            	return uniqid();
                            }
                            function method5_get_key($arr) {
                            	// this is also kind of cheating because we cannot guarantee uniqueness
                            	return rand();
                            }
                            function method6_get_key($arr) {
                            	do {
                            		$k = uniqid();
                            	} while(isset($arr[$k]));
                            	return $k;
                            }
                            function method7_get_key($arr) {
                            	do {
                            		$k = rand();
                            	} while(isset($arr[$k]));
                            	return $k;
                            }
                            function method8_get_key($arr) {
                            	do {
                            		$k = md5(rand());
                            	} while(isset($arr[$k]));
                            	return $k;
                            }
                            function method9_get_key($arr) {
                            	do {
                            		$k = uniqid(rand(), true);
                            	} while(isset($arr[$k]));
                            	return $k;
                            }
                            function method10_get_key($arr) {
                            	do {
                            		$k = md5(uniqid(rand(), true));
                            	} while(isset($arr[$k]));
                            	return $k;
                            }
                            function method11_get_key($arr) {
                            	$keys = array_keys($arr);
                            	do {
                            		$k = uniqid();
                            	} while(in_array($k, $keys));
                            	return $k;
                            }
                            // microtime approach similar to what weedpacket
                            // suggested, only without nested arrays
                            function method12_get_key($arr) {
                            	$keys = array_keys($arr);
                            	do {
                            		$mt = microtime();
                            		$karr = explode(' ', $mt);
                            		$k = intval(100000000 * $karr[0]);
                            	} while (in_array($k, $keys));
                            	return $k;
                            }
                            

                            Of the methods that guarantee a new key rather than just returning a supposedly unique number, the grand champeen is method #7; method #8 is almost identical in performance, should you want a different format of key.

                            As you guys said, isset is indeed faster. I made a half-assed effort to find a performance difference between isset and in_array, but there didn't seem to be much of one. Maybe I'll check that next if I'm feeling motivated.
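                            If anyone wants to poke at it, a quick-and-dirty comparison might look like this (rough sketch, not part of the attached script; isset() is a hash lookup while in_array() scans linearly, so I'd expect the gap to grow with array size):

                            ```php
                            <?php
                            // Time isset() on keys vs. in_array() over array_keys() for a missing key.
                            $arr = array_fill(0, 10000, true);
                            $probe = 10000; // a key that is not present

                            $t = microtime(true);
                            for ($i = 0; $i < 1000; $i++) {
                            	$found = isset($arr[$probe]);
                            }
                            $isset_time = microtime(true) - $t;

                            $keys = array_keys($arr);
                            $t = microtime(true);
                            for ($i = 0; $i < 1000; $i++) {
                            	$found = in_array($probe, $keys);
                            }
                            $in_array_time = microtime(true) - $t;

                            printf("isset: %.6fs  in_array: %.6fs\n", $isset_time, $in_array_time);
                            ```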

                            Here's an APD table of the output if I exclude the slow ones:

                            My-Mac:apd sneakyimp$ ./pprofp -iR pprof.71840.0
                            
                            Trace for /Users/sneakyimp/Desktop/test.php
                            Total Elapsed Time = 408.83
                            Total System Time  = 67.66
                            Total User Time    = 329.39
                            
                            
                                 Real         User        System             secs/    cumm
                            %Time (excl/cumm)  (excl/cumm)  (excl/cumm) Calls    call    s/call  Memory Usage Name
                            --------------------------------------------------------------------------------------
                            100.0 389.90 408.83  315.00 329.39  64.74 67.66     1  389.9010   408.8264            0 main
                            1.0 3.89 3.89  3.23 3.23  0.52 0.52  100200  0.0000   0.0000            0 method10_get_key
                            0.7 2.89 2.89  2.46 2.46  0.40 0.40  100200  0.0000   0.0000            0 method9_get_key
                            0.7 2.87 2.87  1.76 1.76  0.54 0.54  100200  0.0000   0.0000            0 method6_get_key
                            0.7 2.83 2.83  2.32 2.32  0.39 0.39  100200  0.0000   0.0000            0 method8_get_key
                            0.7 2.80 2.80  1.66 1.66  0.54 0.54  100200  0.0000   0.0000            0 method4_get_key
                            0.5 1.85 1.85  1.49 1.49  0.27 0.27  100200  0.0000   0.0000            0 method7_get_key
                            0.4 1.80 1.80  1.46 1.46  0.27 0.27  100200  0.0000   0.0000            0 method5_get_key
                            0.0 0.00 0.00  0.00 0.00  0.00 0.00     1  0.0014   0.0014            0 apd_set_pprof_trace
                            

                            The script and some performance graphs are attached.

                              How well does my suggestion in post #7 run? I'm curious, because it works with entire arrays and I expect some time/space tradeoff.

                              But I still reckon your code (and mine) is unnecessarily armour-plated: I fully expect you'll run out of memory anyway before you hit any of those error conditions.

                                I implemented that suggestion from post #7 as method 3. Let me know if I got it right:

                                function method3_get_key($arr) {
                                    if (count($arr) == 0) return 0;
                                    $range = range(0, max(array_keys($arr)));
                                    if (count($range) != count($arr))
                                        $hole = min(array_diff($range, array_keys($arr)));
                                    else
                                        $hole = count($arr);
                                    if ($hole < 0) trigger_error('Array full! Bork bork bork!', E_USER_ERROR);
                                    return $hole;
                                } 
                                

                                Unfortunately, it was the slowest by far. It's represented in yellow in this chart.

                                I agree that we've all probably gone overkill here. My concern is not so much memory as speed. Keep in mind that the actual number of elements in the array is likely to be quite small (less than 1000). As the graphs show, performance slows down dramatically once you get a few hundred clients connected. The basic idea is to act as a switching station or hub for multiplayer games. Once pings climb above 200 milliseconds, it's almost useless for any kind of fast-twitch gaming.

                                At any rate, I appreciate the input. I'm thinking method 7 works quite well and seems fast even as we get into several hundred clients. I expect it'll be the other code that slows things down with any traffic.

                                  Suggest you use mt_rand() instead of rand() or you'll get shocking performance on Windows machines (if that matters).

                                  I'd probably do the same thing (pick random numbers until an unused one turns up) now that you've said that it's not likely that the array would get full (that sort of constraint does make a difference).

                                  Here's a slight variation that limits the number of random number generations per call to two:

                                  function method1_get_key($array)
                                  {
                                  	$k = mt_rand();
                                  	if (!isset($array[$k])) return $k;
                                  	$step = mt_rand(1, mt_getrandmax()); // a zero step would loop forever on a collision
                                  	do
                                  	{
                                  		$k += $step;
                                  	} while (isset($array[$k]));
                                  	return $k;
                                  }

                                  But the difference between this and method 7 doesn't really look like anything until there are about a million array elements.
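                                  A throwaway harness to sanity-check the stride variant against the plain retry loop might look like this (sketch; tiny sizes, nothing like the million-element case):

                                  ```php
                                  <?php
                                  // Both methods should return a key not already present in the array.
                                  function retry_get_key(array $arr)
                                  {
                                  	do {
                                  		$k = mt_rand();
                                  	} while (isset($arr[$k]));
                                  	return $k;
                                  }

                                  function stride_get_key(array $arr)
                                  {
                                  	$k = mt_rand();
                                  	if (!isset($arr[$k])) return $k;
                                  	$step = mt_rand(1, mt_getrandmax()); // nonzero step
                                  	do {
                                  		$k += $step;
                                  	} while (isset($arr[$k]));
                                  	return $k;
                                  }

                                  $arr = array_fill(0, 1000, true); // dense low keys, like the daemon's worst case
                                  $a = retry_get_key($arr);
                                  $b = stride_get_key($arr);
                                  var_dump(isset($arr[$a]), isset($arr[$b])); // both false
                                  ```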
