Hi there everyone!

I've written a search system on my site that utilizes keywords and weights to determine best matches. Unfortunately, I didn't write this until I had 1,000 items in the database already so I had to write a function to (re)build the searchpool of keywords and weights. This script goes through each ID in the item database and creates strings and weights for things like title, description, keywords, etc.

the script uses and updating progressbar that I'm using to monitor it's progress. When the script starts running, it's blazing fast. By about 100 into it, I notice that the performance has been probably cut in half. By 500 in, it's slower still and it just continues worsening as it continues. By record 1000, it's probably doing a record every three seconds.

I monitored the server load as the script ran and it correlates to the performance degradation of the script itself. The first few hundred didn't seem to impact the server at all and by the end, the numbers were quite high.

Since the script starts out running great and only slows as it progresses in a very uniform escalation, I'm wondering if it's the dynamic page / progress bar itself that's causing my issue. It utilizes php's flush to perform it's magic and I'm wondering if this might be causing the problem.

The page generation stuff:

echo'
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html lang="en">
	<head>

<style>
	body {
		background: #f06d06;
		font-size: 200%;
		padding: 20px;
	}
	.className{
		width:500px;
		height:150px;
		position:absolute;
		left:50%;
		top:50%;
		margin:-75px 0 0 -250px;
	}

</style>
</head>
<body>
	<div class="className">
		<!-- Progress bar holder -->
		<div id="progress" style="width:500px;border:1px solid #ccc;"></div>
		<!-- Progress information -->
		<div id="information" style="width:100%; padding-top:20px; text-align:center"></div>
	</div>

';

/* Time to do some actual work. */

/* Wipe out current searchpool data for this $ciid. */
$q_dsp = "DELETE FROM searchpool";
$r_dsp = mysqli_query ($link, $q_dsp) or die('Catastrophic failure [Super secret code 7219229]');

$i = 1;
$q_prim = "SELECT * FROM cart_items ORDER BY id ASC";
$r_prim = mysqli_query ($link, $q_prim) or die('Catastrophic failure [Super secret code 48155]');
$n_prim = mysqli_num_rows($r_prim);
$total = $n_prim;
while ($row_prim = mysqli_fetch_assoc ($r_prim)) {

	$ciid = $row_prim['id'];
	$title = $row_prim['title'];
	$mpn = $row_prim['mpn'];
	$description = $row_prim['description'];

             /* Lots of stuff happening here.  See below for actual searchpool stuff */

	$percent = intval($i/$total * 100)."%";
	$i = ++$i;
	// Javascript for updating the progress bar and information
	echo '<script language="javascript">
	document.getElementById("progress").innerHTML="<div style=\"width:'.$percent.';background-color:#ddd;\">&nbsp;</div>";
	document.getElementById("information").innerHTML="'.$i.' records processed.";
	</script>';

	// This is for the buffer achieve the minimum size in order to flush data
	echo str_repeat(' ',1024*64);

	// Send output to browser immediately
	flush();

	// Sleep one second so we can see the delay
	//sleep(1);

}

// Tell user that the process is completed
echo '
<script language="javascript">document.getElementById("information").innerHTML="Process completed"</script>
';
	sleep(3);
	echo'
	<script>setTimeout("top.location = \'index.php\'", 2000);</script>
	<meta http-equiv="refresh" content="0; url=/'.$vu_overpath.'?do=search-rebuild&alldone">
	';
	exit();

And this is what the script is doing for each item ID: http://pastebin.com/AebZZEKq

Is this a terrible way to handle long running scripts that have a bunch to do? My next thought is to run the script with simple echoing of an asterisk to show it's progress, stripping all the progress bar/ php flush stuff out of the page to see if it performs better but I wanted to ask if anyone spotted a bad problem area before tearing it apart.

Any suggestions on resolving the issue would be greatly appreciated. Thanks for your time!

    Caveat: I did a lot of skimming, so this is neither comprehensive nor necessarily correct. 🙂

    It looks to me like you keep appending to a number of arrays as you read query result rows. PHP arrays are somewhat notorious as memory hogs to begin with, and if you are retrieving a lot of rows of data into them, even if they were more efficient, they'd still be using up RAM. In your place, I might be looking for a more database-driven approach, perhaps creating temporary tables from your assorted select queries, then using those temporary tables to do whatever it is you currently do with those arrays?

      At first glance, it seems like there's a lot you're doing there that you could just do in the database instead of pulling it out of the database, doing the work in PHP, and then putting the results into the database (one record at a time, and committing after every single operation). I could be wrong, because that's just at a first glance; a zeroth glance had me wondering why the work to reproduce full-text searching in the first place.

      A second glance. The MySQLi interface does have a [man]mysqli-result.fetch-all[/man] method.
      And

             while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                  $spec_array[]['name'] = htmlentities($row_specs['name'], ENT_QUOTES);
                  $spec_array[]['safename'] = mysqli_real_escape_string($link, $row_specs['name']);
                  $spec_array[]['value'] = htmlentities($row_specs['value'], ENT_QUOTES);
                  $spec_array[]['safevalue'] = mysqli_real_escape_string($link, $row_specs['value']);
                  $spec_array[]['full'] = $row_specs['full'];
                  $spec_array[]['brief'] = $row_specs['brief'];
                  $spec_array[]['sort'] = $row_specs['sort'];
              }

      Probably doesn't do what you want (hint, if n rows are returned, [font=monospace]$spec_array[/font] will have 7n elements).

        One thing to remember, don't do "select * from" unless you really need all fields returned. If you're only using "id", "title", "mpn" and "description", only ask for those. I used to be skeptical about this until I looked at "show processlist" in SQL while a particularly not-so-fast script of mine was running. "Sending data..." was there a LOT --- it takes time for the DB server to send data; plus, the less RAM your SQL server uses the more you might have available for other things?

        Next, do you really need ORDER BY in all cases? (You might, I didn't analyze exactly how you're doing all this).

        Next thing to think about, try to use JOIN. I'm fairly certain the queries on line 4 & 12 of your pastebin code, as an example, might be more performant if JOINed together (there might be a tad of parse/conditional PHP logic in the resulting lines of assignment code, but it would be worth it, my gut says).

        Nearly done, you're sleep()-ing a lot. Is that necessary? What about usleep() with a lower value? 3 seconds, in particular, is "FOREVER" (although I've got quite a few scripts where that doesn't even come close to what I've had to set with set_time_limit() 😉 #no_one_is_perfect_especially_me).

        Finally, do you have to have a progress bar? Our site search rebuilds its index once every twenty-four hours, and does a "delta" every 20 minutes to catch anything that's new ... all via cron, so no progress bar / browser is needed. It could be that another approach would work for this task?

          Oh, one more, maybe thing: if, at some point, you're finished with an array (will not use it again) but the script still has work to do, unset() it.

            ...And I also just remembered that that pastebinned stuff is inside a loop. So if n elements are returned for the loop in the piece I posted, [font=monospace]$spec_array[/font] will contain 7n elements after that loop during the first iteration of the outer loop. As for later iterations .... you planning on cleaning it out any time soon?

              Hi there guys and thanks so much for the fantastic input! I'll begin working on the function in the morning but there's a couple questions I wanted to ask before I get started.

              First, in regard to not unsetting the arrays, I thought(and I'm likely wrong) that in the loop where I state:

              $keyword_array = array();

              That, in effect, unsets the previous data, since I'm declaring it as an empty array. Is this not the case? Do I need to UNSET it prior to or instead of this?


              Sleep: I think I'm only sleeping once(it's outside the loop), for three seconds to allow the user to see the completed message. The 1 sec sleep is commented out. I was using it for troubleshooting purposes earlier.


              Progress bar shenanigans: It's less necessity and more of a learning experience for me as I had never done it. I mentioned above but to reiterate, I think I'm going to strip it out and replace it with a simple echoing of an asterisk with visual queues for 10% increments.


              I will be sure to change my queries to strip order where it's not needed and only ask for what I need. I'll report back with my next version. I'm not intentionally glossing over any suggestions, it's just that I've done so much that's wrong, I'm going to have to tackle the issues one chunk at a time.

              Thank you all again for the fantastic guidance. I really appreciate it!

                schwim2;11060631 wrote:

                First, in regard to not unsetting the arrays, I thought(and I'm likely wrong) that in the loop where I state:

                $keyword_array = array();

                That, in effect, unsets the previous data, since I'm declaring it as an empty array. Is this not the case? Do I need to UNSET it prior to or instead of this?

                No; you're correct in your understanding there. I didn't look at your names closely, but I will now.

                One such place I think is around line 98, your comments say "mpn ends here" and there's no usage of $mpn_array after this point, if I'm reading right. But the script goes on. You say you're watching it eat RAM; if you unset($mpn_array) around line 99 where you're done with it, PHP might clean up enough memory that you'd use less RAM during the rest of the script, is my thought. I am not certain I'm correct, but it's something that I would try with a less-than-performant piece of code. YMMV, of course... 🙂

                  Hi there guys,

                  I'm back with a revised version. Thanks to everyone's guidance, I made the following changes:

                  First, I UNSET everything I could right after I was done with it in that loop.

                  Secondly, I got rid of almost all the selects and inserts and did get rid of all the updates by creating one master searchpool array that each section builds upon. With this new setup, there's only one searchpool insert loop for each item ID.

                  Finally, I altered the queries to only do what's required. I dropped orders where it didn't matter and I only asked for elements that I needed.

                  Before I show the code, I ran this and it completed the entire process in 17 seconds. This is in comparison to the 4:30+ time of yesterday's version. Also of note is that there was no degradation in performance between the start and the end. It was running as quickly at the end as it was when it started. In spite of this ridonkulous increase in performance, I'd greatly appreciate any suggestions in further improvements if anyone spots anything glaring.

                  	/* Time to do some actual work. */
                  
                  /* Wipe out current searchpool data. */
                  $q_dsp = "DELETE FROM searchpool";
                  $r_dsp = mysqli_query ($link, $q_dsp) or die('Catastrophic failure [Super secret code 7219229]');
                  
                  $i = 1;
                  $q_prim = "SELECT id, title, mpn, description FROM cart_items";
                  $r_prim = mysqli_query ($link, $q_prim) or die('Catastrophic failure [Super secret code 48155]');
                  $n_prim = mysqli_num_rows($r_prim);
                  $total = $n_prim;
                  while ($row_prim = mysqli_fetch_assoc ($r_prim)) {
                  
                  	$ciid = $row_prim['id'];
                  	$title = $row_prim['title'];
                  	$mpn = $row_prim['mpn'];
                  	$description = $row_prim['description'];
                  
                  	/* Build the MPN array */
                  	/* A weight of 999999 is an MPN */
                  	$mpn_array[0] = $mpn;
                  	$q_mpn = "SELECT mpn FROM mpns WHERE ciid = '$ciid'";
                  	$r_mpn = mysqli_query($link, $q_mpn) or die('Catastrophic failure [Super secret code 715495]');
                  	while ($row_mpn = mysqli_fetch_assoc ($r_mpn)) {
                  		$mpn_array[] = $row_mpn['mpn'];
                  	}
                  
                  	/* Build the keywords array */
                  	$keyword_array = array();
                  	$q_kw = "SELECT * FROM keywords WHERE ciid = '$ciid'";
                  	$r_kw = mysqli_query($link, $q_kw) or die('Catastrophic failure [Super secret code 384955]');
                  	while ($row_kw = mysqli_fetch_assoc ($r_kw)) {
                  		$keyword_array[] = $row_kw['string'];
                  	}
                  
                  	/* Additional specs are needed */
                  	$q_specs = "SELECT name, value FROM specs WHERE ciid = '$ciid' ORDER BY id ASC";
                  	$r_specs = mysqli_query($link, $q_specs) or die('Catastrophic failure [Super secret code 954219]');
                  	while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                  		$spec_array[]['name'] = $row_specs['name'];
                  		$spec_array[]['value'] = $row_specs['value'];
                  	}
                  
                  	/**************************************/
                  	/**        Search Pool Start         **/
                  	/**************************************/
                  
                  	$sp_array = array();
                  
                  	/* Title Start */
                  
                  	$sp_title = stripStopWords($title);
                  	$exploded_title = explode(" ", strtolower($sp_title));
                  	foreach ($exploded_title as $string){
                  
                  		$string = leavealphanum(trim($string));
                  
                  		if($string != ''){
                  			if (!in_array(strtolower($string), $sp_array)) {
                  				$sp_array[$string] = 20;
                  			}else{
                  				$sp_array[$string] = $sp_array[$string] + 20;
                  			}
                  		}
                  	}
                  	UNSET($exploded_title);
                  
                  	/* Title End */
                  
                  	/* MPNs Start */
                  
                  	/* Loop through the MPN array. */
                  	FOREACH($mpn_array AS $string){
                  
                  		if (!in_array($string, $sp_array) AND $string != '') {
                  			$sp_array[$string] = 999999;
                  		}else{
                  			$sp_array[$string] = $sp_array[$string] + 999999;
                  		}
                  
                  		$string2 = leavealnumspacers($string);
                  		if (!in_array($string2, $sp_array) AND $string2 != '') {
                  			$sp_array[$string2] = 999999;
                  		}
                  
                  		$string3 = leavealphanum($string);
                  		if (!in_array($string3, $sp_array) AND $string3 != '') {
                  			$sp_array[$string] = 999999;
                  		}
                  
                  	}
                  	UNSET($mpn_array);
                  
                  	/* MPNs End */
                  
                  	/* Description Start */
                  
                  	$sp_desc = stripStopWords(strtolower(strip_tags($description)));
                  	$desc_exploded = explode(" ", leavealnumspacers(strtolower($sp_desc)));
                  	FOREACH($desc_exploded AS $string){
                  
                  		$string = trim($string);
                  
                  		if($string != ''){
                  			IF(!array_key_exists ($string, $sp_array)) {
                  				$sp_array[$string] = 1;
                  			}else{
                  				$sp_array[$string] = $sp_array[$string] + 1;
                  			}
                  		}
                  	}
                  	UNSET($desc_exploded);
                  
                  	/* Description End */
                  
                  	/* Keywords Start */
                  
                  	IF(!EMPTY($keyword_array)){
                  		FOREACH($keyword_array AS $string){
                  
                  			$string = trim(strtolower($string));
                  
                  			if($string != ''){
                  				if (!array_key_exists ($string, $sp_array)) {
                  					$sp_array[$string] = 20;
                  				}else{
                  					$sp_array[$string] = $sp_array[$string] + 20;
                  				}
                  			}
                  		}
                  	}
                  	UNSET($keyword_array);
                  
                  	/* Keywords End */
                  
                  	/* Additional Specs Start */
                  
                  	$spec_strings = array();
                  	$spec_string = '';
                  	IF(ISSET($spec_array)){
                  
                  		FOREACH($spec_array AS $spec_entry){
                  
                  			$spec_string .= $spec_entry['name'] .' '. $spec_entry['value'].' ';
                  
                  		}
                  
                  		$spec_exploded = explode(" ", leavealnumspacers(strtolower($spec_string)));
                  		FOREACH($spec_exploded AS $string){
                  			if($string != ''){
                  
                  				$string = trim($string);
                  
                  				if (!array_key_exists ($string, $sp_array)) {
                  					$sp_array[$string] = 1;
                  				}else{
                  					$sp_array[$string] = $sp_array[$string] + 1;
                  				}
                  			}
                  		}
                  		UNSET($spec_exploded);
                  
                  	}
                  
                  	/* Additional Specs End */
                  
                  	/* We have created the searchpool array.  Insert it into the database. */
                  	FOREACH($sp_array AS $sp_key => $sp_value){
                  
                  		$sp_key = mysqli_real_escape_string($link, $sp_key);
                  		$sp_val = mysqli_real_escape_string($link, $sp_val);
                  
                  		$q_spin = "INSERT INTO searchpool (ciid, string, weight) VALUES ('$ciid', '$sp_key', '$sp_val')";
                  		$r_psin = mysqli_query($link, $q_spin) or die('Catastrophic failure [Super secret code 95412]');
                  	}
                  
                  	/**************************************/
                  	/**         Search Pool End          **/
                  	/**************************************/
                    schwim2;11060643 wrote:

                    ridonkulous increase in performance

                    That's good enough for me; just make sure it's doing all the work that it's supposed to. I have, at times in the past, actually chucked stuff to increase performance and then got a few days down the road and someone said, "what happened to the data for report $x? We've not had anything since the 16th ..." :eek:

                    Fortunately, automated backups 'n stuff 😉 😉

                             while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                                  $spec_array[]['name'] = $row_specs['name'];
                                  $spec_array[]['value'] = $row_specs['value'];
                              }
                      

                      This is probably still not what you want, unless you want the name and value to be in different elements of [font=monospace]$spec_array[/font].

                        Fixed(I think):

                        		$ii = 0;
                        		while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                        			$spec_array[$ii]['name'] = $row_specs['name'];
                        			$spec_array[$ii]['value'] = $row_specs['value'];
                        			$ii = ++$ii;
                        		}
                          Weedpacket;11060707 wrote:
                                 while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                                      $spec_array[]['name'] = $row_specs['name'];
                                      $spec_array[]['value'] = $row_specs['value'];
                                  }
                          

                          This is probably still not what you want, unless you want the name and value to be in different elements of [font=monospace]$spec_array[/font].

                          Hmm. Given that you've mentioned this twice, do we need to spell it out more clearly?

                          $spec_array[$introduce_var_here]['value'] = $row_specs['value'];

                          😉

                          I may have just broken Weedpacket's give-a-fish principle ... :eek:

                          [EDIT]: I see you beat me to it. 'Tis what I get for opening up every post in tabs and getting to this one quite later ... but, at least you caught your own fish!! 😃 🙂 😉[/EDIT]

                            schwim2;11060715 wrote:

                            Fixed(I think):

                            		$ii = 0;
                            		while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                            			$spec_array[$ii]['name'] = $row_specs['name'];
                            			$spec_array[$ii]['value'] = $row_specs['value'];
                            			$ii = ++$ii;
                            		}

                            Why not just $spec_array[] = $row_specs; ?? Why do you need to separately assign each piece?

                              Derokorian;11060729 wrote:

                              Why not just $spec_array[] = $row_specs; ?? Why do you need to separately assign each piece?

                              That's a good point, assuming you don't have any extraneous info in $row_specs (and if he took our advice and didn't ask for any extra fields, that should be true), and you don't want to change any names (it appears he doesn't). Good catch.

                              As far as my apps go, a lot of times I'm working with some "redonkulous" names and I want something shorter for my array:

                              $c = 0;
                              while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                                    $spec_array[$c]['name']  = $row_specs['really_long_name'];
                                    $spec_array[$c]['value'] = $row_specs['a_specific_value'];
                                    $c++;
                              }

                              I guess I don't think much of the DB design choices/naming conventions made by some people in the past ...

                                Derokorian;11060729 wrote:

                                Why not just $spec_array[] = $row_specs; ?? Why do you need to separately assign each piece?

                                dalecosp;11060731 wrote:

                                That's a good point, assuming you don't have any extraneous info in $row_specs (and if he took our advice and didn't ask for any extra fields, that should be true), and you don't want to change any names (it appears he doesn't). Good catch

                                It was working in it's previous form derokorian but I could tell what Mr. Packet wanted from me and hate to disappoint 🙂

                                  dalecosp;11060731 wrote:

                                  As far as my apps go, a lot of times I'm working with some "redonkulous" names and I want something shorter for my array:

                                  $c = 0;
                                  while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                                        $spec_array[$c]['name']  = $row_specs['really_long_name'];
                                        $spec_array[$c]['value'] = $row_specs['a_specific_value'];
                                        $c++;
                                  }

                                  I guess I don't think much of the DB design choices/naming conventions made by some people in the past ...

                                  Howsabout

                                  while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                                        $spec_array[] = [
                                              'name'  => $row_specs['really_long_name'],
                                              'value' => $row_specs['a_specific_value'],
                                        ];
                                  }

                                  Or of course alias the columns in the original query, if you can get at that.

                                    Weedpacket;11060781 wrote:

                                    Howsabout

                                    while ($row_specs = mysqli_fetch_assoc ($r_specs)) {
                                          $spec_array[] = [
                                                'name'  => $row_specs['really_long_name'],
                                                'value' => $row_specs['a_specific_value'],
                                          ];
                                    }

                                    Or of course alias the columns in the original query, if you can get at that.

                                    I've used aliases. The bracket syntax is not supported on some of our systems, STILL. 🙁

                                      Weedpacket;11060781 wrote:

                                      Or of course alias the columns in the original query, if you can get at that.

                                      Would you mind showing an example of something like this? It's been very enlightening to see the different and sleeker methods of creating these variables.

                                        Write a Reply...