i figure this might be easier these days with the proper garbage collection, but i'm forced to use an old version of php

i've got a recursive function that loads some data from a file like so: (extremely simplified so you can copy-paste and test for yourself)

function recursive($str, &$container)
{
    $pos1 = 5;
    $pos2 = $pos1 + 5;
    $container .= substr($str,$pos1,$pos2);
    echo memory_get_usage()."<br />";
    $str = substr($str, $pos2);
    echo memory_get_usage()."<br /><br />";
    if (strlen($str) > 0)
    {
        recursive($str, $container);
    }
    else return;
}
$str = "/*insert a very long piece of text here*/";
$a = "";
recursive($str, $a);
echo $a;
example output:
497008
583880

583952
670816

670888
757744

757824
844664

844736
931568

931640
1018464

*and so on*

the memory usage continues to climb by the size of the remaining substr on each iteration, meaning it's storing the original $str, the first substr, the second substr, and so on all in memory - this obviously adds up quite quickly

    Why not change the recursion into iteration?

      Well yeah; none of those variables go out of scope until the function returns. And that doesn't happen until the very end of the function, when they all return.

      5.3 will do the same thing; its garbage collection is for clearing out circular references, and there are no circular references here.

      (Incidentally, the third argument to substr() is a length; the way you've written it makes it look like it's expected to be a position.)

        laserlight;10950332 wrote:

        Why not change the recursion into iteration?

        because the code is actually a lot more involved than that, i simplified it for posting here

        Weedpacket;10950336 wrote:

        (Incidentally, the third argument to substr() is a length; the way you've written it makes it look like it's expected to be a position.)

        yeah, it should be $pos2 = 5; (for example), like you say; a little mistake in the simplifying process 🙂

        Weedpacket;10950336 wrote:

        Well yeah; none of those variables go out of scope until the function returns. And that doesn't happen until the very end of the function, when they all return.

        what variables?
        i'm inserting the substr of $str in to $str itself - shouldn't that use the same memory as before?
        of course the substr function needs more to run, but all of that should be freed afterward?

        when i recall the recursion, i only ever have that one variable ($str), where's all the other (redundant) data held? how can i release it?

          just to clarify, i'm expecting the substr() call to momentarily eat up memory (when it's being executed), but release that memory when it's done, so the output of the echoes would be:
          497008
          497008

          497080
          497080

          and so on

            joppe_ wrote:

            when i recall the recursion, i only ever have that one variable ($str), where's all the other (redundant) data held? how can i release it?

            You cannot do anything about them.

            joppe_ wrote:

            i'm inserting the substr of $str in to $str itself - shouldn't that use the same memory as before?

            No, the $str variable in each recursive call is different. The $container is the same because pass by reference is used, thus the $container parameter in each recursive call is an alias of the same variable.

            joppe_ wrote:

            because the code is actually a lot more involved than that, i simplified it for posting here

            Still, you can eliminate tail recursion rather easily, and if need be, you can use an explicit stack. When you use an explicit stack, you have control over exactly what needs to be maintained; this answers your previous question about releasing "redundant" data.

              joppe wrote:

              what variables?

              All those variables named $str. There's one for each call to the function (in technical parlance, "local" to the function).

                Weedpacket;10950346 wrote:

                All those variables named $str. There's one for each call to the function (in technical parlance, "local" to the function).

                thanks, this gave me an idea, which it seems have solved the problem to some extent (the extent of my sample code anyway, actual code might need some more debugging)

                by retrieving the str as a pointer:
                function ExecuteLoadRecursive(&$str, &$container)
                🙂

                  joppe_ wrote:

                  thanks, this gave me an idea, which it seems have solved the problem to some extent (the extent of my sample code anyway, actual code might need some more debugging)

                  If you can use pass by reference for both parameters, I daresay that is a strong hint that you can rather easily change the recursion to iteration. Such a solution may be easier to maintain in the long run as aliasing tends to make the code harder to prove to be correct.

                    after 10 minutes with the real code, it now works as it should - original problem solved, going off topic now 🙂

                    i don't think it's possible to do what i'm doing with simple iteration
                    what i'm doing is a manual load of a dom document (because i have no access to the server to install the dom module)
                    if you really don't have anything better to do, here's the whole function (plus the calling one) 🙂
                    i have taken certain shortcuts, because it's done for a very specific need and i don't need all the functionality of DOMDocument for what's being done
                    also, i haven't done much php in general, so i'm sure there's plenty of stupid stuff here that could be done better 🙂

                    /*this is inside a custom MiniDOMDocument class, which of course has other functions as well*/
                    	function Load($xml)
                    	{
                    		$str = file_get_contents($xml);
                    		$this->ExecuteLoadRecursive($str, $this);
                    		return true;
                    	}
                    
                    function ExecuteLoadRecursive(&$str, &$container)
                    {
                    	$begin = strpos($str,"<");
                    	$end = strpos($str,">");
                    	if (strlen($str) > 2 && $str{$begin+1} == "/") // closing tag </foo>
                    	{
                    		// continue after closing tag
                    		$str = substr($str, $end+1);
                    		$this->ExecuteLoadRecursive($str, $container->parentNode);
                    	}
                    	else if (!is_null($begin) && $end) // if nodes left
                    	{
                    		if ($str{$end-1} == "/") // empty node <foo/>
                    		{
                    			$name = substr($str, $begin+1, $end-$begin-2);
                    			// append new empty node
                    			$container->childNodes[] =& new MiniDOMNode($name, $container, "");
                    			// continue after empty node
                    			$str = substr($str, $end+1);
                    			$this->ExecuteLoadRecursive($str, $container);
                    		}
                    		else if ($str{$end+1} == "<") // if no value
                    		{
                    			$name = substr($str, $begin+1, $end-$begin-1);
                    			if ($str{$end+2} == "/") // empty node <foo></foo>
                    			{
                    				// append new empty node
                    				$container->childNodes[] =& new MiniDOMNode($name, $container, "");
                    				// continue
                    				$str = substr($str, strpos($str,">",$end+2)+1);
                    				$this->ExecuteLoadRecursive($str, $container);
                    			}
                    			else // inner node <foo><bar>...
                    			{
                    				// append new node
                    				$newNode = new MiniDOMNode($name, $container, "");
                    				$container->childNodes[] =& $newNode;
                    				// continue in new node
                    				$str = substr($str, $end+1);
                    				$this->ExecuteLoadRecursive($str, $newNode);
                    			}
                    		}
                    		else // node has value
                    		{
                    			$name = substr($str, $begin+1, $end-$begin-1);
                    			$pos = strpos($str, "<", $end+1);
                    			$value = substr($str, $end+1, $pos-$end-1);
                    			$newNode = new MiniDOMNode($name, $container, $value);
                    			$container->childNodes[] =& $newNode;
                    			$str = substr($str, $pos);
                    			$this->ExecuteLoadRecursive($str, $newNode);
                    		}
                    	}
                    	else // no nodes left
                    	{
                    
                    	}
                    }
                      joppe_ wrote:

                      i don't think it's possible to do what i'm doing with simple iteration

                      It is always possible to convert recursion into iteration, and vice versa, though admittedly sometimes it is not simple, i.e., one may need an explicit stack to convert recursion to iteration.

                      In this case, it looks like you are effectively doing tail recursion in each branch. Not tested, but perhaps the recursion could be converted to:

                      function Load($xml)
                      {
                          $str = file_get_contents($xml);
                          $this->ExecuteLoad($str);
                          return true;
                      }
                      
                      function ExecuteLoad($str)
                      {
                          $container =& $this;
                          for (;;)
                          {
                              $begin = strpos($str, "<");
                              $end = strpos($str, ">");
                              if (strlen($str) > 2 && $str{$begin+1} == "/") // closing tag </foo>
                              {
                                  // continue after closing tag
                                  $str = substr($str, $end+1);
                                  $container &= $container->parentNode;
                              }
                              else if (!is_null($begin) && $end) // if nodes left
                              {
                                  if ($str{$end-1} == "/") // empty node <foo/>
                                  {
                                      $name = substr($str, $begin+1, $end-$begin-2);
                                      // append new empty node
                                      $container->childNodes[] =& new MiniDOMNode($name, $container, "");
                                      // continue after empty node
                                      $str = substr($str, $end+1);
                                  }
                                  else if ($str{$end+1} == "<") // if no value
                                  {
                                      $name = substr($str, $begin+1, $end-$begin-1);
                                      if ($str{$end+2} == "/") // empty node <foo></foo>
                                      {
                                          // append new empty node
                                          $container->childNodes[] =& new MiniDOMNode($name, $container, "");
                                          // continue
                                          $str = substr($str, strpos($str,">",$end+2)+1);
                                      }
                                      else // inner node <foo><bar>...
                                      {
                                          // append new node
                                          $newNode = new MiniDOMNode($name, $container, "");
                                          $container->childNodes[] =& $newNode;
                                          // continue in new node
                                          $str = substr($str, $end+1);
                                          $container =& $newNode;
                                      }
                                  }
                                  else // node has value
                                  {
                                      $name = substr($str, $begin+1, $end-$begin-1);
                                      $pos = strpos($str, "<", $end+1);
                                      $value = substr($str, $end+1, $pos-$end-1);
                                      $newNode = new MiniDOMNode($name, $container, $value);
                                      $container->childNodes[] =& $newNode;
                                      $str = substr($str, $pos);
                                      $container =& $newNode;
                                  }
                              }
                              else // no nodes left
                              {
                                  break;
                              }
                          }
                      }

                      EDIT:
                      By the way, this looks wrong:

                      else if (!is_null($begin) && $end) // if nodes left

                      strpos() returns false, not null, if the search string cannot be found. Therefore, that line should be:

                      else if ($begin !== false && $end !== false) // if nodes left

                        I'd just add that it depends on what you mean by "old version" whether or not the '&' in assignments like

                                    $container &= $container->parentNode; 

                        and the constructors are necessary. They were required in PHP 4 to emulate the behaviour that programmers expected from objects and is now native in PHP 5.

                        (And laserlight; before we launch that debate again, I'll say that I've reconfigured my stance: argument passing is still "by value" for objects; in v4 that was the value of the object, in PHP5 it's the value of the object reference - not the same thing as the reference itself, which is what you get when you use '&'!)

                          Weedpacket wrote:

                          (And laserlight; before we launch that debate again, I'll say that I've reconfigured my stance: argument passing is still "by value" for objects; in v4 that was the value of the object, in PHP5 it's the value of the object reference - not the same thing as the reference itself, which is what you get when you use '&'!)

                          Yay! 😃

                            Weedpacket;10950404 wrote:

                            (And laserlight; before we launch that debate again, I'll say that I've reconfigured my stance: argument passing is still "by value" for objects; in v4 that was the value of the object, in PHP5 it's the value of the object reference - not the same thing as the reference itself, which is what you get when you use '&'!)

                            I guess pass by value, reference (pointers in C,C++ speak) is a major cause of confusion among developers. This is especially so if say you come from a C,C++ background like I am. Pointers run-away was a constant source of bugs that never fail to "amaze" even for very experienced developers.

                            Because of C,C++ pointer problem notoriety, Java and a lot of other languages take on a totally different approach to "remove" it and asking a GC to handle dangling pointers and references. This is good and I believe PHP take the same design as Java. Everything is passed by value. For objects, it will then be the value of the reference.

                            Unfortunately (or you may say surprisingly), Perl decide to follow the C/C++ way. Perl references essentially bring back the C/C++ horrors. Lucky for me, grasping it is not too difficult due to my background learning but somehow I feel Larry Wall should re-design the Perl references to follow the Java, PHP pass by value arguments design isn't it ? :quiet:

                              sohguanh wrote:

                              This is good and I believe PHP take the same design as Java. Everything is passed by value.

                              Weedpacket's post actually implies that PHP has a pass by reference mechanism akin to C++. Read the PHP manual on References Explained.

                              Furthermore, you talk about "C,C++ pointer problem notoriety", and then say that "everything is passed by value" as if that were the solution by itself. If you were not aware, in C, everything is passed by value. The simplication brought by languages such as Java has to do with removing pointer arithmetic. The reference types of C++ likewise do not allow pointer arithmetic.

                              The use of garbage collection is a separate, though related, issue.

                              sohguanh wrote:

                              For objects, it will then be the value of the reference.

                              My point is that the word "reference" is rather confusingly overloaded. This is why I prefer the term that Weedpacket used: value of the object reference.

                                laserlight;10950489 wrote:

                                Furthermore, you talk about "C,C++ pointer problem notoriety", and then say that "everything is passed by value" as if that were the solution by itself. If you were not aware, in C, everything is passed by value. The simplication brought by languages such as Java has to do with removing pointer arithmetic. The reference types of C++ likewise do not allow pointer arithmetic.

                                My point is that the word "reference" is rather confusingly overloaded. This is why I prefer the term that Weedpacket used: value of the object reference.

                                I agree. In some terminology reference is used interchangeably with pointer. I agree since C++ is the son of C, they inherit the pointer syntax and then introduce another called reference. Therefore C++ have value, pointer and reference.

                                You argue it is removing pointer arithmetic but I believe that is not the only case alone. How so often we have confused when to de-reference a pointer variable in C,C++ or attempt to de-reference a value, reference variable and other permutation that causes the notorious "Segmentation fault" in Unix g++ era.

                                So my take is not only remove pointer arithmetic but also to keep the design simple. You may not agree but that is ok. Everyone is entitled to their opinion.

                                PS Btw my very first software program is written in C and it is the classic K&R Hello World example. You still remember your very first program ? 🙂

                                  sohguanh;10950492 wrote:

                                  PS Btw my very first software program is written in C and it is the classic K&R Hello World example. You still remember your very first program ? 🙂

                                  🙂

                                    sohguanh wrote:

                                    You argue it is removing pointer arithmetic but I believe that is not the only case alone. How so often we have confused when to de-reference a pointer variable in C,C++ or attempt to de-reference a value, reference variable and other permutation that causes the notorious "Segmentation fault" in Unix g++ era.

                                    Yes, I can agree that removing pointer syntax in general is part of the simplification. However, the equivalent of segmentation faults still exist with Java and PHP (aside from interpreter bugs 🙂), because variables do not necessarily contain valid object references, e.g., they may be set to null.

                                    Attempting to dereference a variable that is not a pointer will cause a compile error. Failing to dereference a pointer when it is required is a mistake that can lead to a segmentation fault, but object references have a similiar pitfall: one might assign to the reference variable that is the parameter of a function, but as pass by value is used, changes to the reference variable will not be reflected in the caller.

                                    sohguanh wrote:

                                    PS Btw my very first software program is written in C and it is the classic K&R Hello World example. You still remember your very first program ?

                                    No, but I think it was a simple Javascript.

                                      25 days later
                                      laserlight;10950366 wrote:

                                      It is always possible to convert recursion into iteration, and vice versa, though admittedly sometimes it is not simple, i.e., one may need an explicit stack to convert recursion to iteration.

                                      In this case, it looks like you are effectively doing tail recursion in each branch. Not tested, but perhaps the recursion could be converted to:

                                      function Load($xml)
                                      {
                                          $str = file_get_contents($xml);
                                          $this->ExecuteLoad($str);
                                          return true;
                                      }
                                      
                                      function ExecuteLoad($str)
                                      {
                                          $container =& $this;
                                          for (;;)
                                          {
                                              $begin = strpos($str, "<");
                                              $end = strpos($str, ">");
                                              if (strlen($str) > 2 && $str{$begin+1} == "/") // closing tag </foo>
                                              {
                                                  // continue after closing tag
                                                  $str = substr($str, $end+1);
                                                  $container &= $container->parentNode;
                                              }
                                              else if (!is_null($begin) && $end) // if nodes left
                                              {
                                                  if ($str{$end-1} == "/") // empty node <foo/>
                                                  {
                                                      $name = substr($str, $begin+1, $end-$begin-2);
                                                      // append new empty node
                                                      $container->childNodes[] =& new MiniDOMNode($name, $container, "");
                                                      // continue after empty node
                                                      $str = substr($str, $end+1);
                                                  }
                                                  else if ($str{$end+1} == "<") // if no value
                                                  {
                                                      $name = substr($str, $begin+1, $end-$begin-1);
                                                      if ($str{$end+2} == "/") // empty node <foo></foo>
                                                      {
                                                          // append new empty node
                                                          $container->childNodes[] =& new MiniDOMNode($name, $container, "");
                                                          // continue
                                                          $str = substr($str, strpos($str,">",$end+2)+1);
                                                      }
                                                      else // inner node <foo><bar>...
                                                      {
                                                          // append new node
                                                          $newNode = new MiniDOMNode($name, $container, "");
                                                          $container->childNodes[] =& $newNode;
                                                          // continue in new node
                                                          $str = substr($str, $end+1);
                                                          $container =& $newNode;
                                                      }
                                                  }
                                                  else // node has value
                                                  {
                                                      $name = substr($str, $begin+1, $end-$begin-1);
                                                      $pos = strpos($str, "<", $end+1);
                                                      $value = substr($str, $end+1, $pos-$end-1);
                                                      $newNode = new MiniDOMNode($name, $container, $value);
                                                      $container->childNodes[] =& $newNode;
                                                      $str = substr($str, $pos);
                                                      $container =& $newNode;
                                                  }
                                              }
                                              else // no nodes left
                                              {
                                                  break;
                                              }
                                          }
                                      }

                                      because of the way pointers/references/whatever work in old php, setting the $container variable as &$newNode causes a problem
                                      on the next iteration $newNode is given a new value, which also changes the $container to point to that new value via $newNode

                                      so doing this:

                                      $newNode = new MiniDOMNode($name, $container, "");
                                      $container->childNodes[] =& $newNode;

                                      will create a new node and set it as its own child, completely isolated from the previous nodes

                                        joppe wrote:

                                        because of the way pointers/references/whatever work in old php

                                        References in PHP4 work the same as in PHP5.

                                        joppe wrote:

                                        setting the $container variable as &$newNode causes a problem
                                        on the next iteration $newNode is given a new value, which also changes the $container to point to that new value via $newNode

                                        Yes, that sounds like a correct observation.

                                        joppe wrote:

                                        will create a new node and set it as its own child, completely isolated from the previous nodes

                                        Again, not tested, but I believe the solution here is to rebind $newNode as an alias of the newly created MiniDOMNode object, i.e.,

                                        $newNode =& new MiniDOMNode($name, $container, "");
                                        $container->childNodes[] =& $newNode;