What is wrong with this regex: '/<\/?table[^>]*?>/' ?
I am using it with this code:

<?php
$files=`find /home/bubblenut/httpd/test1`;
$files=explode("\n",$files);
$count_f=count($files);
$regex4='/<\/?table[^>]*?>/';
for($i=0;$i<$count_f;$i++) {
  if(!check($files[$i])) {
    continue;
  } else {
    $string=get_file($files[$i]);
    $split=strip_code($string);
    for($j=0;$j<count($split['html']);$j++) {
      $split['html'][$j]=preg_replace($regex4,'',$split['html'][$j]);
    }
  }
}
?>

check checks that the item is a file rather than a directory.
get_file just returns the contents of the file
strip_code divides the contents into php code, html, and possible php variables (I know this bit works as I have tested it thoroughly)

The program is not supposed to output anything yet. I just want to get it running, and then I'll start analyzing the output. If I comment out the preg_replace, it all goes through fine.
Thanks
Bubble

    If the file is really large, you could run out of memory.

    Doing a preg_replace or other string operation temporarily gives you two strings, the original and the new one. If the string is really large, having two might be enough to exceed memory limits.

    The simplest solution is to increase the memory limit. This will take care of the PHP aspect:
    ini_set('memory_limit', '64M');
    Your OS may have per-process memory limits imposed too, which won't be affected by that setting. But most likely it is the PHP limit that you're hitting.

    The best way to handle the situation is to process the files while you read them, instead of reading the whole file into memory all at once. That can be more difficult to code, though.
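As a rough illustration of processing a file while reading it, here is a minimal sketch that strips the table tags one line at a time. The function name strip_table_tags_streaming and the input/output path parameters are hypothetical and not part of the original script; the pattern is the one from the question. It assumes each tag sits entirely on one line, so a tag that spans a line break would be missed.

```php
<?php
// Hypothetical helper (not from the original post): remove <table ...> and
// </table> tags while copying a file line by line, so only one line is held
// in memory at a time instead of the whole file.
function strip_table_tags_streaming(string $inPath, string $outPath): int
{
    $regex = '/<\/?table[^>]*?>/';

    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inPath for reading");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outPath for writing");
    }

    $removed = 0;
    while (($line = fgets($in)) !== false) {
        // $count receives the number of replacements made on this line.
        $clean = preg_replace($regex, '', $line, -1, $count);
        $removed += $count;
        fwrite($out, $clean);
    }

    fclose($in);
    fclose($out);

    // Total number of tags removed from the file.
    return $removed;
}
```

Peak memory then depends on the longest line rather than on the file size, at the cost of not handling tags that wrap across lines.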

      15 days later

      Do you really mean to write '/</?table[^>]*?>/' ?

      Putting *? together doesn't make much sense. If you meant to match '?>' try escaping the ? first... '\?>', as the question mark makes matching the end '>' optional. Also, should that second '/' be in there? It looks like you're using that as your delimiter, yet it's the 3rd character. You can escape that, too, if needed.

      let me know if you need anything else.

        Originally posted by dave420
        Do you really mean to write '/</?table[^>]*?>/' ?

        Putting *? together doesn't make much sense.

        Makes perfect sense if you want ungreedy matching. Of course, in this case it's redundant, since [^>]* is supposed to stop as soon as it hits a > anyway, but it's not a fatal error.

        My suspicion does lean to the memory limit; maybe a pipelined solution (starting with popen and reading line by line) would work better.
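A pipelined version of the file listing might look roughly like the sketch below. The generator name find_files is hypothetical, and passing -type f to find is an assumption standing in for the check() call in the original script. It reads the output of find through popen one line at a time instead of collecting it all with backticks and explode.

```php
<?php
// Hypothetical helper (not from the original post): yield paths from `find`
// one at a time by reading its output through a pipe.
function find_files(string $dir): Generator
{
    // -type f limits the results to regular files (assumed to replace check()).
    $cmd = 'find ' . escapeshellarg($dir) . ' -type f';
    $pipe = popen($cmd, 'r');
    if ($pipe === false) {
        throw new RuntimeException("Could not run: $cmd");
    }
    try {
        while (($line = fgets($pipe)) !== false) {
            // Drop the trailing newline that find prints after each path.
            yield rtrim($line, "\n");
        }
    } finally {
        // Close the pipe even if the caller stops iterating early.
        pclose($pipe);
    }
}

// Usage (directory taken from the original script):
// foreach (find_files('/home/bubblenut/httpd/test1') as $path) {
//     // read and process $path here
// }
```

This only streams the list of file names; each file's contents would still need to be read incrementally to keep memory use low.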

          Yes, the [^>]* will match everything up to the closing bracket, but as it's followed by a ?, it becomes optional. That means it can feasibly match every character up to the last bracket in the page, including all other close brackets along the way.

          I've found that it's very, very hard to run out of memory matching patterns in PHP, unless the pattern is flawed. If you think about it, the very nature of a pattern should make sure that never happens...

            Originally posted by dave420
            Yes, the [^>]* will match everything up to the closing bracket, but as it's followed by a ?, it becomes optional.

            Nonono, it's the ungreedy modifier. Without it, (subpattern)* will try to match as long a string as possible while still allowing the entire regexp to match. With *?, it works exactly the same except that "long" is replaced by "short".
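A small sketch of the difference, using a made-up HTML string and variable names chosen only for illustration. With .* the ungreedy ? changes where the match ends. With [^>]* it makes no difference, because the character class can never run past a > anyway.

```php
<?php
// Sample input invented for this illustration.
$html = '<table border="1"><tr><td>cell</td></tr></table>';

// Greedy: .* takes as much as it can, so the match runs to the LAST '>'.
preg_match('/<table.*>/', $html, $greedy);

// Ungreedy: .*? takes as little as it can, so the match stops at the FIRST '>'.
preg_match('/<table.*?>/', $html, $lazy);

// [^>]* cannot cross a '>', so greedy and ungreedy give the same match.
preg_match('/<table[^>]*>/', $html, $classGreedy);
preg_match('/<table[^>]*?>/', $html, $classLazy);

echo $greedy[0], "\n";      // the whole string, up to the final </table>
echo $lazy[0], "\n";        // <table border="1">
echo $classGreedy[0], "\n"; // <table border="1">
echo $classLazy[0], "\n";   // <table border="1">
```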
