I'm trying to use regular expressions in a tokenizer/parser I am writing. Basically, I need to test whether a regular expression matches a string starting exactly at a given index. Since this is a parser, it must be efficient.
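It would be tempting to try preg_match($pattern, $buffer, $matches, PREG_OFFSET_CAPTURE, $index) with a pattern such as "/[\w]+/". However, the documentation explains that this will not work (and I confirmed this): the offset only sets where the search begins, so the match may start later in the string, and an anchor such as ^ still refers to the start of the whole buffer rather than to $index.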
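A minimal sketch of the behaviour described above, using a hypothetical buffer "foo bar" chosen only for illustration. Note that preg_match() takes the pattern first and the subject second.

```php
<?php
// Hypothetical input, chosen only to illustrate the problem.
$buffer = "foo bar";

// 1) Unanchored pattern: the offset only says where the search begins.
//    Offset 3 is the space, yet preg_match() still succeeds by skipping
//    ahead to "bar" at offset 4.
$unanchored = preg_match('/[\w]+/', $buffer, $m1, PREG_OFFSET_CAPTURE, 3);
echo $unanchored, ' ', $m1[0][0], ' at ', $m1[0][1], "\n"; // 1 bar at 4

// 2) Pattern anchored with ^: the anchor still means "start of the whole
//    subject", so this fails even though "bar" begins exactly at offset 4.
$anchored = preg_match('/^[\w]+/', $buffer, $m2, PREG_OFFSET_CAPTURE, 4);
echo $anchored, "\n"; // 0
```

Neither call answers the question "does a token start exactly at $index?", which is what a tokenizer needs.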
The obvious answer would be to call substr($buffer, $index) and match against the result, but I'm worried about the efficiency of this, because it will happen at almost every character of the input. Without knowing for sure, I'd bet that each substr call copies the remaining contents of the string, which is too much overhead for my application.
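For reference, the substr() workaround might look like the sketch below. The helper name matchAt is hypothetical, and it assumes the pattern begins with ^ so that it is anchored at the start of the extracted tail.

```php
<?php
// Hypothetical helper: returns the token that $pattern matches starting
// exactly at $index, or null if there is none. $pattern is expected to
// begin with ^ so that it is anchored.
function matchAt(string $pattern, string $buffer, int $index): ?string
{
    // substr() returns a new string holding the tail of $buffer from $index,
    // so ^ now refers to position $index of the original buffer.
    if (preg_match($pattern, substr($buffer, $index), $m) === 1) {
        return $m[0];
    }
    return null;
}

var_dump(matchAt('/^[\w]+/', "foo bar", 4)); // string(3) "bar"
var_dump(matchAt('/^[\w]+/', "foo bar", 3)); // NULL (offset 3 is a space)
```

This gives the right answer, but if every substr() call copies the rest of the buffer as suspected, calling it at nearly every position of an n-byte input costs on the order of n² bytes copied in total.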
In C/C++, one could simply pass a pointer to the start of the substring, i.e. (in C syntax) preg_match(pattern, &buffer[index], matches). I'm new to PHP, but I'm pretty sure this is not possible. Can anyone confirm this?
I also thought of using array_pop to permanently discard already-consumed characters of the string (in which case $index would no longer be necessary). This doesn't appear to be legal in PHP either, since a string is not an array. Is there a way of accomplishing this while keeping the data as a string? Otherwise I can't do regular expression matching on it.
Are there any other ways of doing this short of writing my own regular expression compiler?
Thanks in advance,
Robbie