Righty-ho. I just looked at that and reckon that one idea that might help with the parser is to use preg_match() to locate the next "interesting bit".
Say you're currently inside generic PHP code (for($i=0; $i<42; $i++) sort of stuff). The interesting bits when you're in this state are: " that starts a double-quoted string; ' that starts a single-quoted string; <<< that starts a heredoc/nowdoc-quoted string; // that starts a rest-of-line comment; /* that starts a block comment; and ?> that ends a PHP block. You need to know which of these appears first, so that you know which state to go into next. Assuming that your source code is in $source, and that you're currently scanning the $offset'th character:
preg_match('!("|\'|<<<|//|/\*|\?>)!', $source, $match, PREG_OFFSET_CAPTURE, $offset)
(I think I've got that right) will, assuming it finds anything, find the first interesting bit in $source that appears at or after $offset. Because of PREG_OFFSET_CAPTURE, the matched text ends up in $match[0][0] and its position in $match[0][1]. Depending on what it is, the parser would then go on with scanning a string, a comment, or non-PHP stuff.
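Here's a rough sketch of how that call might be wrapped up, in case it helps. The function names nextInterestingBit() and stateFor() and the state labels are made up for illustration; they don't come from any existing parser.

```php
<?php
// Hypothetical helper: find the next token that changes the scanner's state.
// Returns [token, position], or null when nothing interesting is left.
function nextInterestingBit(string $source, int $offset): ?array
{
    $pattern = '!("|\'|<<<|//|/\*|\?>)!';
    if (preg_match($pattern, $source, $match, PREG_OFFSET_CAPTURE, $offset) !== 1) {
        return null;
    }
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    return [$match[0][0], $match[0][1]];
}

// Illustrative state names only.
function stateFor(string $token): string
{
    switch ($token) {
        case '"':   return 'double-quoted string';
        case "'":   return 'single-quoted string';
        case '<<<': return 'heredoc/nowdoc';
        case '//':  return 'line comment';
        case '/*':  return 'block comment';
        default:    return 'outside PHP'; // only ?> is left
    }
}

$source = 'for ($i = 0; $i < 42; $i++) { echo "hi"; } // done';
[$token, $pos] = nextInterestingBit($source, 0);
echo $token, ' at ', $pos, ' -> ', stateFor($token), "\n";
```

The main loop would call nextInterestingBit() again with $offset moved past whatever string or comment it has just finished scanning.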
When scanning a heredoc-quoted string, you'd need to capture the delimiter that follows the "<<<", so you know when the string ends.
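A minimal sketch of that delimiter capture, under a few assumptions: the helper names are invented, the identifier rule is simplified to ASCII letters, digits and underscore, and the closing delimiter is allowed to be indented (which, as far as I know, PHP only accepts from 7.3 on).

```php
<?php
// Hypothetical helper: $pos is the position of "<<<".
// Returns the delimiter, whether it's a nowdoc, and where the body starts.
function readHeredocDelimiter(string $source, int $pos): ?array
{
    // \G pins the match to $pos itself; \1 makes the quotes pair up.
    $pattern = '/\G<<<[ \t]*([\'"]?)([A-Za-z_][A-Za-z0-9_]*)\1\r?\n/';
    if (preg_match($pattern, $source, $m, 0, $pos) !== 1) {
        return null;
    }
    return [
        'delimiter' => $m[2],
        'nowdoc'    => $m[1] === "'",
        'bodyStart' => $pos + strlen($m[0]),
    ];
}

// Hypothetical helper: position just past the closing delimiter, or null.
function findHeredocEnd(string $source, int $bodyStart, string $delimiter): ?int
{
    // The delimiter must start a line (optionally indented) and end on a word boundary.
    $pattern = '/^[ \t]*' . preg_quote($delimiter, '/') . '\b/m';
    if (preg_match($pattern, $source, $m, PREG_OFFSET_CAPTURE, $bodyStart) !== 1) {
        return null;
    }
    return $m[0][1] + strlen($m[0][0]);
}

$source = "\$x = <<<EOT\nhello\nEOT;\necho 1;";
$info = readHeredocDelimiter($source, 5);
$end  = findHeredocEnd($source, $info['bodyStart'], $info['delimiter']);
echo $info['delimiter'], ' body starts at ', $info['bodyStart'], ', ends at ', $end, "\n";
```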
When scanning a quoted string, the next quote you're interested in must be preceded by an even number of backslashes (possibly zero); if there is an odd number, then the quote character itself is part of the string. That can be done with a pattern along the lines of (?<!\\)(\\\\)*' for single-quoted strings (it says "an apostrophe, but only if it's preceded by zero or more repetitions of '\\', with no odd backslash left over in front of them"; four backslashes because PCRE uses the backslash as its escape character, so both backslashes in each pair need to be escaped).
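Something like the following could do the closing-quote search. It's only a sketch: findClosingQuote() is a made-up name, the lookbehind is my own way of enforcing the "even number" rule, and it ignores complications such as quotes inside {$...} interpolation in double-quoted strings. Note that inside a PHP string literal every backslash in the pattern has to be doubled once more.

```php
<?php
// Hypothetical helper: $offset points just past the opening quote.
// Returns the position of the closing quote, or null if the string never ends.
function findClosingQuote(string $source, int $offset, string $quote = "'"): ?int
{
    // What PCRE sees for an apostrophe:  (?<!\\)(?:\\\\)*'
    // i.e. no backslash just before, then zero or more backslash pairs, then the quote.
    $pattern = '/(?<!\\\\)(?:\\\\\\\\)*' . preg_quote($quote, '/') . '/';
    if (preg_match($pattern, $source, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
        return null;
    }
    // The match may begin with backslash pairs; the quote is its last character.
    return $m[0][1] + strlen($m[0][0]) - 1;
}

$source = "'it\\'s' . 'x'"; // the text being scanned is: 'it\'s' . 'x'
$close  = findClosingQuote($source, 1);
echo 'closing quote at ', $close, "\n";
```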