I'm testing some options for a socket communication protocol. The thread is here. I'm trying to make sure a string contains no null chars (i.e., the char returned by chr(0)) by replacing each one with a \0 sequence, per a suggestion by Weedpacket. I pointed out to good old master Weedpacket that, since I need to 'unescape' them later, I risk accidentally turning legitimate occurrences of \0 into null chars. He suggested I first escape the backslashes.
So I've got an encode function that looks something like this:
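function other_encode($str) {
    // Escape backslashes first, so a literal \0 in the input stays distinguishable from an escaped null
    $result = str_replace("\\", "\\\\", $str);
    // Then replace each null char with the two-char sequence \0
    $result = str_replace(chr(0), "\\0", $result);
    return $result;
}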
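To see why the backslashes have to be escaped first, here is a small sketch comparing the encoder above with a hypothetical wrong_order_encode() that does the same two replacements in the opposite order. With the wrong order, a real null char and a literal backslash-zero in the input both come out as the same three characters, so no decoder could tell them apart.

```php
<?php
// Encoder from the post: escape backslashes first, then null chars.
function other_encode($str) {
    $result = str_replace("\\", "\\\\", $str);
    $result = str_replace(chr(0), "\\0", $result);
    return $result;
}

// Hypothetical variant with the two replacements swapped, for comparison only.
function wrong_order_encode($str) {
    $result = str_replace(chr(0), "\\0", $str);   // null -> \0
    $result = str_replace("\\", "\\\\", $result); // ...then that new backslash gets doubled too
    return $result;
}

$nul = chr(0);     // a single null char
$literal = "\\0";  // two chars: a backslash and a zero

// Correct order: the two inputs stay distinct.
var_dump(other_encode($nul) === other_encode($literal));             // bool(false)

// Wrong order: both collapse to the same three-char string \\0.
var_dump(wrong_order_encode($nul) === wrong_order_encode($literal)); // bool(true)
```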
The problem I'm having is decoding. I first have to turn the occurrences of \0 that correspond to null chars back into null chars, and then I have to unescape the escaped backslashes. NOW the problem becomes how to handle a \0 sequence that is preceded by some number of backslashes. Here's my decode function:
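function other_decode($str) {
    // Turn \0 sequences that encode null chars back into null chars
    $pattern = "/(?!\\\\\\\\)\\\\0/";
    $result = preg_replace($pattern, chr(0), $str);
    // Then unescape the doubled backslashes
    $result = str_replace("\\\\", "\\", $result);
    return $result;
}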
As you might observe, I'm trying to assert that the \0 sequence is NOT preceded by two backslashes. There are so damn many backslashes because each one is escaped twice: once for PHP's double-quoted string and once for the regex engine, so "\\\\\\\\" in the source reaches PCRE as \\\\, which matches two literal backslashes. What this really needs to do is make sure that the number of backslashes preceding a given zero is ODD and not EVEN. Here is some code that illustrates the problem:
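Here is a sketch of one way to express "odd number of backslashes" in PCRE, assuming the input always comes from other_encode(). Note that (?! ... ) in the original pattern is a negative lookahead, so it tests the characters after the current position rather than before it. The sketch instead uses a lookbehind, (?<!\\), only to force each match to start at the beginning of a run of backslashes, then consumes any number of escaped pairs before the final \0 and puts those pairs back. The function name other_decode_parity() is hypothetical.

```php
<?php
// Hypothetical fixed decoder; assumes $str was produced by other_encode().
function other_decode_parity($str) {
    // As PCRE sees it: /(?<!\\)((?:\\\\)*)\\0/
    //   (?<!\\)       the match must start at the beginning of a run of backslashes
    //   ((?:\\\\)*)   capture any number of escaped backslash pairs (an even count)
    //   \\0           then one more backslash plus a zero: an escaped null
    $pattern = '/(?<!\\\\)((?:\\\\\\\\)*)\\\\0/';
    // Keep the captured pairs and turn the trailing \0 into a real null char.
    $result = preg_replace_callback($pattern, function ($m) {
        return $m[1] . chr(0);
    }, $str);
    // Every remaining backslash is now half of an escaped pair; unescape them.
    $result = str_replace("\\\\", "\\", $result);
    return $result;
}

// An encoded literal backslash-zero (3 chars) decodes back to 2 chars,
// while an encoded backslash + null (4 chars) decodes to backslash + null.
var_dump(other_decode_parity("\\\\0") === "\\0");    // bool(true)
var_dump(other_decode_parity("\\\\\\0") === "\\\0"); // bool(true)
```

The lookbehind matters because preg_replace could otherwise start a match in the middle of a run and see an odd-looking tail of an even run.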
$str = "here is a string with a \\ backslash and a \0 null char";
echo "original:" . $str . "\n";
$foo = other_encode($str);
echo "encoded: " . $foo . "\n";
echo "decoded: " . other_decode($foo) . "\n";
echo "\n";
$str2 = "here is a strong with two \\\\ backslashes and also a \\0 backslash followed by zero";
echo "original:" . $str2 . "\n";
$foo = other_encode($str2);
echo "encoded :" . $foo . "\n";
echo "decoded :" . other_decode($foo) . "\n";
The output:
original:here is a string with a \ backslash and a null char
encoded: here is a string with a \\ backslash and a \0 null char
decoded: here is a string with a \ backslash and a null char
original:here is a strong with two \\ backslashes and also a \0 backslash followed by zero
encoded :here is a strong with two \\\\ backslashes and also a \\0 backslash followed by zero
decoded :here is a strong with two \\ backslashes and also a \ backslash followed by zero
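In the second run, the literal \0 came back as a lone backslash followed by an (invisible) null char, which is the odd/even problem in action. A regex may not be needed at all, though. Because the encoder guarantees that every backslash in its output starts a two-character escape (\\ or \0), a decoder that scans left to right and consumes one escape at a time never has to count backslashes. PHP's strtr() with an array argument does this kind of scan: it tries the longest keys first and never re-searches text it has already replaced. A minimal sketch, with hypothetical names strtr_encode() and strtr_decode(), round-tripping the two test strings from above:

```php
<?php
// Hypothetical single-pass encoder: same output as other_encode(), since strtr
// scans the input once and never re-escapes text it has already produced.
function strtr_encode($str) {
    return strtr($str, ["\\" => "\\\\", "\0" => "\\0"]);
}

// Hypothetical single-pass decoder: each \\ becomes \ and each \0 becomes
// a null char, consumed left to right, so no backslash counting is needed.
function strtr_decode($str) {
    return strtr($str, ["\\\\" => "\\", "\\0" => "\0"]);
}

$tests = [
    "here is a string with a \\ backslash and a \0 null char",
    "here is a strong with two \\\\ backslashes and also a \\0 backslash followed by zero",
];
foreach ($tests as $str) {
    $encoded = strtr_encode($str);
    var_dump(strpos($encoded, "\0") === false); // bool(true): no null chars left
    var_dump(strtr_decode($encoded) === $str);  // bool(true): round trip is exact
}
```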
Any help from you regex masters would be most appreciated.