- Edited
I'm familiar with how escape characters in one's code don't actually make their way into the string defined. Escaping them is necessary because, for instance, you need to distinguish quotes that delimit your string from the quotes you might want IN your string. Totally understand that.
What concerns me is that I want my keywords to be part of a REGEXP expression inside and sql expression and escaping is different for SQL than it is for REGEXP. PHP has a preg_quote function, for example.
Some example questions:
- what if one of my keywords is wh*ee -- do I need to escape the asterisk? Is there a function or guidelines for this?
- what if one of my keywords is [[:alnum:]]+ -- should I escape the square brackets?
- what if the db escape (like PDO::quote) function returns quotes in its output? Does that mean I need to escape my entire regexp expression even though this expression is intended to be interpreted as SQL regexp?
For that last example, consider this code:
$db = new PDO("mysql:host=localhost;dbname=my_db", "user", "pass");
$keyword = "foo";
$sql = "SELECT * FROM my_table WHERE my_col REGEXP '[[:<:]]" . $db->quote($keyword) . "[[:>:]]'";
echo $sql . "\n";
The output is broken SQL because the PDO::quote function adds single quotes:
SELECT * FROM my_table WHERE my_col REGEXP '[[:<:]]'foo'[[:>:]]'
I can fix that particular query by escaping the keyword surrounded by the regexp along with the keyword:
$keyword = "[[:<:]]foo[[:>:]]";
$sql = "SELECT * FROM my_table WHERE my_col REGEXP " . $db->quote($keyword);
But is this really what I want to do for a general solution? Are there any chars that might be in my regex which I DON'T want quoted. Like what if my keyword itself contained a single or double quote? This code:
$keyword = '[[:<:]]f"oo[[:>:]]';
$sql = "SELECT * FROM my_table WHERE my_col REGEXP " . $db->quote($keyword);
results in the double quote also being escaped:
SELECT * FROM my_table WHERE my_col REGEXP '[[:<:]]f\"oo[[:>:]]'
NOTE this does actually work -- if you run that query it'll locate records containing the string f"oo but I think it illustrates my concern about escaping. I'd like to avoid crosstalk between escaping REGEXP chars and escaping SQL search keywords.
I'd also point out that the mysql docs on regex don't talk much about escaping anything. Nor does the ICU Reference that it refers to.