Hoping that this might be useful to others, I offer my solution. I welcome any feedback and all suggestions for improvement.
The scenario is that you have a form that allows users to enter data to search for in an underlying database. You want to offer more than just exact matching. You want them to be able to enter "and" "not" "or" and even use parentheses. You then need to take their input and translate it into a where clause that you can then submit to your database engine.
My requirements:
Allow users to enter phrases enclosed in double quotes
Search terms separated by spaces will be ANDed together
Search terms separated by commas will be ORed together
I have made all searches wildcard searches. (i.e., I look for the search term anywhere in the field).
Allow grouping using parentheses
Allow key words "AND", "NOT", and "OR"
In addition to returning a database ready where clause, also (by including a boolean parameter) return an "English friendly" representation of the query so users can see what they searched for
The function accepts three parameters:
Field Name (either the database field name or the user friendly field name)
The sanitized user-entered search string
An optional boolean. Set to TRUE if you want User Friendly English.
Here is the function included in a sample php form that demonstrates its function:
<?PHP
function text_query($search_field, $search_text, $is_easy_text = false)
{
// First I replace allowed characters into "words": ( ) , " ''
$search_text = str_replace('"',' @DBQT ',strtoupper($search_text));
$search_text = str_replace("''",' @DBQT ',$search_text);
$search_text = str_replace(",",' OR ',$search_text);
$search_text = str_replace("(",' @LPAR ',$search_text);
$search_text = str_replace(")",' @RPAR ',$search_text);
$search_text = str_replace("'","''",$search_text);
// I next check to see if I have an even number of double quotes
$number_of_doublequotes = substr_count($search_text,' @DBQT ');
if (($number_of_doublequotes/2) <> floor($number_of_doublequotes/2))
{
$search_text = str_replace(' @DBQT','',$search_text); // Since we don't have even matched pairs, we'll get rid of them all
}
// Check to see if I have same number of opening and closing parens
$number_of_lpar = substr_count($search_text,' @LPAR ');
$number_of_rpar = substr_count($search_text,' @RPAR ');
if ($number_of_lpar <> $number_of_rpar) {
$search_text = str_replace(' @LPAR','',$search_text); // Since we don't have equal number of opening and closing parens,
$search_text = str_replace(' @RPAR','',$search_text); // we'll get rid of them all.
}
$search_text = trim($search_text);
$text_array = array();
$query_array = array();
$i = 0;
$j = 0;
// Explode words into an array
$text_array = explode(' ', $search_text,50);
$size_of_array = count($text_array);
// Now go through the array looking for double quote words. Terms between the doublequotes are conflated into a single term in the new array
if ($size_of_array > 0) {
while ($i < $size_of_array) {
if ($text_array[$i] <> '')
{
if ($text_array[$i] == "@DBQT")
{
$i++;
$k = 0;
while ($text_array[$i] <> "@DBQT")
{
if ($k == 0)
{
$query_array[$j] = $text_array[$i];
$k++;
}
else
{
$query_array[$j] = $query_array[$j]." ".$text_array[$i];
}
$i++;
}
$i++;
$j++;
}
else
{
$query_array[$j] = trim($text_array[$i]);
$i++;
$j++;
}
}else
{
$i++;
}
}
$return_str = '';
// We now have an array where each value is either a key word or a search term
// Now go through the array and build either the T-SQL query or a plain English query
foreach ($query_array as $key => $value)
{
switch (trim($value)) {
case '@LPAR':
if ($key == 0)
{
$return_str = "(";
}
elseif (($query_array[$key - 1] == 'OR') OR ($query_array[$key - 1] == 'AND') OR ($query_array[$key - 1] == 'NOT') )
{
$return_str .= " (";
}
else
{
$return_str .= " AND (";
}
break;
case '@RPAR':
$return_str .= ")";
break;
case 'AND':
case 'OR':
$return_str .= " $value";
break;
case 'NOT':
if (($query_array[$key - 1] == 'AND') ||($query_array[$key - 1] == 'OR') )
{
$return_str .= " NOT";
break;
}
else
{
$return_str .= " AND NOT";
break;
}
default:
if (($key == 0) || ($query_array[$key - 1] == '@LPAR')) // Skip leading space if first term or following Left parens
{
if ($is_easy_text === true)
{
$return_str .= "<font color=red>$value</font>";
}
else
{
$return_str .= "($search_field LIKE '%$value%')";
}
}
elseif (($query_array[$key - 1] == 'OR') OR ($query_array[$key - 1] == 'AND') OR ($query_array[$key - 1] == 'NOT') OR ($query_array[$key - 1] == '@LPAR') OR ($query_array[$key - 1] == '@RPAR'))
{
if ($is_easy_text === true)
{
$return_str .= " <font color=red>$value</font>";
}
else
{
$return_str .= " ($search_field LIKE '%$value%')";
}
}
else // Add "AND" if there is no key word proceeding. This allows user to list search terms that should be ANDed together.
{
if ($is_easy_text === true)
{
$return_str .= " AND <font color=red>$value</font>";
}
else
{
$return_str .= " AND (".$search_field." LIKE '%".$value."%')";
}
}
}
}
if ($is_easy_text === true)
{
$return_str = "($search_field contains ".trim(str_replace("''","'",$return_str)).")";
}
else
{
$return_str = "(".trim($return_str).")";
}
return $return_str;
}
else
{
return '';
}
}
$searchfield = 'dbo.pet.type';
$friendlyfield = 'Pet Type';
$searchstring = 'cat domestic not (siamese, angorra, tabby)';
echo '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html xmlns:str="http://exslt.org/strings">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>Text Query Parser Demo</title>
</head>
<body>
<font face="Arial, Helvetica, sans-serif">
<form id="form1" name="form1" method="get" action="'.$_SERVER['PHP_SELF'].'">
<table border=0>
<tr>
<td><div align="right">Field to Search:</div></td>
<td><input type="text" size="50" name="searchfield" id="tf" value="'.$searchfield.'" /></td>
</tr>
<tr>
<td><div align="right">User Friendly Field Name:</div></td>
<td><input type="text" size="50" name="friendlyfield" id="tf2" value="'.$friendlyfield.'" /></td>
</tr>
<tr>
<td><div align="right">Search String:</div></td>
<td><input type="text" size="50" name="searchstring" id="tf3" value="'.$searchstring.'" /></td>
</tr>
<tr>
<td> </td><td><input name="" type="submit" /></td></tr>
</table>
</form>
';
if (@$_REQUEST['submit'] = 'Submit')
{
if (!(isset($_REQUEST['searchstring'])))
{
$searchstring = '';
}
else
{
$searchstring = filter_var($_REQUEST['searchstring'], FILTER_SANITIZE_STRING);
}
if (!(isset($_REQUEST['searchfield'])))
{
$searchfield = '';
}
else
{
$searchfield = filter_var($_REQUEST['searchfield'], FILTER_SANITIZE_STRING);
}
if (!(isset($_REQUEST['friendlyfield'])))
{
$friendlyfield = '';
}
else
{
$friendlyfield = filter_var($_REQUEST['friendlyfield'], FILTER_SANITIZE_STRING);
}
if (($searchfield <> '') && ($searchstring <> '') && ($friendlyfield <> '')) {
$qry = text_query($searchfield,$searchstring);
$easysearchtext = text_query($friendlyfield,$searchstring,true);
}
else {
$qry = "";
$easysearchtext = "";
}
echo '
<table border="1"><tr>
<td align="right"<p><strong>Your Search</strong></td><td>'.$qry.'</p></td></tr>';
echo '<tr><td align="right"><strong>Your Easy Search Text</strong></td><td>'.$easysearchtext.'</p></td></tr></table>';
}
echo '
</font>
</body>
</html>';
?>