Hey all,
could use some help with a simple search script I've made:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Search database</title>
<link href="scripts/datatable.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="list-container">
  <div class="categories">
    <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES, 'UTF-8'); ?>" method="post">
      Search:
      <input type="text" name="term" />
      <input type="submit" name="submit" value="Submit" />
    </form>
  </div>
  <div class="letters">
    <hr />
  </div>
  <div class="contents-container">
    <?php
	// database connection (move?)
mysql_connect ("localhost", "root","")  or die (mysql_error());
mysql_select_db ("micardsample");
// mandatory for Greek language support
mysql_query("SET NAMES 'utf8'");
mysql_query("SET COLLATION_CONNECTION=utf8_general_ci");

  if(isset($_POST['submit'])){ 
	if(preg_match("/[\p{Greek}A-Za-z0-9]+/u", $_POST['term'])){

$term = strip_tags(trim($_POST['term']));

// query to select from category or company
$sql = mysql_query("SELECT * FROM companies WHERE category LIKE '%$term%' OR company LIKE '%$term%'");
// if results are zero and string length is lower than 2 go to else "No results"
  if ( (mysql_num_rows($sql) > 0) and (strlen($term) >= 2) ){
	while ($row = mysql_fetch_array($sql)){
?>
<div class="contents">
  <div class="image-and-info">
    <p class="info"><img src="<?php echo $row['imageurl']; ?>" align="left"/></p>
    <p class="info"><?php echo $row['company']; ?></p>
    <p class="info"><?php echo $row['category']; ?></p>
  </div>
  <p class="info"><a href="<?php echo $row['homeurl']; ?>" target="_new">Home site</a></p>
</div>
<?php
	}
  } else {
	// Greek for "No results found."
	echo "<p>&#916;&#949;&#957; &#946;&#961;&#941;&#952;&#951;&#954;&#945;&#957; &#945;&#960;&#959;&#964;&#949;&#955;&#941;&#963;&#956;&#945;&#964;&#945;.</p>";
  }
} else {
	// Greek for "Enter a search term."
	echo "<p>&#917;&#953;&#963;&#940;&#947;&#949;&#964;&#949; &#972;&#961;&#959; &#945;&#957;&#945;&#950;&#942;&#964;&#951;&#963;&#951;&#962;.</p>";
}
}

?>
  </div>
</div>
</body>
</html>

So, it works fine if a user inputs 2 or more characters, but with Greek characters strlen seems to count even 1 character as 2 or more. Why is that happening?

    I believe it's because PHP strings are plain byte sequences with no built-in notion of character encoding, so strlen() counts each byte of a multibyte character separately. Perhaps you could use mb_strlen() instead?

      The issue of multibyte charsets can get pretty complicated because it depends not just on what you declare for your page's charset but also may depend on how you save your PHP file. There's a huge thread wherein I get schooled by Weedpacket about the multibyte char issue.

      The basic idea is that if you have non-ASCII characters in your PHP code, you should probably be saving the PHP file itself as UTF-8. By ASCII, I mean only the 128 characters of the 7-bit ASCII set. Any character outside those 128 is encoded with more than one byte in UTF-8.

      The PHP language itself is expressed using only characters that are a subset of ASCII. The strlen() function assumes that you are NOT using a multibyte character set: it literally returns the number of bytes in a string variable. If that string contains UTF-8 text full of Japanese, Greek, or Chinese characters, strlen will not tell you how many characters it contains; it will tell you how many bytes it contains.
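
      To make the bytes-versus-characters point concrete, here is a minimal sketch. The sample characters are my own picks, written as explicit byte escapes so the result doesn't depend on how the file is saved, and utf8_bytes() is a made-up helper, not a PHP built-in.

      ```php
      <?php
      // Hypothetical helper (not a PHP built-in): hex dump of the bytes in a string.
      function utf8_bytes($s)
      {
          return implode(' ', str_split(bin2hex($s), 2));
      }

      // Sample characters chosen for illustration, given as raw UTF-8 bytes.
      $samples = array(
          'Latin A'     => "A",             // U+0041
          'Greek alpha' => "\xCE\xB1",      // U+03B1
          'Hiragana a'  => "\xE3\x81\x82",  // U+3042
      );

      foreach ($samples as $name => $char) {
          // Each sample is a single character, but strlen() reports its byte count.
          echo $name, ': ', strlen($char), ' byte(s): ', utf8_bytes($char), "\n";
      }
      ```

      Each entry is one character, yet strlen() reports 1, 2, and 3 respectively.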

      I see that your page declares the UTF-8 charset for the DB and for the page. I agree with NogDog that you should be using the multibyte string functions.
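
      For the length check in the script above, something along these lines might do. It's only a sketch: it assumes the mbstring extension is loaded, has_min_chars() is a hypothetical name, and the 2-character minimum is taken from the original code.

      ```php
      <?php
      // Hypothetical helper: true when $term has at least $min characters (not bytes).
      // Assumes the mbstring extension is available and $term is UTF-8.
      function has_min_chars($term, $min = 2)
      {
          return mb_strlen($term, 'UTF-8') >= $min;
      }

      $one_greek = "\xCE\xB1";          // one Greek letter (alpha), two bytes
      $two_greek = "\xCE\xB1\xCE\xB2";  // two Greek letters (alpha, beta), four bytes

      var_dump(strlen($one_greek) >= 2);    // bool(true): the byte count passes the check
      var_dump(has_min_chars($one_greek));  // bool(false): only one character
      var_dump(has_min_chars($two_greek));  // bool(true)
      ```

      The first var_dump reproduces the behaviour from the question: a single Greek letter already counts as 2 for strlen().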

        Thanks a lot for the help, guys!

        mb_strlen($term, 'utf-8')

        made it work perfectly. Got a lot of things to learn... 🙂

        Btw, do you think strip_tags() and trim() are enough for sanitizing the input? I can't use htmlentities() since it messes up the Greek characters...

          Yes you can

          htmlentities($string, ENT_QUOTES, 'utf-8');
          

            Tried it, but it still won't allow Greek chars. Wouldn't this be the correct syntax?

            $term = mysql_real_escape_string(strip_tags(trim($_POST['term'])));
            $term = htmlentities($term, ENT_QUOTES, 'utf-8');

              Applying mysql_real_escape_string() to sanitize the data for a DB query should be the last thing you do to the data before actually putting it in the SQL query string.

                bradgrafelman;10968984 wrote:

                Applying mysql_real_escape_string() to sanitize the data for a DB query should be the last thing you do to the data before actually putting it in the SQL query string.

                Alright. Corrected that.

                Thanks again.

                  htmlentities does work with UTF-8 if that's supplied as the third argument, and UTF-8 does include Greek characters. Your problem lies elsewhere. Also, I'd run the string through mysql_real_escape_string last of all, to avoid the risk that some other function introduces a character that acts as a delimiter in your SQL.

                  # Test string using Greek letters ("No results found.")
                  $s = "Δεν βρέθηκαν αποτελέσματα.";
                  echo 'Initial string: ' . $s . "<br/>\n";
                  
                  $s = strip_tags(trim($s));
                  $s = htmlentities($s, ENT_QUOTES, 'utf-8');
                  echo 'Html entities: ' . $s . "<br/>\n";
                  
                  # Use mysql_real_escape_string just before inserting into or updating the database, to avoid
                  # introducing characters that screw up your SQL statement. And only do so if you actually
                  # insert it into the database.
                  $db_escaped = mysql_real_escape_string($s);
                  echo 'Escaped for DB: ' . $db_escaped; # same as the line above for my particular DB.
                  

                  HTML code Output

                  Initial string: Δεν βρέθηκαν αποτελέσματα.<br/>
                  Html entities: &Delta;&epsilon;&nu; &beta;&rho;έ&theta;&eta;&kappa;&alpha;&nu; &alpha;&pi;&omicron;&tau;&epsilon;&lambda;έ&sigma;&mu;&alpha;&tau;&alpha;.<br/>
                  Escaped for DB: &Delta;&epsilon;&nu; &beta;&rho;έ&theta;&eta;&kappa;&alpha;&nu; &alpha;&pi;&omicron;&tau;&epsilon;&lambda;έ&sigma;&mu;&alpha;&tau;&alpha;.
                  

                  Rendered output

                  Initial string: Δεν βρέθηκαν αποτελέσματα.
                  Html entities: Δεν βρέθηκαν αποτελέσματα.
                  Escaped for DB: Δεν βρέθηκαν αποτελέσματα.
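
                  One guess at where "elsewhere" might be (an assumption, nothing in the thread confirms it): htmlentities() rewrites most Greek letters as named entities, so a search term encoded before the query no longer matches Greek text stored raw in the table. A minimal sketch, with hypothetical helper names, that keeps the term raw for matching and encodes it only when echoing it into HTML:

                  ```php
                  <?php
                  // Hypothetical helper: tidy the raw input without encoding it.
                  function clean_term($raw)
                  {
                      return strip_tags(trim($raw));
                  }

                  // Hypothetical helper: encode a value only at the moment it is echoed into HTML.
                  function for_html($value)
                  {
                      return htmlentities($value, ENT_QUOTES, 'UTF-8');
                  }

                  $raw  = "  <b>\xCE\xB1\xCE\xB2</b> ";  // UTF-8 bytes for the Greek letters alpha, beta
                  $term = clean_term($raw);              // still raw Greek, usable in the LIKE pattern

                  // The encoded form differs from the raw bytes, so it would not match raw Greek rows.
                  var_dump($term === for_html($term));   // bool(false)
                  echo for_html($term), "\n";            // &alpha;&beta;
                  ```

                  In the script from the first post, that would mean escaping $term for the query with mysql_real_escape_string() and calling htmlentities() only on values that are echoed back to the page.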
                  