Hello,

I have a MySQL database whose tables and fields use the utf8 charset with the utf8_general_ci collation. One specific column, also utf8_general_ci, can hold characters from all over the world.

Initially I had issues with characters being replaced with "?". That was solved by calling mysql_set_charset('utf8',$conn);

All characters are displayed correctly on the webpage now.

Now I want to replace all special characters with their "standard" version, using str_replace() with one array of characters to search for and one array of replacements. This does not work: question marks in black boxes appear instead.

I found that, for Spanish (Latin) characters for example, mysql_set_charset('latin',$conn); solves this issue.

I cannot figure out what is going on. The database is set to utf8, mysql_set_charset() is set to utf8, and the content is displayed correctly. Yet to do a str_replace, I apparently have to change the charset to latin in this case?

As the data can be in any language of the world, how do I deal with this? Is there a simple way to make a generic str_replace work? Do I indeed have to set mysql_set_charset() to the respective charset? If so, is there something like a reference list of MySQL charsets per country/language (all I can find is ISO language codes per country)?

Hope someone can point me in the right direction!

Thanks,

    1) What do you mean, their "standard" version?

    2) How are you writing the str_replace call, and is the PHP file it appears in itself UTF-8 encoded?

      All this is in a separate PHP script called through Ajax code. But I just tested it in a single page with:

      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

      and the same thing happens.

      By "standard version" and str_replace I mean the following:

      // short versions
      $source= array( 'À','à','Á','á','Â','â');
      $target= array( 'A','a','A','a','A','a');
      $str = str_replace($source, $target, $str); // str_replace() returns the result; it does not modify $str in place
      

      Meanwhile I read something about multibyte characters and str_replace issues. I found this thread:
      http://stackoverflow.com/questions/1451144/php-multi-byte-str-replace

      The normalize option didn't work because apparently that class is not available on my PHP install. So I tried the mb_str_replace option, which resulted in:

      Compilation failed: invalid UTF-8 string at offset 4

      which led me to:

      http://www.simplemachines.org/community/index.php?topic=474532.0 first reply

      This is getting confusing! Am I on the right track?
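
One way to narrow this down, as a sketch under stated assumptions rather than a definitive fix: str_replace() compares raw bytes, so it can only find 'À' when the literal in the PHP file and the text coming from the database use the same encoding. The "Compilation failed: invalid UTF-8 string" message is raised when a regex pattern with the /u flag is compiled, which hints that the array literals in the script file may not have been saved as UTF-8. The helper names below (isValidUtf8, foldAccents) are made up for illustration, and the example assumes this file itself is saved as UTF-8 without a BOM.

```php
<?php
// Hypothetical helper: true when $s is a well-formed UTF-8 byte sequence.
// An empty pattern with the /u flag makes PCRE validate the subject string;
// preg_match() returns 1 on valid input and false on invalid UTF-8.
function isValidUtf8($s)
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: replaces a few accented letters with ASCII ones.
// Assumption: the literals below are stored as UTF-8 in this source file.
function foldAccents($str)
{
    if (!isValidUtf8($str)) {
        // Refuse to touch input in another encoding instead of corrupting it.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $source = array('À', 'à', 'Á', 'á', 'Â', 'â');
    $target = array('A', 'a', 'A', 'a', 'A', 'a');
    // Byte-wise replacement is safe here because both sides are UTF-8.
    return str_replace($source, $target, $str);
}

echo foldAccents('Àá â'), "\n"; // prints "Aa a"
```

If foldAccents() throws for text that displays correctly on the page, the data is probably reaching the script in a different encoding than expected; if it runs but replaces nothing, the encoding of the script file is the first thing to check.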

        4 days later

        I have to ask, why would you want to replace accented characters with non-accented ones? Just because they look similar doesn't mean they are interchangeable.

          Ashley Sheridan wrote:

          Just because they look similar doesn't mean they are interchangeable.

          In fact, they likely are not. For example, 'Regresa a mí' in Spanish would translate to 'Return to me,' whereas 'Regresa a mi' translates to the incomplete sentence 'Return to my'. Granted, a native Spanish speaker might automatically assume you were either too lazy to hit the accent mark on your keyboard or you're just some ignorant gringo (🙂), but still... what's wrong with leaving the text as-is rather than mangling it into something it shouldn't be?

            safra wrote:

            As the data can be in any language of the world...

            $source = array('ώ', 'ﺱ', 'ш');
            $target = array(???);
            