MySQL charset str_replace problem


safra
Post #1 Tuesday, September 25, 2012 1:01 PM

Hello,

I have a MySQL database whose tables and fields use the utf8 character set with the utf8_general_ci collation. One specific column, also utf8_general_ci, can hold characters from all over the world.

Initially I had issues with characters being replaced with "?". That was solved by calling mysql_set_charset('utf8',$conn);

All characters are displayed correctly on the webpage now.

Now I want to replace all special characters with their "standard" version, using str_replace() with one array of characters to search for and one array of replacement equivalents. This does not work: question marks in a black box appear.

I found that, for Spanish (Latin) characters for example, mysql_set_charset('latin',$conn); solves this issue.

I cannot figure out what is going on: the database is set to utf8, mysql_set_charset() is set to utf8, and the content is displayed correctly. Yet to do a str_replace(), I apparently have to change the charset to latin in this case?

As the data can be in any language of the world, how do I deal with this? Is there a simple way to make a generic str_replace() work? Do I indeed have to set mysql_set_charset() to the respective charset? If so, is there something like a reference list of MySQL charsets and countries/languages (all I can find is ISO language codes per country)?

Hope someone can point me in the right direction!

Thanks,

Weedpacket
Post #2 Tuesday, September 25, 2012 3:14 PM

1) What do you mean, their "standard" version?

2) How are you writing the str_replace call, and is the PHP file it appears in itself UTF-8 encoded?
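
Point 2 matters because str_replace() compares raw bytes and knows nothing about character sets. The following is only a minimal sketch of that effect, not a diagnosis of the actual script: the byte escapes are assumptions standing in for how the same accented literal would be stored in a UTF-8 versus a Latin-1 encoded source file.

```php
<?php
// UTF-8 data as it would arrive from MySQL after mysql_set_charset('utf8').
$str = "voil\xC3\xA0"; // "voilà": in UTF-8, 'à' is the two bytes C3 A0

// 'à' as stored in a UTF-8 encoded script: the same two bytes, so it matches.
$fixed = str_replace("\xC3\xA0", 'a', $str);

// 'à' as stored in a Latin-1 encoded script: the single byte E0, which
// never occurs in the UTF-8 data, so nothing is replaced.
$unchanged = str_replace("\xE0", 'a', $str);

// 'Ã' as stored in a Latin-1 encoded script: the single byte C3, which is
// also the lead byte of UTF-8 'à'. Replacing it leaves a stray A0 byte.
$broken = str_replace("\xC3", 'A', $str);

var_dump($fixed === 'voila');        // the UTF-8 needle matched
var_dump($unchanged === $str);       // the Latin-1 needle matched nothing
var_dump(preg_match('//u', $broken)); // false when the subject is not valid UTF-8
```

A browser typically renders a stray byte like the one in $broken as a replacement character, which would be consistent with the "question marks in a black box" described above.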

safra
Post #3 Wednesday, September 26, 2012 9:05 AM

All this is in a separate PHP script called through Ajax code. But I just tested it in a single page with:

and the same thing happens.

By "standard version" and str_replace I mean the following:

// short versions
$source = array('À', 'à', 'Á', 'á', 'Â', 'â');
$target = array('A', 'a', 'A', 'a', 'A', 'a');
// str_replace() returns the new string, so the result has to be assigned
$str = str_replace($source, $target, $str);

Meanwhile I read something about multibyte characters and str_replace issues. I found this thread:
http://stackoverflow.com/questions/1451144/php-multi-byte-str-replace

The normalize option didn't work because apparently that class is not available on my PHP install. So I tried the mb_str_replace option, which resulted in:

Compilation failed: invalid UTF-8 string at offset 4

which led me to:

http://www.simplemachines.org/community/index.php?topic=474532.0 first reply

This is getting confusing! Am I on the right track?

safra
Post #4 Friday, September 28, 2012 8:36 AM

For those without access to mb_str_replace and normalize():

http://stackoverflow.com/questions/3635511/remove-diacritics-from-a-string > wordpress solution
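
A lookup-table approach along those lines can be sketched roughly as below. This is not the WordPress code itself: strip_accents_sketch() is a hypothetical helper, the table covers only the six characters from the earlier example, and the keys are written as explicit UTF-8 byte sequences on the assumption that the input is UTF-8 and so that the result does not depend on how the script file is saved.

```php
<?php
// Hypothetical helper for illustration only; a real table would be far larger.
function strip_accents_sketch($text)
{
    // UTF-8 byte sequence => plain ASCII replacement.
    $map = array(
        "\xC3\x80" => 'A', // À
        "\xC3\xA0" => 'a', // à
        "\xC3\x81" => 'A', // Á
        "\xC3\xA1" => 'a', // á
        "\xC3\x82" => 'A', // Â
        "\xC3\xA2" => 'a', // â
    );

    // strtr() with an array only replaces whole keys, so characters that
    // are not in the table (such as 'é' below) pass through untouched.
    return strtr($text, $map);
}

// "À la carte, déjà vu" written as UTF-8 bytes.
echo strip_accents_sketch("\xC3\x80 la carte, d\xC3\xA9j\xC3\xA0 vu"), "\n";
```

Because whole multi-byte keys are matched, no partial UTF-8 sequences are left behind, unlike a byte-level replacement with mismatched encodings.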


Ashley_Sheridan
Post #5 Tuesday, October 2, 2012 11:05 AM

I have to ask, why would you want to replace accented characters with non-accented ones? Just because they look similar doesn't mean they are interchangeable.

bradgrafelman
Post #6 Tuesday, October 2, 2012 3:37 PM

Ashley Sheridan wrote:
Just because they look similar doesn't mean they are interchangeable.

In fact, they likely are not. For example, 'Regresa a mí' in Spanish would translate to 'Return to me,' whereas 'Regresa a mi' translates to the incomplete sentence 'Return to my'. Granted, a native Spanish speaker might automatically assume you were either too lazy to hit the accent mark on your keyboard or you're just some ignorant gringo, but still... what's wrong with leaving the text as-is rather than mangling it into something it shouldn't be?

Weedpacket
Post #7 Wednesday, October 3, 2012 1:58 AM

safra wrote:
As the data can be in any language of the world...

$source = array('ώ', 'ﺱ', 'ш');
$target = array(???);
