Hi All,

Currently, I have mbstring installed in PHP and set up for Japanese. I have set the following:

mbstring.detect_order auto

mbstring.encoding_translation On

mbstring.func_overload 0

mbstring.http_input Auto

mbstring.http_output SJIS

mbstring.internal_encoding SJIS

mbstring.language Japanese

mbstring.substitute_character no value

and the following for httpd.conf,

#
AddCharset ISO-8859-1 .iso8859-1 .latin1
AddCharset ISO-8859-2 .iso8859-2 .latin2 .cen
AddCharset ISO-8859-3 .iso8859-3 .latin3
AddCharset ISO-8859-4 .iso8859-4 .latin4
AddCharset ISO-8859-5 .iso8859-5 .latin5 .cyr .iso-ru
AddCharset ISO-8859-6 .iso8859-6 .latin6 .arb
AddCharset ISO-8859-7 .iso8859-7 .latin7 .grk
AddCharset ISO-8859-8 .iso8859-8 .latin8 .heb
AddCharset ISO-8859-9 .iso8859-9 .latin9 .trk
AddCharset ISO-2022-JP .iso2022-jp .jis
AddCharset ISO-2022-KR .iso2022-kr .kis
AddCharset ISO-2022-CN .iso2022-cn .cis
AddCharset Big5 .Big5 .big5

# For Russian, more than one charset is used (depends on client, mostly):

AddCharset WINDOWS-1251 .cp-1251 .win-1251
AddCharset CP866 .cp866
AddCharset KOI8-r .koi8-r .koi8-ru
AddCharset KOI8-ru .koi8-uk .ua
AddCharset ISO-10646-UCS-2 .ucs2
AddCharset ISO-10646-UCS-4 .ucs4
AddCharset UTF-8 .utf8

Now, when trying to render a page that has the charset set to Shift_JIS,

<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS" />

it only displays correctly if I change the encoding in the browser to UTF-8. First question: when no default charset is set, what is the default charset? Second question: can a default charset be set, and if so, how? Third question: if no default is set, shouldn't the charset declared in the page direct the server to encode in the appropriate charset? I'm a newbie, so please let me know if I'm off course.

    Firstly, your Apache AddCharset settings have no effect on the charset in the Content-Type header produced by PHP (I think).

    Secondly, I STRONGLY recommend that if you're not using an 8-bit character set, you use UTF-8 for everything.

    UTF-8 is the way to go. If you don't go there, you will be sorry.

    Unfortunately, PHP has poor support for encodings, as it doesn't really know what a "string" is. As far as PHP is concerned, a string is an array of bytes.

    This means that functions like strlen() count bytes rather than characters, so they get the answer wrong for multibyte text, and the byte-oriented string-manipulation functions will muck up UTF-8 strings.

    The solution is to use only the mbstring versions of these functions - and if you need to use regular expressions etc., use the preg functions with the /u (UTF-8) modifier.
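
    As a rough illustration of the difference, here is a minimal sketch. It assumes the mbstring extension is loaded and that the file itself is saved as UTF-8; the sample string and variable names are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$text = "日本語"; // three characters, three bytes each in UTF-8

$bytes = strlen($text);             // byte count
$chars = mb_strlen($text, 'UTF-8'); // character count

// mb_substr() cuts on character boundaries; plain substr() cuts on bytes
// and could split a character in half.
$firstChar = mb_substr($text, 0, 1, 'UTF-8');

// With the /u modifier, PCRE treats pattern and subject as UTF-8,
// so "." matches one whole character rather than one byte.
$matchCount = preg_match_all('/./u', $text, $matches);

echo $bytes, "\n";      // 9
echo $chars, "\n";      // 3
echo $firstChar, "\n";  // 日
echo $matchCount, "\n"; // 3
```

    Passing the encoding explicitly to each mb_* call keeps the result independent of whatever the server's internal encoding happens to be.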

    Bear in mind also that you'll want to write your PHP files in UTF-8 too - make sure the mbstring internal encoding is set to UTF-8.
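
    One way to do this from the script itself, rather than in php.ini, might look like the sketch below. It assumes mbstring is loaded and that nothing has been sent to the browser yet; it also sends the charset from PHP, since PHP produces its own Content-Type header.

```php
<?php
// Assumption: runs at the top of the script, before any output.

// Make UTF-8 the default for mb_* calls that omit the encoding argument.
mb_internal_encoding('UTF-8');

// Tell the browser which charset the response body uses.
header('Content-Type: text/html; charset=UTF-8');

// No encoding argument here: mb_strlen() falls back to the internal encoding.
$len = mb_strlen("日本語");
```

    The same values can instead be set once in php.ini; note that newer PHP versions prefer the default_charset setting over mbstring.internal_encoding.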

    PHP really sucks for multibyte character set applications. It was designed by some Europeans who didn't understand what character encodings were (and couldn't see past ISO-8859-* anyway)

    Mark

      Hi All,

      OK, I changed all settings to use UTF-8, including the encoding of the database table column that holds the Japanese text. Now, if I view the data stored in that column in phpMyAdmin (via, say, Firefox), it displays correctly as UTF-8. But if I pull the data from the database and display it in a page set to UTF-8, it is just ? marks, although static Japanese text displays fine. Any thoughts on this?

        4 months later

        Did you manage to find a solution?

        Thanks

        Colin
