Any help with what I suspect will prove to be a straightforward problem would be much appreciated.
I am trying to display Japanese characters from a MySQL db. For experimentation I have boiled this down to a database with a single table containing two columns: one contains Japanese kanji and the other the equivalent Unicode code point written out as a character string. The default character set for the db is UTF-8. A query run with phpMyAdmin returns the correct data. However, I have been unable to achieve this with my own scripts (or to work out how phpMyAdmin does it). Instead, the Japanese characters (each three bytes long in UTF-8) are returned as single-byte question marks 🙁.
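One way to see exactly which bytes reach PHP is a small diagnostic like the sketch below. `describe_bytes()` is a hypothetical helper, not part of any library, and the sketch assumes PHP 7 or later (for the `\u{...}` escape and scalar type declarations), so it will not run on the PHP 5 that the `mysql_*` script targets. It uses `mb_strlen($s, '8bit')` instead of `strlen()` because with `mbstring.func_overload` enabled, `strlen()` may count characters rather than bytes. A correctly delivered kanji such as 漢 (U+6F22) should show the three UTF-8 bytes `e6bca2`, while a character lost along the way shows up as the single byte `3f` (`?`).

```php
<?php
// Hypothetical diagnostic helper: report the raw bytes of a string.
// mb_strlen(..., '8bit') counts bytes even if mbstring.func_overload
// has replaced strlen() with a character-counting version.
function describe_bytes(string $s): array
{
    return [
        'hex'   => bin2hex($s),           // hex dump of every byte
        'bytes' => mb_strlen($s, '8bit'), // byte length, not character length
    ];
}

// A correctly encoded kanji (U+6F22, written as an escape so the result
// does not depend on the encoding this file is saved in)...
print_r(describe_bytes("\u{6F22}"));
// ...versus the question mark that actually appears in the output.
print_r(describe_bytes("?"));
```

Calling `print_r(describe_bytes($col_value));` inside the `foreach` loop of the script would show which of the two cases each fetched value falls into.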
An abbreviated version of my script may be seen below.
    // Connecting, selecting database
    $connect = mysql_connect('localhost', "user1", "testing")
        or die('Could not connect: ' . mysql_error());
    $db = 'kanjidb';
    mysql_select_db($db) or die('Could not select database ('.$db.') because of : '.mysql_error());
    // Performing SQL query
    $query = 'SELECT kanji, unicode FROM kanjitbl';
    $result = mysql_query($query) or die('Query failed: ' . mysql_error());
    // Printing results in HTML
    echo "<table border='1' width='300'>\n";
    while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
        echo "\t<tr>\n";
        foreach ($line as $col_value) {
            echo "\t\t<td>$col_value</td>\n";
        }
        echo "\t</tr>\n";
    }
    echo "</table>\n";
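The HTML-printing part of that loop can be separated from the database calls and checked on its own. The sketch below is a hypothetical refactor (`render_rows()` does not exist in the original script, and PHP 7+ syntax is assumed) that takes rows as plain arrays, escapes each value with `htmlspecialchars()` using an explicit `'UTF-8'` charset, and returns the table markup. If a hand-written UTF-8 string comes through this function intact, that suggests the printing step is not where the characters are being lost.

```php
<?php
// Hypothetical refactor of the printing loop: build the table from plain
// arrays so the output step can be exercised without a database connection.
function render_rows(array $rows): string
{
    $html = "<table border='1' width='300'>\n";
    foreach ($rows as $row) {
        $html .= "\t<tr>\n";
        foreach ($row as $col_value) {
            // Escape markup characters, stating the charset explicitly
            // instead of relying on the default_charset setting.
            $safe = htmlspecialchars((string) $col_value, ENT_QUOTES, 'UTF-8');
            $html .= "\t\t<td>$safe</td>\n";
        }
        $html .= "\t</tr>\n";
    }
    return $html . "</table>\n";
}

// A row written by hand, shaped like one the query would return.
echo render_rows([['kanji' => "\u{6F22}", 'unicode' => '6F22']]);
```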
The script also includes the following: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />, and the resulting page's encoding is confirmed as UTF-8.
I have amended the [mbstring] section of my php.ini file as follows (basically throwing everything at this problem):
    mbstring.language = Japanese
    mbstring.internal_encoding = UTF-8
    mbstring.http_input = UTF-8
    mbstring.http_output = UTF-8
    mbstring.encoding_translation = on
    mbstring.detect_order = UTF-8, ASCII, EUC-JP, JIS, SJIS
    mbstring.substitute_character = long;
    mbstring.func_overload = 7
When mbstring.detect_order was set to auto, encoding detection told me that each item returned from the query was ASCII. Now it says the items are UTF-8, but they still don't seem to be treated as such :mad:.
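A plausible reading of those detection results, offered as an assumption rather than a diagnosis: a value that has already become `?` is plain ASCII, and every ASCII string is also valid UTF-8. When several candidates fit, non-strict detection in older PHP releases typically reports whichever one is listed first, so moving UTF-8 ahead of ASCII in `mbstring.detect_order` would change the label without changing the bytes. The sketch below uses `mb_check_encoding()` to list every candidate a string is valid in; `encodings_valid_for()` is a hypothetical helper, and PHP 7+ syntax is assumed.

```php
<?php
// Hypothetical helper: return, in the given order, every candidate
// encoding in which the byte string is valid, via mb_check_encoding().
function encodings_valid_for(string $s, array $candidates): array
{
    $valid = [];
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($s, $encoding)) {
            $valid[] = $encoding;
        }
    }
    return $valid;
}

// A question mark is valid in both, so detection cannot tell them apart.
print_r(encodings_valid_for("?", ['ASCII', 'UTF-8']));
// The real kanji bytes (e6 bc a2) are valid UTF-8 but not ASCII.
print_r(encodings_valid_for("\u{6F22}", ['ASCII', 'UTF-8']));
```

In other words, once the bytes are `3f`, no detect_order setting can recover the original kanji; detection only labels what has already arrived.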
Any help spotting what I have missed, or even completely different strategies for retrieving and displaying multibyte Unicode characters, would be very much appreciated :o.