charset iter

kante

Can you give me a URL that explains how to set up:

the charset of the output html page via meta tag
the charset for PHP insert read write from DB
and which charset to use in what case in MySQL

to get a system with no surprises when converting strings from utf8 to latin1 and back?

Can someone tell me why on this wiki (http://en.gentoo-wiki.com/wiki/Convert_latin1_to_UTF-8_in_MySQL) I read this:
"Since MySQL 4.1, UTF-8 is the default charset....."

Yet in MySQL Query Browser and phpMyAdmin, every time I create a DB and table the default charset is latin1. Is that because of space issues?

bradgrafelman

kante wrote:
the charset of the output html page via meta tag

[man]echo[/man] or [man]print[/man]

kante wrote:
the charset for PHP insert read write from DB

You can use the "SET NAMES" SQL command as soon as you connect if the default character set isn't what you want (or if you'd rather explicitly set the character set on every connection for portability reasons). See this MySQL manual page.

kante wrote:
which charset to use in what case in MySQL

I personally stick with utf8 for just about everything. I'd much rather be able to insert Unicode characters later on rather than deal with trying to convert latin1 to utf8 or worse, not convert the data one way or the other and end up with various issues.

kante wrote:
to get a system with no surprises when converting strings from utf8 to latin1 and back?

Why would you want to convert back and forth? Why not just stick with UTF8 for everything?
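To illustrate the kind of surprise that mixing the two charsets can cause, here is a rough sketch in Python rather than PHP, just because its codecs make the bytes easy to inspect. The function names are made up for this example, and Python's "latin-1" codec (ISO-8859-1) is only a stand-in for MySQL's latin1, which differs in a few code points.

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as latin1."""
    return text.encode("utf-8").decode("latin-1")


def fits_in_latin1(text: str) -> bool:
    """Return True if every character in text has a latin1 encoding."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# UTF-8 bytes read back as latin1 give the classic garbled output.
print(misread_utf8_as_latin1("café"))

# Going from UTF-8 data down to latin1 only works for some strings.
print(fits_in_latin1("café"))
print(fits_in_latin1("Łódź"))
```

If everything (page, connection, tables) is UTF-8, neither of these situations should come up in the first place.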

kante wrote:
Can someone tell me why on this wiki (http://en.gentoo-wiki.com/wiki/Conve...UTF-8_in_MySQL) I read this:
"Since MySQL 4.1, UTF-8 is the default charset....."

Yet in MySQL Query Browser and phpMyAdmin, every time I create a DB and table the default charset is latin1. Is that because of space issues?

The default charset can be specified at the time MySQL is compiled/installed (or at startup time, in the my.cnf file or via a CLI parameter).

kante

Can latin1 be considered a subset of utf8?

bradgrafelman

I guess you could look at it that way, at least as far as the characters themselves go: every latin1 character also exists in Unicode. At the byte level, though, UTF-8 only uses single bytes for the ASCII range (0-127); anything above that, including the accented characters latin1 stores in bytes 128-255, becomes a multi-byte sequence.

In other words, the string "Hello world!" is exactly the same when encoded in latin1 or UTF-8 since it contains only ASCII characters, while something like "é" is one byte in latin1 and two bytes in UTF-8.
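A quick way to check this for yourself is to compare the raw bytes. The sketch below uses Python purely for illustration (the variable names are arbitrary), and it assumes Python's "latin-1" codec, i.e. ISO-8859-1, which is close to but not identical to MySQL's latin1.

```python
ascii_text = "Hello world!"
accented = "é"

# Pure ASCII text produces identical bytes in both encodings.
same_bytes = ascii_text.encode("latin-1") == ascii_text.encode("utf-8")

# A non-ASCII latin1 character is one byte in latin1, two in UTF-8.
latin1_bytes = accented.encode("latin-1")
utf8_bytes = accented.encode("utf-8")

# Every one of the 256 latin1 byte values maps to a Unicode character,
# so latin1 -> UTF-8 -> latin1 loses nothing.
all_latin1 = bytes(range(256)).decode("latin-1")
round_trip = all_latin1.encode("utf-8").decode("utf-8").encode("latin-1")

print(same_bytes, len(latin1_bytes), len(utf8_bytes))
print(round_trip == bytes(range(256)))
```

So converting from latin1 to UTF-8 is the safe direction; it is the reverse trip that can fail, because most Unicode characters have no latin1 equivalent.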

kante

... so latin1 is not really a subset, but it is a smaller set of characters compared to utf8...
So when you try to convert from latin1 to utf8, do you still run into trouble?
Even though latin1 has a smaller encoding system, does it still have some characters that can't be converted to utf8?