Where did you see that PHP 4 couldn't handle Unicode?

    http://us2.php.net/manual/en/language.types.string.php

    It says each character is treated as a byte, so how am I able to use multibyte characters without using any special functions?

    I've read quite a few articles and threads on UTF-8 and Unicode and I still don't have a firm grasp of them. Any info would be helpful.

      Upon further examination, I think I know how this is working.

      PHP is storing each character as a run of individual single-byte (ISO-8859-1) characters, but since I have the charset set to UTF-8 in my HTML document, my browser is decoding certain adjacent bytes as one Unicode character. So behind the scenes (PHP and MySQL), nothing is in Unicode. The way my browser displays the characters just gives the appearance that it works fine. Am I correct?
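A small sketch of that idea, assuming the mbstring extension is loaded: the same two bytes count as two characters when read as ISO-8859-1 and as one character when read as UTF-8. The variable names are only illustrative.

```php
<?php
// Two raw bytes, 0xD7 0xA3. PHP itself attaches no encoding to them.
$bytes = urldecode('%D7%A3');

// Same bytes, two interpretations (mb_strlen needs the mbstring extension).
$asLatin1 = mb_strlen($bytes, 'ISO-8859-1'); // two 1-byte characters
$asUtf8   = mb_strlen($bytes, 'UTF-8');      // one 2-byte character (U+05E3)

echo bin2hex($bytes), ': ', $asLatin1, ' vs ', $asUtf8, "\n"; // d7a3: 2 vs 1
```

The bytes never change; only the charset the reader (here mb_strlen, in practice the browser) applies to them does.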

        Yeah, that sounds more right... let me test....

          Hm, it was my understanding that full Unicode support for PHP wasn't going to be pushed out until version 6. I'm almost certain of that.

            Well, it is right. The HTML interpreter (browser) is displaying the characters properly according to your document's character set.

            You can use this as a test document:

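            <?php

            // Minimal page that declares itself as UTF-8.
            echo '<html>
            <head>
            <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
            </head>
            <body>';

            // urldecode() just turns the %XX escapes back into raw bytes;
            // the browser is what reads those bytes as UTF-8.
            echo urldecode('%D7%A3%D7%A8%D7%A7%D7%AA%D6%9F%C4%82%E2%80%9E%C8%98%E2%80%9D%C4%98%C5%82%E2%82%AC%CC%BF%CD%8D');

            // Close the tags opened above.
            echo '</body>
            </html>';

            ?>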

            That should give the output:

            &#1507;&#1512;&#1511;&#1514;&#1439;&#258;„&#536;”&#280;&#322;€&#831;&#845;

              You can use Unicode data, but PHP's string functions have trouble handling it if you want to do something with it. Try passing any Unicode string to strlen() or substr() or any other string function. The results will not be what you expect, because those functions count and cut bytes, not characters. So until PHP 6, if you want to process and transform Unicode data (and not just pass it around from the database to the browser), you have to use the mbstring extension or PCRE in Unicode mode (the /u modifier), which limits what you can do with Unicode data.
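A rough sketch of the difference, assuming the mbstring extension is loaded and the data is UTF-8. The string is written as explicit byte escapes so the result does not depend on how the file is saved.

```php
<?php
$text = "\xC3\xA5\xC3\xA7\xC3\xA9"; // "åçé" as UTF-8 bytes

// substr() cuts by byte, so this returns only half of "å".
$byteCut = substr($text, 0, 1);

// mb_substr() cuts by character when told the encoding.
$charCut = mb_substr($text, 0, 1, 'UTF-8');

// PCRE in Unicode mode: with /u, "." matches one whole character.
$matched = preg_match_all('/./u', $text, $matches);

echo bin2hex($byteCut), "\n"; // c3
echo bin2hex($charCut), "\n"; // c3a5
echo $matched, "\n";          // 3
```

The lone byte from substr() is not valid UTF-8 on its own, so a browser would typically show it as a replacement character.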

              So yours works fine in the sense that you are just passing Unicode data around.

              However, your string lengths are wrong. For example, I just entered "åçé", and while it displays fine (that has never been an issue), it tells me it has a length of 6, which is wrong. It is 6 bytes in UTF-8, but the string is only 3 characters long. See? That is why PHP does not fully support Unicode.
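The 6-versus-3 result can be reproduced with a short sketch, assuming the input arrives as UTF-8 and the mbstring extension is available:

```php
<?php
// "åçé" as explicit UTF-8 bytes: each of the three characters takes two bytes.
$input = "\xC3\xA5\xC3\xA7\xC3\xA9";

$byteLength = strlen($input);             // counts bytes
$charLength = mb_strlen($input, 'UTF-8'); // counts characters

echo $byteLength, ' bytes, ', $charLength, " characters\n"; // 6 bytes, 3 characters
```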
