I've got a website where you enter your details to post ads.
You first enter your details into text boxes on an .html page (name, surname, email, etc.).
You then review these details on a .php page, and if you're happy with them, the next .php page submits everything to the DB.

The problem I have is this: if Polish letters are used in the text field on the html page, they don't show up correctly on the review php page. Instead they appear as "Auml�Auml�Aring�Aring�Atildesup3".

All files (html, php and DB) are set to utf-8, and the review php page has no problem displaying Polish characters in the labels next to the text field data, but the actual data itself has the problem. (I hope I'm making it clear - apologies if not.) For example:
Imię (name): Auml�Auml�Aring�Aring�Atildesup3

An example of the code (if it helps) on the review php page is:

$string1 = isset($_POST['firstname']) ? htmlentities($_POST['firstname']) : false;

if(empty($string1))
{

echo("<h2>Imi&#281;:<font color=red>Prozs&#281; wype&#322;ni&#263;</font></h2>\n");
$showbutton=1;
}
else
{
$firstname = ereg_replace("[^0-9a-zA-Z?ąćęłńóśżźĄĆĘŁŃÓŚŻŹ ]", "", $string1);
echo("<h2><span class=profile>Imi&#281;:</span> <font color=#04B404>" . $firstname . "</font></h2>\n");
echo("<input type=hidden name=firstname value=" . $firstname . ">");
}

Many thanks in advance if anyone can help me out with this

    After mysql_connect, run

    mysql_query("SET NAMES utf8");

    That often solves this issue.

      Thanks for answering.
      Now I'm going to show you how much of a noob I am.
      I searched through each file to find mysql_connect and couldn't find it.
      The pages which do talk to the DB have a databaseconn file which holds the connection details for the database, so I put the command in there, moving it around a couple of times to try all the possibilities, and it doesn't work.

      Not knowing php that well (something of an understatement), I guess I'm having difficulty understanding why this command would help. The reason is this:
      The data entry page is an html page (no connection to the DB) - here the person inputs Polish characters
      The review page is a php page (with no connection to the DB) - here, when you're reviewing the field entries made previously, you can't see the Polish letters
      The submit page - connects to the DB

      So in my simple mind, the problem here is not sending data to the DB, because the error occurs before you get that far. It happens when the data is sent from the html page to the review php page.

      All the pages are running charset=UTF-8

      If I've got some major gaps in my logic please accept my apologies

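        All of the ereg_* functions have been deprecated (since PHP 5.3, and they were removed entirely in PHP 7.0) and should not be used at all. The mb_ereg functions (the multibyte versions of ereg in the mbstring extension) still exist, but the preg functions are the usual choice.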

        Use the corresponding preg functions instead, and specify the u pattern modifier for unicode.

        $firstname = preg_replace("#[^0-9a-zA-Z?ąćęłńóśżźĄĆĘŁŃÓŚŻŹ ]#u", "", $string1);
        

          Thanks for answering. I'm not 100% sure what the u pattern modifier should be. From the googling I've done, the codes I need are:

          ą 0105
          Ą 0104
          ę 0119
          Ę 0118
          ć 0107
          Ć 0106
          ń 0144
          Ń 0143
          ł 0142
          Ł 0141
          ś 015B
          Ś 015A
          ó 00F3
          Ó 00D3
          ż 017C
          Ż 017B
          ź 017A
          Ź 0179
          As regards the string, should it then look like:

          $firstname = preg_replace("#[0-9a-zA-Z?ąćęłńóśżźĄĆĘŁŃÓŚŻŹ ]#u+0105, #u+0104, #u+0119", "", $string1); etc etc

          Apologies once more for the question, but I've absolutely no prior knowledge of any of this and just winging it right now.

            $firstname = preg_replace("#[^0-9a-zA-Z?ąćęłńóśżźĄĆĘŁŃÓŚŻŹ ]#u", "", $string1);
            

            The pattern modifier u is already in there: the 'u' after the last #.

            Documentation for PCRE (the preg_* functions) is in the PHP manual. Under "Possible pattern modifiers" you'll see the u modifier there.

            The thing is that all those Polish letters take up several bytes rather than one. If you do not use the u modifier, one such character will be treated as two or more one-byte characters, which screws up string matching. So the above php code should already do the trick for you. Just test it...
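
            To make the byte-versus-character point concrete, here is a minimal sketch. It assumes the file is saved as UTF-8 and that the input is valid UTF-8. keep_allowed_chars() is a made-up helper name, and \p{L} (any Unicode letter) is my shorthand instead of listing every Polish letter by hand.

            ```php
            <?php
            // "ą" is one character, but it is stored as two bytes in UTF-8.
            $char = "ą";
            $byteLength = strlen($char);                    // strlen() counts bytes

            // Without the u modifier "." matches a single byte, so a pattern that
            // expects exactly one character fails on "ą". With u it succeeds.
            $matchesAsBytes = preg_match('/^.$/', $char);   // 0
            $matchesAsUtf8  = preg_match('/^.$/u', $char);  // 1

            // Hypothetical helper: keep digits, spaces and letters of any alphabet.
            // \p{L} only works as intended together with the u modifier.
            function keep_allowed_chars($input)
            {
                $clean = preg_replace('/[^0-9\p{L} ]/u', '', $input);
                // With the u modifier, preg_replace() returns null when the
                // subject is not valid UTF-8, so fall back to an empty string.
                return $clean === null ? '' : $clean;
            }

            $cleaned = keep_allowed_chars("Łukasz Żółć!?");
            echo $cleaned, "\n";
            ```

            Listing the letters explicitly, as in the pattern above, works just as well. \p{L} simply accepts letters from other alphabets too.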

              Thanks for taking the time to explain it.

              Tested the code and now the field doesn't show any Polish characters at all. It shows our alphabet, but not any Polish letters.

              When you mix PL letters with 'normal' letters the field still remains blank.

                If you put this somewhere on the form page:

                echo '<div>Accepted characters: "0-9a-zA-Z?ąćęłńóśżźĄĆĘŁŃÓŚŻŹ "</div>';
                

                do the Polish characters appear correctly?

                If the form post is processed by a different file, what is the output of the above in the form processing file?

                  The page with the form on it is an html file. This then populates fields in a php file.

                  I tried the code you suggested in the html file, as a php code embedded in the html file, with no success.

                  Apologies for the amount of time it takes for me to get back to you, my working day has me out for virtually all of the day, so I appreciate your patience and continued help.

                    Ah yes, you can't do that in an html file. But if you do the same thing in the PHP file, what is shown?

                      It doesn't show the Polish characters, it just shows

                      "Accepted characters: "0-9a-zA-Z?ąćęłńóśżźĄĆĘŁŃÓŚŻŹ ""

                      On the page.

                      Thanks for your help on this, but I think I better get someone with me to sort it out once and for all

                        Err, what? You say it does display ąćęłńóśżźĄĆĘŁŃÓŚŻŹ (which I thought were Polish characters), which means your file is saved in an encoding that matches your web page's character encoding. But then you claim no Polish characters are shown, which would mean that these are not Polish characters? Still, they are the only non-US-ASCII characters in your regexp pattern... Which characters do not show up?

                        Also, rereading your initial post, it would seem you are using htmlentities. Have you specified that it should use utf-8?
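
                        If not, that would explain what you are seeing. Below is a sketch of what I suspect is happening, assuming your PHP is older than 5.4, where htmlentities() defaults to ISO-8859-1. The charset is passed explicitly here so the result is the same on any version. review_value() is a made-up helper name, not something from your code.

                        ```php
                        <?php
                        // Save this file as UTF-8. "ó" is then the two bytes 0xC3 0xB3.
                        $input = "ó";

                        // Read as ISO-8859-1, each byte becomes its own entity.
                        // Strip "&" and ";" from the result and you are left with
                        // the "Atildesup3" from the first post.
                        $asLatin1 = htmlentities($input, ENT_QUOTES, 'ISO-8859-1');

                        // Told that the input is UTF-8, the two bytes are read as one character.
                        $asUtf8 = htmlentities($input, ENT_QUOTES, 'UTF-8');

                        // Hypothetical helper: filter the raw value first, escape only on output.
                        function review_value($raw)
                        {
                            $clean = preg_replace('/[^0-9\p{L} ]/u', '', $raw);
                            if ($clean === null) {
                                // The raw value was not valid UTF-8.
                                return '';
                            }
                            // htmlspecialchars() escapes only & < > " ' and leaves
                            // Polish letters untouched.
                            return htmlspecialchars($clean, ENT_QUOTES, 'UTF-8');
                        }

                        echo $asLatin1, "\n", $asUtf8, "\n", review_value("Ąęćóńł"), "\n";
                        ```

                        It may also be why the field went blank after switching to preg_replace with the u modifier: if htmlentities has already mangled the value into something that is not valid UTF-8, preg_replace returns null. When you print the cleaned value into the hidden input, put quotes around the value attribute so names containing a space survive.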

                          Sorry, my post was confusing.
                          The page can and does show Polish letters - they are entered in html format within php and using the php command print.

                          Where the Polish letters don't show is in the fields where you check your information to see if it's correct or not.

                          eg:
                          html data input form
                          first name: <here's the text field where you put in your name>
                          surname: <here's the text field where you put in your name>

                          you click submit.

                          The php page then lets you review the information in a fixed format (you have to go back to change the details if incorrect) to see if it's correct or not

                          first name: <whatever you entered in html form>
                          surname: <whatever you entered in html form>

                          It's here*** that the Polish characters don't populate the field. So if your name is "Ąęćóńł" and you enter that on the html page, then on the php page where you review this information you get a blank field.

                          If you enter latin characters, these display normally.

                          I hope I've made it clearer, and apologies for the vague post before.

                          Everything is set to utf-8: the charsets for the html page, the form and the php page, and the connection to the database.

                          I have virtually no knowledge of php other than what I've learnt through trial and error and copying code, so specifying how things like html entities can be set to utf-8 normally takes a google search plus more trial and error. If you could let me know how to set htmlentities to utf-8 I'll try it out and see if that's the problem. TIA
