Hi all

When I post my form I run a query to store the posted information to the database as such:

$qry = "UPDATE mytable
SET overview = '".htmlspecialchars($overview, ENT_QUOTES, "UTF-8")."'";

Now when I type pound signs into my content they get stored as such:

£

But when I display these in my web page they come out as black diamond question marks which suggests the characters are unreadable.

I extract and show them as such:

echo html_entity_decode(stripslashes($row['overview']));


Why will the pound signs not display correctly?

Thanks for reading.

    There's really no reason to be converting data when inserting it into the database. Store data in its "raw" format and only sanitize/change/whatever it when displaying. If you're concerned about security you should be using a function like mysql(i)_real_escape_string (for string data) or prepared statements.

    Also double-check your character encoding to make sure it's consistent.
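
A rough sketch of the advice above: store the text unchanged, let a prepared statement handle the quoting, and keep the connection in UTF-8. It uses PDO, which nobody in the thread named specifically. The saveOverview() helper, the id column and the connection details in the comment are assumptions for illustration only.

```php
<?php
// Hypothetical helper: stores the overview exactly as it was typed.
// Assumes mytable has an integer `id` column (not shown in the original query).
function saveOverview(PDO $pdo, int $id, string $overview): void
{
    // Placeholders keep the data separate from the SQL, so neither manual
    // escaping nor htmlspecialchars() is needed at this stage.
    $stmt = $pdo->prepare('UPDATE mytable SET overview = :overview WHERE id = :id');
    $stmt->bindValue(':overview', $overview, PDO::PARAM_STR);
    $stmt->bindValue(':id', $id, PDO::PARAM_INT);
    $stmt->execute();
}

// Example connection with placeholder credentials. charset=utf8mb4 makes the
// connection itself UTF-8, matching a UTF-8 table and UTF-8 pages:
// $pdo = new PDO('mysql:host=localhost;dbname=example;charset=utf8mb4', 'user', 'pass');
// saveOverview($pdo, 1, $_POST['overview']);
```

A connection charset that differs from the page charset is one common cause of the black diamond characters, so that setting is worth checking first.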

      I am using htmlspecialchars() because this website supports many different languages. The CMS allows users from around the world to publish their own content in their own language, and depending on where a visitor's IP address is based, that language is delivered to the browser from the MySQL database. The languages covered are German, Korean, Chinese and Spanish.

      As far as I have understood, I need to use htmlspecialchars() to convert umlauts and other special characters used in languages other than British English.

      Does this make sense? Does it help?

      Bonesnap - your comment about storing strings as 'raw' and then sanitizing them on page output: am I right in understanding that the way I am doing it is correct for special characters in foreign languages? I.e. I have to convert them before saving them into my database?

      My database is UTF-8 and so too are my web pages.

        condoug wrote:

        As far as I have understood, I need to use htmlspecialchars() to convert umlauts and other special characters used in languages other than British English.

        No. Read the PHP manual on [man]htmlspecialchars[/man]. You might be thinking of [man]htmlentities[/man] instead, but then since you are just dealing with UTF-8 this should be unnecessary.
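
To make the difference concrete, here is a small sketch comparing the two functions on the same string. The sample text is made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Made-up sample containing an umlaut, a pound sign and some markup.
$text = "Ü costs £5 & <b>more</b>";

// htmlspecialchars() only touches the characters that are special in HTML:
// & < > " and (with ENT_QUOTES) '
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// htmlentities() additionally converts every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Ü costs £5 &amp; &lt;b&gt;more&lt;/b&gt;
echo $entities, "\n"; // &Uuml; costs &pound;5 &amp; &lt;b&gt;more&lt;/b&gt;
```

With UTF-8 used end to end, the first form is enough because the browser can render Ü and £ directly.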

        condoug wrote:

        I have to convert them before saving them into my database?

        No, the idea is to store the data as-is, then format it as needed, e.g., for display as text in a web browser. Hence, you should use htmlspecialchars only after retrieving the data from the database, when printing it. Its purpose is to make the browser display HTML code as plain text instead of interpreting it as markup.

          OK, back to basics for me then.

          So the rule of thumb is:

          1) Save the data to MySQL using mysql_real_escape_string() and nothing else. This would actually save the characters (as they have been typed) directly into the database. So an Umlaut will be saved as Ü in the database and not &Uuml;

          is this correct?

          2) And then when I retrieve the data from the database I output it as follows:

          echo htmlspecialchars($output, ENT_QUOTES, "UTF-8");

          Is the above correct?

          Thanks for your advice.

            condoug wrote:

            1) Save the data to MySQL using mysql_real_escape_string() and nothing else

            Not exactly, since mysql_real_escape_string is meant to be used on string data only (hence the string in its name). If you're inserting numerical data, such as an integer, cast it to its appropriate data type.
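
A minimal sketch of that idea, using made-up input values. The toIntOrNull() helper is hypothetical and shows a stricter alternative to a plain cast, since a cast silently keeps any leading digits.

```php
<?php
// Hypothetical helper: returns the integer, or null when the input is not a whole number.
function toIntOrNull(string $raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);
    return $value === false ? null : $value;
}

$id       = (int) '42';            // plain cast, as suggested above
$loose    = (int) '42abc';         // also 42: the cast keeps the leading digits
$checked  = toIntOrNull('42');     // 42
$rejected = toIntOrNull('42abc');  // null, so the request can be refused
```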

            condoug wrote:

            So an Umlaut will be saved as Ü in the database and not &Uuml;

            is this correct?

            Yes, that is correct.

            condoug wrote:

            2) And then when I retrieve the data from the database I output it as follows:

            echo htmlspecialchars($output, ENT_QUOTES, "UTF-8");

            Is the above correct?

            Thanks for your advice.

            Since you're storing your data in UTF-8 and your web pages are UTF-8, you shouldn't have to use htmlspecialchars (it doesn't convert those types of characters anyway) at all; you can just print your data.

              Bonesnap wrote:

              Since you're storing your data in UTF-8 and your web pages are UTF-8, you shouldn't have to use htmlspecialchars (it doesn't convert those types of characters anyway) at all; you can just print your data.

              It is a different issue now:

              condoug wrote:

              The CMS allows users from around the world to publish their own content in their own language, and depending on where a visitor's IP address is based, that language is delivered to the browser from the MySQL database.

              So, this data is user input. Therefore, unless you explicitly want to allow users to, say, insert client-side scripts because they can be trusted not to perform XSS, the use of htmlspecialchars is appropriate.
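
As an illustration of why escaping on output matters for user input, here is a short sketch. The e() helper name and the sample text are made up, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical shorthand for escaping a value right before it is printed.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Made-up user input: legitimate German text followed by a script tag.
$overview = "Grüße! <script>alert('xss')</script>";

// The umlauts pass through untouched; the markup is shown as text, not run.
echo '<p>', e($overview), '</p>', "\n";
// <p>Grüße! &lt;script&gt;alert(&#039;xss&#039;)&lt;/script&gt;</p>
```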

                Oops, I must have missed that. You are definitely correct; htmlspecialchars should be used.
