sql turns &nbsp; into a weird character: "Â"

Toadums

Hey... So I have a document in which I need to use &nbsp; to space things out. It pretty much looks like this:

1.1 SECTION INCLUDES
   1. Door release panel. // all the spacing in front of the numbers is &nbsp;
   2. Transformers.
   3. Magnets.
   4. Switches.
   5. Pushbuttons.
   6. Power supplies.
   7. Electrical supervision circuits.
   8. Wiring.
   9. Card readers.
   10. Keypads.
   11. Accessory software.
   12. System accessories:
      1. Readers.
      2. Cards.
      3. Sensing devices.
      4. Power supplies.

When it gets saved into SQL, the &nbsp;'s get converted into 'Â'... so it looks like this:

1.1 SECTION INCLUDES
Â Â Â 1. Door release panel.
Â Â Â 2. Transformers.
Â Â Â 3. Magnets.
Â Â Â 4. Switches.
Â Â Â 5. Pushbuttons.
Â Â Â 6. Power supplies.
Â Â Â 7. Electrical supervision circuits.
Â Â Â 8. Wiring.
Â Â Â 9. Card readers.
Â Â Â 10. Keypads.
Â Â Â 11. Accessory software.
Â Â Â 12. System accessories:
Â Â Â Â Â Â 1. Readers.
Â Â Â Â Â Â 2. Cards.
Â Â Â Â Â Â 3. Sensing devices.
Â Â Â Â Â Â 4. Power supplies.

and finally, when displayed in javascript/using php:

1.1 SECTION INCLUDES
&#65533; &#65533; &#65533; 1. Door release panel.
&#65533; &#65533; &#65533; 2. Transformers.
&#65533; &#65533; &#65533; 3. Magnets.
&#65533; &#65533; &#65533; 4. Switches.
&#65533; &#65533; &#65533; 5. Pushbuttons.
&#65533; &#65533; &#65533; 6. Power supplies.
&#65533; &#65533; &#65533; 7. Electrical supervision circuits.
&#65533; &#65533; &#65533; 8. Wiring.
&#65533; &#65533; &#65533; 9. Card readers.
&#65533; &#65533; &#65533; 10. Keypads.
&#65533; &#65533; &#65533; 11. Accessory software.
&#65533; &#65533; &#65533; 12. System accessories:
&#65533; &#65533; &#65533; &#65533; &#65533; &#65533; 1. Readers.
&#65533; &#65533; &#65533; &#65533; &#65533; &#65533; 2. Cards.
&#65533; &#65533; &#65533; &#65533; &#65533; &#65533; 3. Sensing devices.
&#65533; &#65533; &#65533; &#65533; &#65533; &#65533; 4. Power supplies.

as you can tell...extremely ugly 😛

So my question is:

how can i make it look normal? I tried:

echo str_replace("Â","&nbsp;",mysql_result($result,0,"Data"));

but that didn't work... it doesn't recognize the symbol 'Â', and when I try to replace the 'Â' with a '�' in the str_replace, it just shows up as a square in PHP (invalid character...) ugh..

Is there a way like when using md5 to 'de-encode' the Â's or something??

if this doesnt make sense, let me know...

thanks!!

sneakyimp

My guess is that the problem occurs when you store the text into SQL. SQL shouldn't have any problem storing the string '&nbsp;Some Text' into a varchar or text field. Do you properly escape your input string when you run the query that stores the text (e.g., using mysql_real_escape_string)? Is the text pasted into a browser or something? Or pasted into a DBMS tool of some kind?

Weedpacket

As sneakyimp suggests, it looks like an encoding problem. Somewhere along the way the &nbsp; is being converted into a chr(160) character (no idea why that would be done, so that might be where the problem starts).

That character is being stored as a UTF-8 code - two bytes: chr(194) chr(160). That depends on the database setting (if you're going to serve ISO-8859-1 text, storing it as ISO-8859-1 text is probably the smart thing to do).

And then those two characters are being displayed as ISO-8859-1 (where they encode the two characters "Â "). Chances are good that this is because the browser was never told that the page is UTF-8 encoded.
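A minimal sketch of the byte-level story described above, assuming the mbstring extension is available (the variable names are illustrative only). It converts a single ISO-8859-1 non-breaking space to UTF-8, then deliberately misreads the result as ISO-8859-1 to reproduce the stray "Â":

```php
<?php
// A non-breaking space is one byte, 0xA0 (chr(160)), in ISO-8859-1.
$nbspLatin1 = "\xA0";

// Encoded as UTF-8 it becomes two bytes: 0xC2 0xA0, i.e. chr(194) . chr(160).
$nbspUtf8 = mb_convert_encoding($nbspLatin1, 'UTF-8', 'ISO-8859-1');
echo bin2hex($nbspUtf8), "\n"; // c2a0

// Treating those two UTF-8 bytes as if they were ISO-8859-1 text yields
// two characters: "Â" (U+00C2) followed by a non-breaking space (U+00A0).
$misread = mb_convert_encoding($nbspUtf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($misread), "\n"; // c382c2a0
```

The second conversion is what a browser effectively does when it receives UTF-8 bytes but has been told the page is ISO-8859-1.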

Toadums

Weedpacket;10926433 wrote:
Chances are good that this is because the browser was never told that the page is UTF-8 encoded.

yeah, I never quite understood the UTF-8 stuff when setting up sql stuff...lol

is there a way that I can tell my browser to use the UTF-8 setting?

I don't fully understand what the mysql_real_escape_string function does... would I use it when getting data out of SQL, or when saving it?

thanks for the help! hopefully it isnt too hard to fix >.< hehe

sneakyimp

You can tell everyone's browser what your page encoding is by putting a meta tag like this in your page:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

Also, when you create a database table, that db table has a character encoding to it as well. If you don't specify which encoding, your DBMS will probably assume some default encoding.
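As a rough sketch of the same idea from the PHP side, the page can declare UTF-8 both in the HTTP header and in the meta tag. The function name `render_utf8_page` is hypothetical, not something from the thread:

```php
<?php
// Hypothetical helper: declares UTF-8 in the HTTP header and in the markup.
function render_utf8_page(string $body): string
{
    // When both are present, browsers give the HTTP header priority over the meta tag.
    if (!headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }

    return "<html><head>\n"
        . '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />' . "\n"
        . "</head><body>" . $body . "</body></html>\n";
}

$page = render_utf8_page("1.1 SECTION INCLUDES");
echo $page;
```

If the data comes through mysqli rather than the old mysql_* functions (an assumption, since the thread uses mysql_result), the connection charset can be set to match with `$db->set_charset('utf8mb4')`.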

Toadums

sneakyimp;10926498 wrote:
You can tell everyone's browser what your page encoding is by putting a meta tag like this in your page:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Also, when you create a database table, that db table has a character encoding to it as well. If you don't specify which encoding, your DBMS will probably assume some default encoding.

Oh, thank you very much!! I never really understood what the <meta /> thing was all about... it makes sense now that I read it... I always just skipped over it and copy-pasted it 😛

but low and behold (correct phrase??) ... the charset was set to: iso-8859-1 🙂

which charset should I be using? sql defaults to UTF-8 and my meta thingy has it set to iso-8859-1, so should I use the SQL default?

thanks!

Weedpacket

Toadums wrote:
but low and behold (correct phrase??)

Toadums wrote:
which charset should I be using? sql defaults to UTF-8 and my meta thingy has it set to iso-8859-1, so should I use the SQL default?

Either way it pays to be consistent. Personally I've adopted UTF-8 for all text encoding (database storage, page rendering, source code, editor setting...) but that's just my own policy.

If you don't expect you'll ever (or only rarely) need the full Unicode range of characters then ISO-8859-1 should suit you. One advantage is that every character in ISO-8859-1 is encoded as a single byte, whereas UTF-8 sometimes needs two or more bytes per character (two for the non-breaking space in this case). Since strlen() counts bytes, it can be wrong about the number of characters in a UTF-8 string. If you do occasionally need characters not covered by ISO-8859-1, you can use named entities if they're available, or numeric references if they're not.
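To illustrate the byte-versus-character point, here is a small sketch comparing strlen() with mb_strlen() on a UTF-8 string. It assumes the mbstring extension is loaded, and the sample text is made up from the list in the first post:

```php
<?php
// One UTF-8 non-breaking space (2 bytes) followed by 11 ASCII characters.
$text = "\xC2\xA0" . "1. Readers.";

echo strlen($text), "\n";             // 13, the number of bytes
echo mb_strlen($text, 'UTF-8'), "\n"; // 12, the number of characters
```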

Or of course you could restrict yourself to pure ASCII, in which case you'd use &nbsp; for a non-breaking space (since ASCII doesn't have an encoding for a non-breaking space).
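For text that is already stored as UTF-8, one possible way to get back to the pure-ASCII form is to replace the two-byte sequence with the entity. This is only a sketch: `nbsp_to_entity` is a made-up name, and it assumes the stored data really is UTF-8. Writing the bytes as escapes avoids the problem from the first post, where a literal "Â" typed into the script did not match:

```php
<?php
// Hypothetical helper: swaps UTF-8 non-breaking spaces for the &nbsp; entity.
function nbsp_to_entity(string $utf8Text): string
{
    // "\xC2\xA0" is chr(194) . chr(160), the UTF-8 encoding of U+00A0.
    return str_replace("\xC2\xA0", '&nbsp;', $utf8Text);
}

$stored = "\xC2\xA0\xC2\xA0\xC2\xA0" . "1. Door release panel.";
echo nbsp_to_entity($stored), "\n"; // &nbsp;&nbsp;&nbsp;1. Door release panel.
```

Because the result is plain ASCII, it displays the same whether the page is served as ISO-8859-1 or UTF-8.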

blackhorse

I want to use UTF-8 for my pages, database, files etc. too. But I have to grab data from third-party databases, and so far these third-party databases are in latin1 etc.

I would have to select data from the third-party database and insert it into my database. If the third-party database/tables are latin1 and my database/tables are UTF-8, what should I be aware of or do when I select data from theirs and insert it into mine?

And to make the situation a little more complicated, the data in the third-party database is sometimes already in HTML entities and sometimes not. Sometimes it is in Unicode HTML entities.

How can I make sure no funny garbage characters pop up in my output? There are so many decode, encode, and convert functions available. How do I use them so that they don't conflict with each other and cover most of the cases I mentioned above?

Thanks!