MS Word -> PHP -> XML -> Flash (encoding question)

andrew_embassy

So, I've written a PHP page that lets me update an XML file for a staff listing on a Flash site.

Trick is, when somebody pastes stuff from MS Word into the "bio" section, it kills the XML document.

So I've been trying to do some mb_convert_encoding() razzle dazzle on it to get it to conform to UTF-8 (the encoding my XML document is in; is this a good encoding for XML for this application? Flash sources say Flash works fine with UTF-8), but it's coming back with:

Warning: mb_convert_encoding() [function.mb-convert-encoding]: Unable to detect character encoding in C:\Documents and Settings\graphic1.OLIVERMACHINERY.000\Desktop\AAA\MnC\MediaTech\web\xml\staffEdit.php on line 74

Any ideas?
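
For reference, that warning usually shows up when mb_convert_encoding() is left to guess the source encoding and can't. A minimal sketch of naming the source encoding explicitly, assuming (and this is only an assumption) that the pasted Word text arrives as Windows-1252; to_utf8() is a made-up helper name and needs the mbstring extension:

```php
<?php
// Hypothetical helper: normalise submitted form text to UTF-8.
// Assumption: anything that is not already valid UTF-8 is Windows-1252,
// which is typical for text pasted from Word on Windows.
function to_utf8(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it alone
    }
    // Name the source encoding instead of asking mbstring to detect it.
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}
```

Usage would be something like $bio = to_utf8($_POST['bio']); before writing to the XML. If the form page itself is served as UTF-8, the browser may already be sending UTF-8, in which case the function just passes the text through.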

_theworks

Hi,
I'm not sure I get what you mean. Are they pasting directly into XML docs?
You can use [man]htmlspecialchars[/man] to convert the characters that are special in XML into things like &amp; or whatever.
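
Something along these lines, as a rough sketch. xml_escape() is just a name picked for illustration, and it assumes the text is already valid UTF-8 (with invalid input and these flags, htmlspecialchars() returns an empty string):

```php
<?php
// Hypothetical wrapper: escape the characters that are special in XML.
// Assumes $text is valid UTF-8.
function xml_escape(string $text): string
{
    // ENT_XML1 selects XML 1.0 rules; ENT_QUOTES also escapes single quotes.
    return htmlspecialchars($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

echo xml_escape('Tom & "Jerry" <b>'), "\n";
```

This only touches &, <, > and the quote characters. Curly quotes, dashes and the like pass through unchanged, so it doesn't fix an encoding problem by itself.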

andrew_embassy

They're pasting into a form, and that form/page is submitted to itself, where I can run some processing functions on it before I write it to the XML. A few simple things I do: take out the <>'s and replace them with []'s, etc.

I found a function that somebody wrote that replaces a ton of characters [chr(123)] with their (unicode?) counterparts [{] and after that I seem to be able to run my mb_convert_encoding() function on it to get it into UTF-8.

Does anybody have a minute to explain the different characters to me?

chr(123) - what's the name for this character?  What encoding is it used in?

&quot;/&amp; - how about these?  Are they embedded in html?

&#123; - this is Unicode, correct?  Or UTF-8?  Or both?
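
A small sketch of how these three notations relate, for anyone following along. chr(123) is the byte with value 123, which is the left curly brace in ASCII; &quot; and &amp; are named entities (predefined in XML as well as HTML); &#123; is a numeric character reference, which names a Unicode code point in decimal regardless of the encoding the file is stored in. The variable names are only for illustration:

```php
<?php
// chr() turns a byte value (0-255) into a one-byte string; 123 is "{" in ASCII.
$brace = chr(123);

// ord() goes the other way, so a numeric character reference can be built by hand.
$reference = '&#' . ord($brace) . ';';

// html_entity_decode() resolves both named entities and numeric references.
$decoded = html_entity_decode('&#123; &amp; &quot;', ENT_QUOTES, 'UTF-8');

// A Word-style left curly quote is code point U+201C (decimal 8220);
// stored as UTF-8, that single character takes three bytes.
$curly = html_entity_decode('&#8220;', ENT_QUOTES, 'UTF-8');
$curlyBytes = strlen($curly);
```

So the reference form is about which character is meant, while UTF-8 is about how that character is stored as bytes.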

andrew_embassy

crap it keeps turning my & # 1 2 3 ; characters into {'s

_theworks

Read this: [man]htmlspecialchars[/man]

andrew_embassy

htmlspecialchars() and htmlentities() don't cover all the characters that are commonly found in Word docs. Writing my own function has worked better for me.
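
A rough sketch of what such a function can look like. This is not the exact function mentioned above, just an illustration: word_to_ascii() is a made-up name, the character list is a guess at the usual suspects, and it assumes the text has already been converted to UTF-8:

```php
<?php
// Hypothetical helper: swap the typographic characters Word commonly
// inserts for plain ASCII. Assumes $text is already UTF-8.
function word_to_ascii(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quote
        "\u{2019}" => "'",   // right single quote / apostrophe
        "\u{201C}" => '"',   // left double quote
        "\u{201D}" => '"',   // right double quote
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '--',  // em dash
        "\u{2026}" => '...', // ellipsis
        "\u{00A0}" => ' ',   // non-breaking space
    ];
    // strtr() with an array replaces every key with its value in one pass.
    return strtr($text, $map);
}
```

The map is easy to extend with whatever else turns up in the bios. Mapping to numeric references (for example &#8220;) instead of ASCII would keep the original typography.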

_theworks

Heh, OK. Yeah, I haven't personally had to deal with the characters that MS Word outputs (thank god).
But I 'think' there are a few more functions besides htmlspecialchars that do the same sort of thing. Then again, MS Word uses non-standard characters, doesn't it? So writing your own function would probably be the best way, come to think of it.