Hi all,
I have an HTML form with a text area. It submits to a PHP script, which does some basic data processing and inserts the form contents into a database record (PostgreSQL). The form's method is POST and its enctype is multipart/form-data, because there is a file upload control.
When I type into the text area manually, everything works out hunky dory, but when I cut and paste text from a Word 2007 document into the text area, I get the following error:
ERROR: invalid byte sequence for encoding "UNICODE": 0x92
I realise this is caused by a mismatch between the encoding of the form input and the database client encoding. As I see it, I have two options:
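To illustrate the mismatch at the byte level, here is a small sketch in Python (used only to show the bytes involved, not as PHP code). It assumes the browser submitted the pasted text as Windows-1252, which the byte 0x92 suggests: on its own, 0x92 is not valid UTF-8 (PostgreSQL's "UNICODE" client encoding is its older name for UTF8), but in Windows-1252 it is the curly apostrophe U+2019 that Word inserts in place of a straight quote.

```python
# The byte from the error message, as it might arrive in a submitted field
# (assumption: the browser encoded the form as Windows-1252).
raw = b"It\x92s a test"

# Decoding as UTF-8 fails: 0x92 is a continuation byte with no lead byte,
# which is the same problem PostgreSQL reports.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as Windows-1252 (Python codec name "cp1252") succeeds:
# 0x92 maps to U+2019 RIGHT SINGLE QUOTATION MARK.
text = raw.decode("cp1252")

# Re-encoding as UTF-8 gives bytes a UTF8 client encoding will accept.
utf8_bytes = text.encode("utf-8")

print(utf8_ok)              # False
print(hex(ord(text[2])))    # 0x2019
print(utf8_bytes)           # b'It\xe2\x80\x99s a test'
```

The same byte is therefore perfectly valid text; it is only invalid when interpreted as UTF-8.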
a) set the client encoding to match the string's encoding using pg_set_client_encoding
b) convert the incoming form data to match the client encoding, which is currently UNICODE (according to the error message)
From what I understand, because the enctype is multipart/form-data, each individual control has its own encoding. That means option (a) is probably not feasible, because I would have to change the client encoding for each field entered into the database, which would mean multiple updates. So option (b) looks like the way to go, which brings me to some questions:
1) Is there a way to specify the encoding of an HTML form so that the conversion is done on the browser side, and I receive the form data in $_POST in a consistent encoding?
2) If not, how can I determine the encoding and charset of each part of the form, i.e. each element of $_POST?
3) Once I know the source encoding of each element of $_POST, how do I go about converting it to the encoding the database connection is set to?
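For questions 2 and 3, one common heuristic (not a reliable detector, and not something PHP or PostgreSQL does for you) is to check each field independently: keep it if it is already valid UTF-8, otherwise assume Windows-1252 and convert it. The sketch below shows that idea in Python; the helper names `to_utf8` and `normalize_form` are hypothetical, and the `form` dict is a stand-in for the raw values that would be in `$_POST`. In PHP the equivalent building blocks would be `mb_check_encoding($value, 'UTF-8')` and `mb_convert_encoding($value, 'UTF-8', 'Windows-1252')` (or `iconv`).

```python
from typing import Dict


def to_utf8(raw: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical helper: trust UTF-8 when the bytes are valid UTF-8,
    otherwise assume the fallback encoding (Windows-1252 by default)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Heuristic: not valid UTF-8, so assume Windows-1252. Bytes that are
        # undefined in cp1252 (e.g. 0x81) become U+FFFD instead of raising.
        return raw.decode(fallback, errors="replace")


def normalize_form(fields: Dict[str, bytes]) -> Dict[str, bytes]:
    """Hypothetical helper: convert every field to UTF-8 bytes one at a
    time, so fields that arrived in different encodings end up consistent."""
    return {name: to_utf8(value).encode("utf-8") for name, value in fields.items()}


# Stand-in for raw submitted values (assumption: a mix of encodings).
form = {
    "title": b"Plain ASCII",
    "body": b"It\x92s pasted from Word",   # Windows-1252 curly apostrophe
    "note": "Caf\u00e9".encode("utf-8"),   # already UTF-8
}
clean = normalize_form(form)
for name, value in clean.items():
    print(name, value)
```

One limitation of this approach: a Windows-1252 string that happens to also be valid UTF-8 is left as-is, so the check can guess wrong on unusual input. Converting everything to a single encoding before the INSERT does, however, avoid changing the client encoding per field.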
Thanks