Migrating from mssql.dll to FreeTDS

collinlee

Hi,

I have a unique situation and I am wondering if anyone has faced these issues and managed to resolve them. Here's the summary. I have an existing application that uses PHP + Apache + SQL Server. The application supports internationalization, so some multi-byte values are stored in the database. It uses the php_mssql.dll extension supplied with PHP to access the database. The input data comes from the web. The data currently stored in the database gets converted when SQL Server does a UTF-8 to UCS-2 translation. For example, a Japanese character string like "サンフランシスコ" sent from the web browser looks like "ã‚µãƒ³ãƒ•ãƒ©ãƒ³ã‚·ã‚¹ã‚³" in the SQL Server database. Everything is fine when I retrieve this value from the database and display it in the web browser.
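As a rough illustration of what seems to be happening to the stored value: the garbled form is what you get when each byte of the UTF-8 string is treated as a single Windows-1252 character. The code page is an assumption (the post does not say which one the varchar columns use), and the variable names are only for illustration. The script must be saved as UTF-8.

```php
<?php
// Sketch only. Assumes the varchar data is interpreted as Windows-1252
// (CP1252); the original post does not state the column's code page.

$original = "サンフランシスコ";   // UTF-8 bytes as sent by the browser

// Treat every UTF-8 byte as one CP1252 character and express the
// result in UTF-8 so it can be printed.
$garbled = iconv('CP1252', 'UTF-8', $original);
echo $garbled, "\n";

// Reverse direction: map each character back to its single byte,
// which yields the original UTF-8 byte sequence again.
$restored = iconv('UTF-8', 'CP1252', $garbled);
var_dump($restored === $original);
```

Note that a few byte values (for example 0x81 and 0x8D) have no character assigned in Windows-1252, so this round trip can fail for strings whose UTF-8 encoding contains them.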

The problem is that I now want to migrate this application to the FreeTDS drivers for better (and correct) multi-byte character support. My new database table columns will change from varchar->nvarchar, text->ntext, etc. Also, the queries will now have an N prefixed to the string values [ex: INSERT INTO TEST (id) values (N'サンフランシスコ') ].
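A minimal sketch of building such an N-prefixed literal in PHP. The helper name n_literal is hypothetical and not part of any driver; it only wraps the value in N'...' and doubles embedded single quotes, which is the T-SQL way of escaping them.

```php
<?php
// Hypothetical helper (not a driver function): wraps a string as a
// T-SQL Unicode literal and doubles any embedded single quotes.
function n_literal($value)
{
    return "N'" . str_replace("'", "''", $value) . "'";
}

// Same example statement as in the post, built with the helper.
$sql = "INSERT INTO TEST (id) VALUES (" . n_literal("サンフランシスコ") . ")";
echo $sql, "\n";
```

Where the database layer supports bound parameters, those are usually a safer choice than assembling literals by hand.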

What I need to do first, though, is get the existing data and store it back into the database. Using SQL like SELECT * INTO [new_table] FROM [existing_table] will not work because the stored data already looks like "ã‚µãƒ³ãƒ•ãƒ©ãƒ³ã‚·ã‚¹ã‚³". It seems I have to read each value out and then write it back in, with the extra characters that were introduced taken out.

Has anyone encountered this problem before? I've tried a bunch of different things:

1) Writing values to a file and then reading them back in to build SQL statements.
2) Using iconv to re-encode data
3) Using mbstring to re-encode data

Still, I am not able to get the data stored correctly.

If this is confusing, the simple way to put it is: how do I get the value that is currently stored as "ã‚µãƒ³ãƒ•ãƒ©ãƒ³ã‚·ã‚¹ã‚³" via the existing php_mssql.dll library to be stored as "サンフランシスコ" using FreeTDS?

Any suggestions would be greatly appreciated.

collinlee

Okay, not sure if anyone cares to know, but I wound up with a solution. It goes like this:

1) Read out the data using the php_mssql.dll driver and write it to files encoded as UTF-8

2) Copy database schema, but change text->ntext, varchar->nvarchar, char->nchar

3) Read files in and write back data using the ADO extensions. You'll have to instantiate an ADO instance as follows:

$conn = new com('ADODB.CONNECTION', NULL, CP_UTF8);

The queries you use to insert the data must have the N prefix on multi-byte values going into the nXXX columns. See the post above for an example.

4) After the data has been written, check that it appears in its native character set

5) If #4 passes, change your php.ini to use the FreeTDS extension, and update or add mssql.charset = "ucs-2" in the php.ini file

6) Create a freetds.conf file and make sure you're using TDS version 7.0 or above

7) Modify your code so that its queries also put the N prefix on multi-byte string values
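For steps 5 and 6, the configuration might look roughly like the fragments below. This is a sketch, not a tested setup: the server section name, host, and port are placeholders, and only the mssql.charset value and the 7.0 version number come from the steps above.

```ini
; php.ini (step 5): charset setting as given in the steps above
mssql.charset = "ucs-2"
```

```ini
; freetds.conf (step 6): "mydbserver" and the host are placeholder values
[mydbserver]
    host = sqlserver.example.com
    port = 1433
    tds version = 7.0
```

The name in square brackets is what the PHP code would pass as the server name when connecting through FreeTDS.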