I have turned on the general_log in MySQL, and it's working well... except for the special characters.

For example, part of my original query which ran successfully, is: ... SET SurveyTitle='ExitRight® Interview'

However, when I look in the general_log table, it reads: ... SET SurveyTitle='ExitRight\xAE Interview'

Why did my ® get changed to \xAE?

At first, I thought this was just a minor annoyance and ran a half-dozen find and replace scripts after I exported from the general_log table. But when I pulled the export from something that had foreign languages, there were FAR MORE occurrences of a special character appearing as \x[something] rather than the special character in the general_log table.

I'm running MAMP 3.3 on my Mac, with MySQL 5.5.42, and phpMyAdmin 4.4.9.

    As to "why", I presume its thinking is that it's writing to a flat ASCII file, so is converting unicode characters to their ASCI escape sequences. As to whether this "feature" can be turned off or overridden, I have no idea (without a bunch of Googling 😉 ).

      Thanks for the insight. I tried a bunch of Googling, and no joy. So let me try coming at this a different way...

      Let's assume my output file just has all those \x[something] characters. Any one know where I could find a PHP script / function that would convert all of those characters for me?

        A little experimenting suggests you might be able to use iconv(), e.g.:

        $utf8String = iconv('ISO-8859-1', 'UTF-8', $stringFromMysqlLog);
        

        (Don't think I've ever used that function before. 🙂 )

          5 days later

          Just curious why you might want to convert this log? I'm not certain how this works, but it's my guess that ® at the input stage is input as UTF-8 and is therefore a multibyte character. The escape sequence \xAE is a way to specify a two-byte sequence in ASCII code. The query with \xAE might run just fine and give you the data you want in your database.

          Character encoding is kind of a funny thing. It takes a certain 'awakening' to realize that some charset underlies any exchange of text.

            7 days later

            I thought I was onto something, but on further examination it doesn't work. I tried the function that NogDog suggested, but it didn't work. Or at least it didn't do what I was expecting. And I've been Googling for a while trying to find a solution before I posted here again.

            Without boring anyone with too many details, I am moving data from one schema to another, and without re-writing gobs of my code, the easiest solution (it seems, other than this pesky special character issue) is to run it and then pull the queries out of the general_log table.

            I tried converting that argument field to ascii_general_ci rather than utf8_general_ci, but that didn't work. In fact, it stopped logging altogether when I did that. So I think I'm back to trying to fix my output. Here's a few examples of what I'm trying to solve for:


            | ASCII | UTF-8 |

            | \xAE | ® |
            | \xE9 | é |
            | \xF1 | ñ |

            | \xBD | ½ |

            I'm hoping to write a function that will convert the ASCII on the left to the corresponding character on the right. Sorry - I'm probably using some incorrect terminology here.

              perpetualshaun;11058851 wrote:

              I am moving data from one schema to another, and without re-writing gobs of my code, the easiest solution (it seems, other than this pesky special character issue) is to run it and then pull the queries out of the general_log table.

              Have you even tested one of these inserts? You may find that it actually does what you want. Even though your db insert statement says \xAE, it may end up inserting the ® char into your database. This all depends on character encoding.. If I may be so bold, I would say that you've got a lot to learn about character encoding. For example, are you aware that these common stages of development assume a character encoding?
              when connecting to a mysql database -- mysql needs to know that when you send it two bytes that you mean ® encoded as UTF-8 and not two single-byte characters in arabic or sanskrit or something.
              when outputting an HTML document -- you have to tell the browser that the binary data you are sending is latin-1 or utf-8 or korean or something. Otherwise, the browser can only make assumptions.
              * when saving a text document -- if you cream a bunch of binary data into a text file, it could be ASCII or LATIN-1 or UTF-8 or vulcan or orcish or whatever.

              Character encoding is a fundamental concept and it rears its ugly head in so many places where you'd never realize it. Weedpacket was kind and patient enough to explain much of it to me some time ago.

              perpetualshaun;11058851 wrote:

              I tried converting that argument field to ascii_general_ci rather than utf8_general_ci, but that didn't work. In fact, it stopped logging altogether when I did that. So I think I'm back to trying to fix my output. Here's a few examples of what I'm trying to solve for:


              | ASCII | UTF-8 |

              | \xAE | ® |
              | \xE9 | é |
              | \xF1 | ñ |

              | \xBD | ½ |

              I'm hoping to write a function that will convert the ASCII on the left to the corresponding character on the right. Sorry - I'm probably using some incorrect terminology here.

              I suggest that before you go converting a log in a way that you don't really understand that you consider the format of this log file. The log file may be written as ASCII -- in which case double-byte characters may get converted to some kind of escaped notation that represents the actual bits and bytes that you want in your db. You might consider connecting to your destination database and specifying that the charset of this db connection is ASCII. You may find that those log queries work just fine. I strongly recommend getting your head around what a charset is and where they come into play. It'll help you a lot in the future.

                You can't write something like [font=monospace]INSERT INTO newschema.tablename SELECT * FROM oldschema.tablename[/font]?

                  Or if the two schemas are identical, why not use mysqldump to dump it to a file and then load it into the new db?

                    Write a Reply...