I have an Excel document that I need to convert to XML, with the first row (the title row) used as the XML tags.

Here is what I do:

1) Save the Excel file as UTF-8 CSV. (I found that saving it as UTF-8 CSV from OpenOffice works better than saving it as CSV from Excel.)
2) Use a PHP script to read in the CSV and export it as XML, with the first row of the CSV as the XML tags.
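For reference, step 2 can be sketched roughly as follows. This is a minimal sketch and not the actual script: the function name `csvToXml`, the `<rows>`/`<row>` wrapper elements, and the rule for turning titles into tag names are all assumptions made for illustration.

```php
<?php
// Minimal sketch: convert CSV (first row = column titles) to XML.
// csvToXml, <rows> and <row> are illustrative names, not from the original script.
function csvToXml($handle): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('rows');

    // The first row supplies the tag names; make each one a legal XML name.
    $headers = fgetcsv($handle, 0, ',', '"', '\\');
    $tags = [];
    foreach ($headers ?: [] as $i => $title) {
        $tag = preg_replace('/[^A-Za-z0-9_.-]/', '_', trim($title));
        $tags[$i] = preg_match('/^[A-Za-z_]/', $tag) ? $tag : '_' . $tag;
    }

    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $xml->startElement('row');
        foreach ($tags as $i => $tag) {
            // writeElement() escapes &, < and > in the cell text
            $xml->writeElement($tag, $row[$i] ?? '');
        }
        $xml->endElement();
    }

    $xml->endElement();
    $xml->endDocument();
    return $xml->outputMemory();
}
```

Note that escaping `&` and `<` is not the same as removing characters XML forbids outright, so a control character inside a cell can still end up in the output and break the parser.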

This has worked fine for many Excel documents before. But the Excel document I have now was generated by old file manager software, and its data is littered with invisible, weird characters that cause XML parse errors.

Firefox gives the position of the error, but nothing is visible at that position. I have to open the CSV in a text editor, delete the visible characters just before and after that position, and type them back in. That parse error then goes away, but the error position moves on to the next invisible character.

Someone else must have run into the same issue. Could you tell me what these invisible characters are and how to get rid of them?

In Excel I can see what looks like a space at those positions, but it is not a regular space character typed with the space bar. If I go to the cell, backspace over the space, and type a space again, the error is fixed.
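One way to find out what the characters actually are is to list every code point that XML 1.0 does not allow, together with its byte offset in the CSV. The sketch below assumes the file really is UTF-8; `findXmlUnsafeChars` is a hypothetical helper name. Whether the culprits in this particular file are control characters (such as a vertical tab, U+000B) is only a guess. A non-breaking space (U+00A0) is legal in XML when correctly encoded, so it would not be reported here.

```php
<?php
// Sketch: report characters that XML 1.0 forbids, keyed by byte offset.
// findXmlUnsafeChars is a hypothetical helper name.
function findXmlUnsafeChars(string $text): ?array
{
    // Everything outside the XML 1.0 "Char" production.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    // With the /u modifier, preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE) === false) {
        return null; // not valid UTF-8: an encoding problem, not a stray character
    }

    $found = [];
    foreach ($matches[0] as [$char, $offset]) {
        $found[$offset] = sprintf('U+%04X', mb_ord($char, 'UTF-8'));
    }
    return $found;
}
```

Running it over `file_get_contents()` of the CSV and printing the result with `print_r()` should show whether the parse errors line up with the reported offsets. A `null` result points to an encoding problem instead.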

Thanks!

    You could try using the mbstring (mb_*) functions to convert it.

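    $file = 'path/to/file.csv';
    $text = file_get_contents($file);
    if ($text === false) {
        die("Cannot read $file");
    }
    // The candidate list is a guess; adjust it to the encodings you expect.
    $from = mb_detect_encoding($text, array('UTF-8', 'ISO-8859-1'), true);
    $fh = tmpfile();
    fwrite($fh, mb_convert_encoding($text, 'UTF-8', $from));
    rewind($fh);
    // now do your fgetcsv() stuff using $fh as the file handle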
    

    The above could use some error-checking and such to make it more robust, but hopefully you get the idea.

      I tried it, but I still get the error message. The problematic characters are invisible.

      I thought I might be able to use Excel, OpenOffice, or some other spreadsheet tool to get rid of these invisible characters.

      I added your function to my script anyway, though. It should be useful for converting other problematic characters in the future.

      Thanks!
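If a spreadsheet tool cannot remove the invisible characters, they could also be stripped in the PHP script before the XML is written. The following is a minimal sketch, assuming the text is already valid UTF-8 and that the offending characters are ones XML 1.0 forbids (mostly control characters). That has not been confirmed for this file. `stripXmlUnsafeChars` is a hypothetical helper name.

```php
<?php
// Sketch: remove every character that XML 1.0 does not allow.
// stripXmlUnsafeChars is a hypothetical helper name.
function stripXmlUnsafeChars(string $text): string
{
    // Keep tab, LF, CR and the ranges permitted by the XML 1.0 "Char" production.
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';

    $clean = preg_replace($pattern, '', $text);
    if ($clean === null) {
        // preg_replace() returns null when the input is not valid UTF-8.
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $clean;
}
```

It could be applied either to the whole file contents before parsing the CSV or to each cell value just before it is written out. If the stray character stood in for a space, replacing it with `' '` instead of `''` may preserve the spacing better.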
