Hey all,
I have a problem with reading from the local disk in a PHP jukebox app I've been working on for a couple of months. I'm posting on this forum in the hope that some reader has encountered this issue before and can resolve it once and for all! Searching the web only returned one answer, and it was not a desirable one: wait for PHP 6.
This issue only occurs on Linux (Debian etch) and OS X, both running PHP 5.2.4. It does not appear on Windows.
I'm reading a local directory structure which contains MP3s and displaying them in the browser, where the user can click each track to add it to a playlist. A Java MP3 player, part of the app, runs on the same server to handle the audio.
The problem is that directory and file names which include non-ASCII characters aren't read correctly by PHP. Here is an example. The first line of each block may look identical in your browser; if you copy and paste them into a text editor you will see the difference. The second line gives the decimal Unicode code points for each string, and the third line is built from HTML entities using those values, so that your browser can display a representation of what PHP reads from my disk. (edit: this forum won't let me put HTML into my post, so the third line will not render!)
/mp3/Björk/
66 106 246 114 107
B j ö r k
Is read as:
/mp3/Björk/
66 106 111 776 114 107
B j o ̈ r k
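To make the difference visible without relying on how a browser or editor renders it, here is a minimal sketch that dumps the code points of both strings. The helper name utf8_codepoints is my own (it is not a PHP built-in), the two strings are hard-coded from the values above rather than read from disk, and the decoder assumes its input is valid UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): decode a valid UTF-8 string
// into an array of decimal Unicode code points.
function utf8_codepoints($s) {
    $points = array();
    $len = strlen($s);
    for ($i = 0; $i < $len; ) {
        $b = ord($s[$i]);
        // The lead byte tells us how many continuation bytes follow.
        if ($b < 0x80)     { $cp = $b;        $extra = 0; }
        elseif ($b < 0xE0) { $cp = $b & 0x1F; $extra = 1; }
        elseif ($b < 0xF0) { $cp = $b & 0x0F; $extra = 2; }
        else               { $cp = $b & 0x07; $extra = 3; }
        // Each continuation byte contributes its low 6 bits.
        for ($j = 1; $j <= $extra; $j++) {
            $cp = ($cp << 6) | (ord($s[$i + $j]) & 0x3F);
        }
        $points[] = $cp;
        $i += $extra + 1;
    }
    return $points;
}

$expected = "Bj\xC3\xB6rk";   // o-umlaut as the single code point U+00F6
$fromDisk = "Bjo\xCC\x88rk";  // plain "o" followed by combining diaeresis U+0308

echo implode(' ', utf8_codepoints($expected)), "\n"; // 66 106 246 114 107
echo implode(' ', utf8_codepoints($fromDisk)), "\n"; // 66 106 111 776 114 107
var_dump($expected === $fromDisk);                   // bool(false)
```

Both strings are valid UTF-8, which is why they look the same on screen but fail a byte-for-byte comparison.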
It was my understanding that PHP would read the raw bytes directly off the disk, and that I could then interpret them in whichever charset I choose. I believe OS X and Debian both use UTF-8 as their underlying charset. Converting the names with the mbstring functions in PHP doesn't work, and I am at a loss as to how this can be dealt with correctly.
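One thing worth noting: 246 versus 111 776 is the difference between the composed (NFC) and decomposed (NFD) Unicode forms of the same character, and both are valid UTF-8, which may be why a charset conversion leaves the bytes unchanged. The sketch below tries to recompose a name, assuming the intl extension and its Normalizer class are available (it is bundled from PHP 5.3; on 5.2 it would have to come from PECL, which I have not tried). The function name to_nfc is hypothetical, and the input string is hard-coded rather than read from disk.

```php
<?php
// Hypothetical helper: return $name in composed form (NFC) when the
// intl extension's Normalizer class is present; otherwise return it unchanged.
function to_nfc($name) {
    if (class_exists('Normalizer')) {
        $composed = Normalizer::normalize($name, Normalizer::FORM_C);
        if ($composed !== false) {
            return $composed;
        }
    }
    return $name; // no intl, or normalization failed: leave the bytes alone
}

$fromDisk = "Bjo\xCC\x88rk";  // decomposed form, as in the example above
$expected = "Bj\xC3\xB6rk";   // composed form used in the rest of the app

// True only if intl is loaded and the name could be recomposed.
var_dump(to_nfc($fromDisk) === $expected);
```

If this works, the call would go on every name returned by readdir() or scandir() before it is compared with, or stored alongside, strings from the rest of the application.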
I am using UTF-8 encoding throughout my application; this problem only exists when reading from the disk!
I can also provide some of my PHP test scripts, should anyone like to attempt to reproduce this issue on their local machine.
Thanks for any/all help or suggestions.
mafro