Hello

I am trying to fread a file that contains characters in several languages. Some of these characters do not belong to any defined character set (they come from custom TrueType fonts, for example), and the PECL fileinfo extension reports the encoding as "text/plain; charset=unknown-8bit".

What I need to do in the PHP file is load the contents into a variable and do some string replacements.

The thing is, if I simply copy the file contents into my PHP file and do the replacements, it works fine.

But if I use fopen and fread, the characters get screwed up. I am guessing this is because fopen apparently uses windows-1250 encoding.

Is there some way I can do something like

<?php
$string = {include("filename.html")};
?>

Or, alternatively, is there a better way to fread characters that don't have a particular charset?

Thanks a million, in advance

V Madurai

    fopen (or rather, fread) reads pure binary; it doesn't use any particular character set and doesn't do any re-encoding. That's why it is described as "binary-safe".
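A minimal sketch of what "binary-safe" means in practice, not taken from the thread: the sample bytes and the temporary file are made up for illustration. It suggests that fread and file_get_contents hand back exactly the bytes on disk, and that str_replace on ASCII markup leaves the unknown bytes alone.

```php
<?php
// Hypothetical sample data: bytes that are not valid UTF-8 and match no declared charset.
$original = "caf\xE9 \x80\x81\xFF <b>old</b>\n";

// Write the sample to a temporary file so the example is self-contained.
$path = tempnam(sys_get_temp_dir(), 'raw');
file_put_contents($path, $original);

// The 'b' flag makes binary mode explicit, so no line-ending translation is applied.
$handle = fopen($path, 'rb');
$viaFread = fread($handle, filesize($path));
fclose($handle);

// file_get_contents() is the one-call way to load a whole file into a variable.
$viaFileGetContents = file_get_contents($path);

// str_replace() compares bytes, so the unknown bytes pass through untouched.
$replaced = str_replace('<b>old</b>', '<b>new</b>', $viaFread);

var_dump($viaFread === $original);           // bool(true)
var_dump($viaFileGetContents === $original); // bool(true)
var_dump(bin2hex(substr($replaced, 0, 4)));  // string(8) "636166e9"

unlink($path);
```

If the bytes come back identical like this, any corruption most likely happens later, for example when the output is displayed under a different charset than the one the file was written in.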

    So what do you mean by "getting screwed up"? It shouldn't be surprising if they're not being displayed correctly, because if you don't know the encoding how are you supposed to know how they should be displayed?

      Weedpacket;10992993 wrote:

      if you don't know the encoding how are you supposed to know how they should be displayed?

      Indeed. Yet...

      vmadurai;10992987 wrote:

      The thing is if i simply copy the file contents into my php file and do replacements, then it works fine.

      which means that either you or the program you use to open the file and copy its contents knows what its encoding is. You should also know, or be able to find out, which encoding you save your PHP files with. So do exactly the same conversion in PHP.
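The conversion suggested above could look roughly like this. It is only a sketch under loudly stated assumptions: the file is taken to be Windows-1252 and the PHP script to be saved as UTF-8, neither of which the thread confirms, and the iconv extension is assumed to be available. The sample text is invented.

```php
<?php
// Assumption: these are the bytes fread() returned, and the file is Windows-1252.
$fromFile = "Prix: 5 \x80, caf\xE9";

// Assumption: the script is saved as UTF-8, so a literal "café" in it would be
// these bytes. They are written as escapes here to keep the example unambiguous.
$needle = "caf\xC3\xA9";

// Without conversion the needle never matches, because the bytes differ.
$unconverted = str_replace($needle, 'coffee', $fromFile);

// Convert the file contents to the script's encoding, then replace.
$asUtf8 = iconv('CP1252', 'UTF-8', $fromFile);
$result = str_replace($needle, 'coffee', $asUtf8);

var_dump($unconverted === $fromFile); // bool(true): nothing was replaced
var_dump(bin2hex($result));
```

This only applies when the source encoding is actually known. For bytes that belong to no defined character set, as in the original question, no conversion can be correct, and byte-level replacement on the raw data is the safer route.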

      I would also expect PHP to default to ISO-8859-1, which is close to windows-1252 (not windows-1250), albeit still not identical.

        johanafm wrote:

        I would also expect PHP to default to ISO-8859-1

        Oh, it's simpler than that. PHP doesn't use any encoding. A string is just a sequence of bytes, and never mind what characters those bytes are supposed to represent. That's why there are extensions for converting between encodings.

        Unless you mean which encoding the web server defaults to putting in its Content-Type: headers.
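To illustrate the point about strings being plain byte sequences, here is a small sketch with made-up sample values. It shows the same character stored in two encodings and how PHP's core string functions treat them purely as bytes.

```php
<?php
// The same character, "é", encoded two different ways.
$latin1 = "\xE9";      // ISO-8859-1 / Windows-1252: one byte
$utf8   = "\xC3\xA9";  // UTF-8: two bytes

// strlen() counts bytes, not characters.
var_dump(strlen($latin1)); // int(1)
var_dump(strlen($utf8));   // int(2)

// Comparison is byte-by-byte, so PHP sees two unrelated strings.
var_dump($latin1 === $utf8); // bool(false)

// Bytes that are not valid UTF-8 are still a perfectly valid PHP string.
$opaque = "\xFF\xFE\x80";
var_dump(bin2hex($opaque)); // string(6) "fffe80"
```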

          Weedpacket;10993003 wrote:

          Unless you mean which encoding the web server defaults to putting in its Content-Type: headers.

          Ah yes, I got a bit sloppy.
