I need to output a file containing only UTF-8 text, without any extra characters or tags.

My code includes source files that contain Byte Order Mark (BOM) bytes, since they are saved as UTF-8.
Clarification: the main PHP file is saved as UTF-8 without BOM, so it does not emit any BOM bytes; the BOM bytes come only from the included files.

I cannot save the included files without BOM since it will mess up my language display.

How can I suppress the BOM bytes?

I tried
@include_once('file with BOM'); but it did not work. The BOM is still there.

I also tried

ob_start();
@include_once('file with BOM');
ob_clean();

That does not work either.

What can I do?

Thanks!

    nbd wrote:

    I cannot save the included files without BOM since it will mess up my language display.

    That does not make sense to me. I generally save ALL my files as UTF-8 without BOM in order to allow for multi-language sites. Even a single file with a BOM will show through on your site.

    Just save all the files as utf-8 without BOM

      Just to clarify, the BOM is optional in UTF-8; it's only required in other Unicode character sets.

      Anyway, if it cannot be removed, then you would either need to use file_get_contents() if there is no PHP code to be executed, or else use ob_start() / include() / ob_end_clean() to save the output text to a variable. Then you could use str_replace() to strip out the BOMs.
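A minimal sketch of the output-buffering approach described above, not a definitive implementation. It uses ob_get_clean() to fetch the buffer, since ob_end_clean() on its own discards the buffer without returning it. The temporary file and the $greeting variable are hypothetical stand-ins for a real language file such as hebrew.php.

```php
<?php
// UTF-8 byte order mark as raw bytes.
$bom = "\xEF\xBB\xBF";

// Hypothetical demo include: a PHP file that starts with a BOM.
$path = tempnam(sys_get_temp_dir(), 'bom');
file_put_contents($path, $bom . '<?php $greeting = "שלום";' . "\n");

ob_start();                  // capture everything the include prints
include $path;               // the BOM precedes the opening tag, so it is emitted as output
$captured = ob_get_clean();  // return the buffer contents and stop buffering

// Strip the BOM bytes from whatever the include printed.
$clean = str_replace($bom, '', $captured);

unlink($path);               // remove the demo file
```

Because the include runs at the top level here, variables defined in the included file (such as $greeting) stay available to the rest of the script. Wrapping the include in a function would make them local to that function.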

        leatherback wrote:

        That does not make sense to me. I generally save ALL my files as UTF-8 without BOM in order to allow for multi-language sites. Even a single file with a BOM will show through on your site.

        Just save all the files as utf-8 without BOM

        Thanks for the answer.

        Unfortunately, when I change my hebrew.php (the Hebrew texts page) from UTF-8 to UTF-8 without BOM, all the Hebrew text becomes corrupted.

          NogDog wrote:

          Just to clarify, the BOM is optional in UTF-8; it's only required in other Unicode character sets.

          Anyway, if it cannot be removed, then you would either need to use file_get_contents() if there is no PHP code to be executed, or else use ob_start() / include() / ob_end_clean() to save the output text to a variable. Then you could use str_replace() to strip out the BOMs.

          Thanks NogDog,

          I believe file_get_contents() will not work, since the included file contains a set of string definitions in various languages that I need to access from the rest of the main code.

          I'll try the ob_end_clean option.

            You may need to look into the mbstring extension for manipulating the multi-byte character sets.

              2 months later

              The solution (for this and many UTF-8 related problems):

              Add ini_set('default_charset', 'UTF-8'); to the top of the PHP code, or set default_charset = "UTF-8" in the php.ini file.

              Apparently PHP was outputting ISO-8859-1 by default, which is why the Hebrew was garbled.

              Hope it helps.
              -Nbd
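A short sketch of the runtime setting mentioned above, assuming it runs at the very top of the main script before any output is sent. The ini_get() call is only there to confirm that the value took effect.

```php
<?php
// Tell PHP to declare UTF-8 as the charset of its output.
ini_set('default_charset', 'UTF-8');

// Read the setting back to confirm it is active for this request.
$charset = ini_get('default_charset');
```

With this in place, the language files can be saved as UTF-8 without BOM, so there are no BOM bytes left to strip from the output.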
