Hi guys,

Hoping someone can help as I have come to a standstill!

I have a PHP upload script working which uploads a Word document (for argument's sake) and adds the Word content to a database.
This all works correctly, except that when I view the stored Word content as HTML it shows all the extra characters that Microsoft adds in...

Can anyone help me to get rid of these characters in the viewing process?

Here is an example of the characters that Microsoft has added in:

�������
������
��$����������f:��j���

Any help gratefully received!

Thanks!

    .doc is a binary format, which means you cannot serve it directly as text: it contains bytes that can't be displayed as characters. You need to extract the text from the .doc file. Search Google for "php msword to text".
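
    To illustrate the "binary format" point: legacy .doc files are OLE2 compound files and start with a fixed 8-byte signature, so a script can at least detect one before trying to treat it as text. A minimal sketch in current PHP; the function name and the 'doc' upload field are made up for the example:

    ```php
    <?php
    // OLE2 compound file signature found at the start of legacy .doc files
    const OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

    // Returns true when the data starts with the OLE2 signature.
    function looks_like_legacy_doc(string $bytes): bool
    {
        return strncmp($bytes, OLE2_MAGIC, 8) === 0;
    }

    // Hypothetical usage on an uploaded file (field name 'doc' is an assumption):
    // $head = file_get_contents($_FILES['doc']['tmp_name'], false, null, 0, 8);
    // if (looks_like_legacy_doc($head)) { /* convert before storing as text */ }
    ```

    This only tells you the file is a binary Office container; it does not extract any text.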

      Thanks Wilku,

      I have been searching on and off for weeks now and I don't know if I am just fixed on a certain search string that isn't bringing up the right results or what!

      I have found plenty of applications to run documents through, and the details of how to upload files to a server, but I just can't seem to find one that will clean the content, either as part of the upload or just for viewing/editing...

      I am very frustrated and could really do with some help in finding some code to help me!

      Thanks!

        Use the COM extension...

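        <?php
        // Requires Windows with MS Word installed and the PHP COM extension enabled.
        // Start Word and open an existing document
        $word = new COM('word.application');
        $word->Visible = 1;
        $doc = $word->Documents->Open('C:\\d.doc');
        
        // The body text lives on the document's Content range, not on the application object
        $old_text = $doc->Content->Text;
        echo 'Old Word Text:<br />' . htmlspecialchars($old_text);
        
        // Do whatever manipulation you want here...
        $new_text = $old_text;
        
        // Now we'll write it again...
        $doc->Content->Text = $new_text;
        
        $doc->Save();
        $doc->Close();
        $word->Quit();
        $word = null;
        ?>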

        Hope that helps... oh, and just from a quick google using "open word doc +php com" I got a nice PHPBuilder article 😉

          Thanks bpat1434!

          I will give this a go! Like I said, I think I got a bit of tunnel vision with my search!

            OK, I have read about the COM extension, but it says it is only for Windows-based PHP. I am on a *nix box... is there an alternative?

              Let me ask you this: can Microsoft Windows run on *nix? No. I don't think you can actually create a proper Word document (except maybe Word 2007's .docx format) in PHP without having Windows installed. The .doc format is binary and, as such, has some style and format info at the top.

              You have two other options:
              1.) Require users to copy the text from MS Word into a text-input box and submit that instead of uploading the document
              2.) Have them save the document in rtf format

              Unfortunately, without hacking MS Office, it's really not possible to do this on a *nix system that I know of.

                I may be being a bit of a pleb, but all I am doing on upload is opening the document and transferring its content to the database. Is that where I am having the problem, because the *nix box cannot interpret the Microsoft code?

                  Microsoft saves the document in binary form and puts some information at the beginning to define styling and format. Since *nix can't natively read it, you either need an interpreter (OpenOffice.org *might* have what you need) or have to use a different format.

                  The trouble you're having is interpreting a document that isn't supported by the OS.

                    As Wilku said, it's a binary format... You could store the binary files in a database as BLOBs (binary large objects). Here's a tutorial on storing and retrieving binary files from MySQL:

                    http://www.devarticles.com/c/a/MySQL/Blobbing-Data-With-PHP-and-MySQL/

                    The browser will know how to handle the MIME types that come back from the server, regardless of the platform, depending on what applications are specified on the client machine.
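
                    One caveat on the BLOB route: the browser only knows it is getting a Word file if the script says so in the response headers. A rough sketch of what that might look like in current PHP; the function name and the stored filename are assumptions for illustration, not something taken from the tutorial:

                    ```php
                    <?php
                    // Builds the response headers for a stored Word document.
                    // $filename is whatever name was saved at upload time (hypothetical).
                    function doc_download_headers(string $filename, int $length): array
                    {
                        // Drop any path and strip characters that would break the quoted header value
                        $safe = str_replace(['"', "\r", "\n"], '', basename($filename));
                        return [
                            'Content-Type: application/msword',
                            'Content-Disposition: attachment; filename="' . $safe . '"',
                            'Content-Length: ' . $length,
                        ];
                    }

                    // Hypothetical usage, with $blob already fetched from the database:
                    // foreach (doc_download_headers('report.doc', strlen($blob)) as $h) { header($h); }
                    // echo $blob;
                    ```

                    Served this way the file opens in Word (or whatever the client has registered for .doc) rather than being dumped into the page as text.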

                    Now, if you actually want to convert Word docs to something else server-side, the above solutions may be for you.

                      Thanks guys!

                      Your help is really appreciated!

                        My host agreed to install wvWare as soon as I asked, so hopefully this will solve my issue!
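
                        For anyone landing here later, a rough sketch of how wvWare might be called from PHP once it is installed. It assumes the wvText command that ships with wvWare is on the PATH and takes an input path and an output path; the function names are made up, and I have not verified how wvText reports errors, so treat the exit-status check as an assumption:

                        ```php
                        <?php
                        // Builds the shell command: wvText <input.doc> <output.txt>
                        function wvtext_command(string $docPath, string $txtPath): string
                        {
                            return 'wvText ' . escapeshellarg($docPath) . ' ' . escapeshellarg($txtPath);
                        }

                        // Converts a .doc to plain text; returns null if the conversion fails.
                        function doc_to_text(string $docPath): ?string
                        {
                            $txtPath = tempnam(sys_get_temp_dir(), 'wv');
                            exec(wvtext_command($docPath, $txtPath), $output, $status);
                            // Assumption: a zero exit status means wvText wrote the output file
                            $text = ($status === 0) ? file_get_contents($txtPath) : false;
                            unlink($txtPath);
                            return $text === false ? null : $text;
                        }
                        ```

                        The result is plain text, so it could be stored in the database at upload time and displayed later with something like echo nl2br(htmlspecialchars($text)); without the stray characters.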
