I'm looking to edit an MS Word document on a Linux server.

I know, I know: it can't be done. Well...

If you open up a .doc file in a hex editor, you can change the values of the text without corrupting the document. How could I mimic the way a hex editor reads and edits a file in PHP?

    But I think you can convert MS Word .doc files into XML with OpenOffice?

      Read in the file and alter it; all a hex editor does is let you look at the contents of the file without any processing (beyond translating bytes into pairs of hex digits).

      Of course, you have to know what your edits mean to the application (Word) that the file is intended for....
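To make that concrete, here is a rough sketch of the two things a hex editor typically does: show raw bytes as pairs of hex digits, and overwrite bytes at a given offset without changing the file length. The helper names `hex_peek` and `patch_bytes` and the throwaway demo file are made up for illustration; the demo bytes are not a real Word document.

```php
<?php
// Hypothetical helper: return $length bytes starting at $offset as hex pairs.
function hex_peek($path, $offset, $length)
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    fseek($handle, $offset);
    $bytes = fread($handle, $length);
    fclose($handle);
    // bin2hex gives "48656c..."; chunk_split puts a space after every pair.
    return trim(chunk_split(bin2hex($bytes), 2, ' '));
}

// Hypothetical helper: overwrite bytes in place. Nothing is inserted or
// removed, so the file keeps its original length.
function patch_bytes($path, $offset, $newBytes)
{
    $handle = fopen($path, 'r+b'); // read/write without truncating
    if ($handle === false) {
        return false;
    }
    fseek($handle, $offset);
    $written = fwrite($handle, $newBytes);
    fclose($handle);
    return $written === strlen($newBytes);
}

// Demo on a temporary file (made-up contents, 15 bytes long).
$demo = tempnam(sys_get_temp_dir(), 'hex');
file_put_contents($demo, "\xD0\xCFHello world\x00\x01");

$before = hex_peek($demo, 2, 5);   // the bytes of "Hello"
patch_bytes($demo, 2, 'HELLO');    // same length as the text it replaces
$after = hex_peek($demo, 2, 5);    // the bytes of "HELLO"

clearstatcache();
$sizeAfter = filesize($demo);
unlink($demo);

echo $before, "\n", $after, "\n";
```

The point of the sketch is that the patch has exactly as many bytes as the text it replaces; whether Word accepts a given edit still depends on what those bytes mean inside the document.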

        I've tried reading the file in (with and without 'b' to force binary mode) and altering it, but the Word doc gets corrupted. Oddly enough, when I do the same thing via a hex editor, it changes the text without corrupting the document.

        Why would that be?

          I can't tell without seeing how you're altering it.

            OK, I figured out I wasn't reading in the file properly. When I read the file in chunks instead of all at once, it worked.

            I'm going to run some more tests to make sure it can handle all types of text editing without killing the file. I'll get back to you.

            Thanks.

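            <?php

            // Copy $filename to testme.doc in 1 MB chunks, replacing "the" with "***".
            // The replacement is the same length as the search text, so no bytes shift.
            // Note: a match that straddles two chunks is not replaced.
            function readfile_chunked($filename)
            {
                $chunksize = 1 * (1024 * 1024); // bytes per read
                $buffer = '';

                $handle = fopen($filename, 'rb');
                if ($handle === false)
                {
                    return false;
                }

                // 'wb' starts the copy fresh; 'ab' would append to an existing testme.doc.
                $handlew = fopen("testme.doc", 'wb');
                if ($handlew === false)
                {
                    fclose($handle);
                    return false;
                }

                while (!feof($handle))
                {
                    $buffer = fread($handle, $chunksize);
                    $buffer = str_replace("the", "***", $buffer);
                    fwrite($handlew, $buffer);
                }

                fclose($handle);
                fclose($handlew);

                return true;
            }

            echo readfile_chunked("test.doc") ? "done" : "failed";

            ?>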
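One caveat with replacing chunk by chunk: if an occurrence of the search text happens to start near the end of one chunk and finish in the next, a per-chunk `str_replace` never sees it whole. Below is a minimal sketch of one way around that, holding back the last few unprocessed bytes of each chunk and prepending them to the next read. The function name `replace_stream` is made up, the same-length check is my own assumption (to mirror what an overwrite in a hex editor does), and the demo uses `php://memory` streams, so it assumes a reasonably current PHP.

```php
<?php
// Hypothetical helper: copy $in to $out, replacing $search with $replace,
// including occurrences that straddle a chunk boundary.
function replace_stream($in, $out, $search, $replace, $chunkSize = 1048576)
{
    // Assumption: only same-length replacements, so no bytes shift position.
    if ($search === '' || strlen($search) !== strlen($replace)) {
        return false;
    }
    $len = strlen($search);
    $carry = ''; // raw bytes held back from the previous chunk

    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            return false;
        }
        $data = $carry . $chunk;
        $pos = 0;
        $output = '';

        // Replace every complete occurrence, scanning left to right.
        while (($hit = strpos($data, $search, $pos)) !== false) {
            $output .= substr($data, $pos, $hit - $pos) . $replace;
            $pos = $hit + $len;
        }

        // Hold back up to $len - 1 trailing bytes: they might be the start
        // of an occurrence that finishes in the next chunk.
        $rest = (string) substr($data, $pos);
        $hold = min($len - 1, strlen($rest));
        $output .= substr($rest, 0, strlen($rest) - $hold);
        $carry = $hold > 0 ? substr($rest, -$hold) : '';

        fwrite($out, $output);
    }

    // Fewer than $len bytes remain, so they cannot contain a match.
    fwrite($out, $carry);
    return true;
}

// Demo with a tiny chunk size so occurrences get split across reads.
$in = fopen('php://memory', 'w+b');
fwrite($in, 'xxthe cat, the dog; the');
rewind($in);
$out = fopen('php://memory', 'w+b');

replace_stream($in, $out, 'the', '***', 4);

rewind($out);
$result = stream_get_contents($out);
fclose($in);
fclose($out);

echo $result, "\n";
```

With 1 MB chunks a split match is rare but not impossible, which is the kind of problem that only shows up on some documents.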
            

              I'm using str_replace in the code above, but that is case-sensitive. I can't use str_ireplace, because my server doesn't support PHP 5 yet.

              I've tried eregi_replace, but it doesn't work on my file buffer.

              For example, if I do a simple test below, it works fine.

              $str = "HELLO WORLD";
              $str = eregi_replace('world', '*****', $str);
              

              When I try the same thing in the code from the previous post, it doesn't work. Any thoughts as to why?

                Try using preg_replace('/world/i', '*****', $str)

                The POSIX regex functions are not binary-safe; the PCRE functions are.
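A small sketch of what binary-safe means in practice: the buffer below is made-up data with NUL and high bytes mixed in, standing in for a chunk read from a binary file. The `i` after the closing delimiter is what makes the match case-insensitive.

```php
<?php
// Made-up buffer: text surrounded by NUL and other non-printable bytes.
$buffer = "\x00\x01Hello WORLD\x00\xFFworld\x00";

// PCRE works on the whole byte string, NULs included; /i ignores case.
$patched = preg_replace('/world/i', '*****', $buffer);

// Print as hex so the non-printable bytes are visible.
echo bin2hex($patched), "\n";
```

Both "WORLD" and "world" are replaced, and the bytes around them are left as they were.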

                  preg_replace works, but it is case-sensitive.

                  Is there a regular expression I can use that says "this exact string in any case", or am I going to have to dynamically create the pattern for each string?

                  i.e. /[wW][oO][rR][lL][dD]/ <-- it would be a pain to generate this pattern for each string I look for. Not too efficient.
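There should be no need to build character classes by hand: the `i` modifier after the closing delimiter already means "this string in any case", and `preg_quote` escapes whatever search text you pass in. A minimal sketch, assuming you want each match masked with the same number of asterisks; the helper name `mask_ci` is made up for illustration.

```php
<?php
// Hypothetical helper: mask every case-insensitive occurrence of $needle
// with the same number of '*' characters, so the buffer length is unchanged.
function mask_ci($needle, $buffer)
{
    // preg_quote escapes regex metacharacters and the '/' delimiter,
    // so $needle is matched as literal text.
    $pattern = '/' . preg_quote($needle, '/') . '/i';
    $mask = str_repeat('*', strlen($needle));
    return preg_replace($pattern, $mask, $buffer);
}

$first = mask_ci('world', 'Hello WoRlD, world.');
$second = mask_ci('a.b', 'A.B axb'); // the dot is literal, so "axb" stays

echo $first, "\n", $second, "\n";
```

Without the `u` modifier the case folding is byte-based, so treat this as reliable for plain ASCII letters only.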
