Convert Word Doc to HTML/text on UNIX/Linux
Has anyone run across a usable PHP script for converting Word documents to HTML (preferably) or plain text (if that's all I can find) that runs on UNIX/Linux (i.e., it cannot depend on COM functions)? The only thing I've run across so far is the wvWare C library and a handful of scripts that use it. That's a more complex installation than I'd care to impose upon my client, so I'm hoping there's something simpler and stand-alone.
I'm fairly sure you can use OpenOffice.org to open the Word document and then export it as an HTML or XHTML document.
Hope it works for you.
I think OpenOffice requires some kind of GUI support, so if your Linux box lacks that you might be out of luck. I could be wrong. The OpenOffice.org forums might be a good place to find out how they parse Word docs.
I'm downloading OpenOffice now to see if I can do anything with it locally, pending finding out whether it's even available (or could be installed) on my target host. Unfortunately, I can't guarantee that I can install anything that needs sysadmin-level access.
Is O.O. a standard part of most Linux web server installs these days?
I don't think it's standard unless you're installing Ubuntu or Knoppix, and possibly not even then. Surely there is some way to run O.O.o from a command line. On the other hand, you'd probably need admin privileges to install it on a server. Let us know what you find out.
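For what it's worth, here is a rough sketch of what driving the office suite from PHP on the command line could look like. It assumes a LibreOffice-style `soffice` binary on the PATH that accepts `--headless`, `--convert-to` and `--outdir`; older OpenOffice.org builds may not support these options, and many shared hosts disable `exec()`. The function names are made up for illustration.

```php
<?php
// Sketch only: assumes a LibreOffice-style `soffice` binary that accepts
// --headless / --convert-to / --outdir. Not verified against OpenOffice.org.

/**
 * Build the shell command that converts one Word file to HTML.
 * $binary is an assumption; point it at wherever soffice lives on the host.
 */
function build_convert_command(string $docPath, string $outDir, string $binary = 'soffice'): string
{
    return sprintf(
        '%s --headless --convert-to html --outdir %s %s',
        escapeshellcmd($binary),
        escapeshellarg($outDir),   // quote paths so spaces are safe
        escapeshellarg($docPath)
    );
}

/**
 * Run the conversion and return the path of the generated HTML file,
 * or null if the command failed or no output file appeared.
 */
function convert_word_to_html(string $docPath, string $outDir): ?string
{
    exec(build_convert_command($docPath, $outDir) . ' 2>&1', $output, $status);

    // The converter is expected to write <original base name>.html into $outDir.
    $htmlPath = rtrim($outDir, '/') . '/' . pathinfo($docPath, PATHINFO_FILENAME) . '.html';

    return ($status === 0 && is_file($htmlPath)) ? $htmlPath : null;
}
```

Calling `convert_word_to_html('/tmp/report.doc', '/tmp/out')` would then hand back the HTML path, or null if the conversion didn't work. It still needs the office suite installed on the server, so it doesn't get around the sysadmin question.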
Right now it's looking like a lot more work than it's worth (i.e., what I'm getting paid). It seems hard to believe there's not a stand-alone script out there that does this, but I'm sure not finding it if there is.
You may find some of these Google results interesting, particularly this one.