Hello,
Is it possible to retrieve data from a Word document using PHP? I have searched around and I can only find information on creating a new Word document using PHP. Surely if you can create a Word document you can also read one?
Thanks
stonefish wrote:Hello,
Is it possible to retrieve data from a Word document using PHP? I have searched around and I can only find information on creating a new Word document using PHP. Surely if you can create a Word document you can also read one?
Thanks
Word documents look pretty inscrutable with all the formatting controls, but the text content can be isolated and retrieved with appropriate regular expressions in PHP.
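To illustrate the regular-expression idea, here is a rough sketch rather than a real parser. It assumes the file is an old binary .doc whose text is stored uncompressed as plain ASCII or as UTF-16LE, and it will also pick up stray strings such as font names and metadata. The function name extract_text_runs() and the $minLength parameter are made up for this example.

```php
<?php
// Heuristic sketch: pull runs of printable text out of raw .doc bytes.
// extract_text_runs() and $minLength are illustrative names only.
function extract_text_runs($bytes, $minLength = 4)
{
    // Word often stores text as UTF-16LE. Dropping the NUL bytes collapses
    // plain-ASCII UTF-16LE text back to one byte per character.
    $bytes = str_replace("\x00", "", $bytes);

    // Match runs of printable ASCII (space through tilde) plus tab, CR and LF
    // that are at least $minLength characters long.
    $pattern = '/[\x20-\x7E\t\r\n]{' . (int) $minLength . ',}/';
    preg_match_all($pattern, $bytes, $matches);

    // Trim each run and skip any that are empty after trimming.
    $runs = array();
    foreach ($matches[0] as $run) {
        $run = trim($run);
        if ($run !== '') {
            $runs[] = $run;
        }
    }
    return $runs;
}
```

You would call it with something like extract_text_runs(file_get_contents("sample.doc")), where sample.doc is a placeholder path, and then filter the returned strings by hand. It will not work on .docx files, which are zip archives.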
I have figured it out. If anyone wants to know, I can post all the code on here.
Hi,
I would be interested in that script. I have also been searching for an MS Word parser for quite a while now...
Can I assume that a Word parser would only function on a Windows-based server?
I would assume so. I am using a Windows-based server.
I use the following, but remember that reading a Word doc is much slower than reading a text file -
<?php
// Needs a Windows server with MS Word installed and PHP's COM support enabled.
$doc = "http://domain.com/sample.doc";
$word = new COM("Word.Application") or die("Cannot start Word for you");
$word->Visible = 0; // keep the Word window hidden
$word->Documents->Open($doc);

// Read some of the built-in document properties
$title = $word->ActiveDocument->BuiltInDocumentProperties["Title"];
$author = $word->ActiveDocument->BuiltInDocumentProperties["Author"];
$subject = $word->ActiveDocument->BuiltInDocumentProperties["Subject"];
$bytes = $word->ActiveDocument->BuiltInDocumentProperties["Number of bytes"];
$age = $word->ActiveDocument->BuiltInDocumentProperties["Last save time"];
$pages = $word->ActiveDocument->BuiltInDocumentProperties["Number of pages"];

// Loop through the Word document's paragraphs and collect their text
$nr = $word->ActiveDocument->Paragraphs->Count;
$text = ""; // initialise before appending
for ($i = 1; $i <= $nr; $i++) {
    // Word collections are 1-based; Range->Text is the paragraph's plain text
    $text .= $word->ActiveDocument->Paragraphs->Item($i)->Range->Text;
}

echo "Count - ".$nr."<br><br>"; // number of paragraphs
echo "Title - ".$title."<br><br>";
echo "Author - ".$author."<br><br>";
echo "Subject - ".$subject."<br><br>";
echo "Size - ".$bytes."<br><br>";
echo "Age - ".$age."<br><br>";
echo "Pages - ".$pages."<br><br>";
echo "Text - ".$text."\n";

$word->Documents[1]->Close(false); // close the document without the prompt to save
$word->Quit(); // shut down the winword.exe instance
$word = null; // release the COM object
?>
MSDN has a list of all the objects you can access in Microsoft programs like Word, Excel, etc. The link is -
I hope this helps. It is really hard to find useful information on COM functions.
Using OLE automation in a server application is inadvisable, and will probably eventually kill your server.
OLE automation of MS Office apps isn't designed for use in non-interactive applications. If any error happens, a dialogue box will be produced and the thread will then hang. This will probably then lock something so that all subsequent attempts fail, creating more and more copies of winword.exe.
This will continue until either memory is exhausted or the web server has all its threads locked. This will break the server.
This will ALWAYS happen, no matter what you do to try to avoid it.
Consider using a third-party MS Word reading component instead. They are more robust.
Mark
Hmm... I'm actually looking for an MS Word reader for the Linux platform (PHP4, MySQL). There are a lot of word processors for Linux (OpenOffice etc.) that can read the Word format, so it shouldn't be too hard.