Full-text search inside PDF files?

Anon

Hi,

I'm looking for a PHP script (if possible!) that can run a full-text search inside PDF files. On our server (a Linux server with Apache and PHP3) we have roughly 3,500 PDF files. I would like to implement a service that lets our support team search across all directories and all PDF files.
Is this possible with PHP?

Thanks

Erik

Anon

This is possible, but not directly through PHP. You need to set up an external search engine. Take a look at
UDM search, a very comprehensive open-source search engine. It's at: http://search.mnogo.ru/

HTDig might be another option.

Anon

ht://dig works very well, although it works best with an external parser included with the Xpdf package. There is also a nice Perl script included with it that extracts the document information from the header portion of PDF files. If you are using ht://dig, make sure you set your maximum document size to the size of your largest PDF file. Indexing may take longer, but otherwise ht://dig will abort on any file that exceeds the limit and just move on to the next one (it can't parse partial PDFs).
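
To give a rough idea of what the external parser step does, here is a minimal sketch in Python (not PHP, and not ht://dig's own code). It assumes the `pdftotext` command from the Xpdf package is installed and on the PATH; the function names `extract_text`, `find_matches` and `search_pdfs` are made up for this illustration.

```python
import subprocess
from pathlib import Path


def extract_text(pdf_path):
    """Return the text of one PDF by running pdftotext ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_matches(texts, query):
    """Return the sorted names of documents whose text contains query, ignoring case."""
    needle = query.lower()
    return sorted(name for name, text in texts.items() if needle in text.lower())


def search_pdfs(root, query):
    """Extract every *.pdf file under root and return the paths that mention query."""
    texts = {}
    for pdf in Path(root).rglob("*.pdf"):
        try:
            texts[str(pdf)] = extract_text(pdf)
        except (subprocess.CalledProcessError, OSError):
            # Skip files pdftotext cannot read (or every file, if the tool is missing).
            continue
    return find_matches(texts, query)
```

Something like `search_pdfs("/var/www/docs", "router")` would then list the matching files (the path is only an example). Note that this re-extracts every file on each query, which will be slow with thousands of PDFs; that is exactly the work an indexer like ht://dig does once, up front.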

ht://dig can also be directed to export its word database to an external, parseable file for use by other programs. You could certainly use PHP to then wade through that file, but you wouldn't get ht://dig's other features that way.
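
As a sketch of what wading through such a file might look like, here is a small Python example (the same logic would port to PHP). Big assumption: it pretends the export has one `word<TAB>document` pair per line. That layout is invented for this illustration and is not ht://dig's actual export format, so check the real file before adapting it. The names `load_word_index` and `lookup` are hypothetical as well.

```python
def load_word_index(lines):
    """Build {word: set of documents} from 'word<TAB>document' lines.

    The line format is an assumption made for this sketch, not ht://dig's real one.
    """
    index = {}
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip blank or malformed lines
        word, doc = line.split("\t", 1)
        index.setdefault(word.lower(), set()).add(doc)
    return index


def lookup(index, query):
    """Return the sorted documents that contain every word in query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

With the file loaded via `load_word_index(open(path))`, a call like `lookup(index, "router reset")` returns only the documents that contain both words. This gives plain word matching and nothing more.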

Check out the documentation on www.htdig.org.

Anon

Did you find a solution? I have the same problem. I have a server running on Linux with a lot of PDF files. I need to do a text search of these files and the Adobe site does not offer a Linux version of their viewer that includes search.