hey guys-

I am doing a face-lift on a client's website. In addition to the face-lift, they want to put their old newsletters on the site in a searchable format.
There are literally 10,000 pages, which have already been scanned into PDF format.

I have read all I can find on pdf2ascii, and none of it makes sense to me. The other converters I have found work great, but I don't have all the time in the world to sit there 10,000 times creating HTML pages out of these. What I want to build is a PDF search engine, without a database if possible: just a simple search that finds pages matching the inputted search string. Speed is not a big issue to me or the client; they just want it to work.
Most of all, they don't want to pay me my hourly rate for three years while I try to get all these things online and searchable.

Is there ANY way at all that I can write a simple script (a form and a handler) that will search ALL the PDFs in a folder and return results?

I have no clue how to even start this. All I know is that I DON'T want to do it manually.
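For anyone landing here with the same problem, a rough sketch of the usual approach: convert each PDF to plain text ONCE (the `pdftotext` tool from Xpdf/Poppler can do this in a loop, no manual work), then have the search handler scan the text files. This is just a minimal sketch, assuming `pdftotext` is installed on the server and the directory names are placeholders; it is not a drop-in solution.

```python
import glob
import os
import subprocess

def convert_pdfs(pdf_dir, txt_dir):
    """One-time batch conversion: run pdftotext on every PDF
    that does not already have a matching .txt file."""
    os.makedirs(txt_dir, exist_ok=True)
    for pdf in glob.glob(os.path.join(pdf_dir, "*.pdf")):
        base = os.path.splitext(os.path.basename(pdf))[0]
        txt = os.path.join(txt_dir, base + ".txt")
        if not os.path.exists(txt):
            # Requires the pdftotext binary (Xpdf/Poppler) on the machine.
            subprocess.run(["pdftotext", pdf, txt], check=True)

def search(txt_dir, query):
    """Case-insensitive substring search over the converted text
    files; returns the names of the files that match."""
    hits = []
    q = query.lower()
    for txt in sorted(glob.glob(os.path.join(txt_dir, "*.txt"))):
        with open(txt, errors="ignore") as f:
            if q in f.read().lower():
                hits.append(os.path.basename(txt))
    return hits
```

A form handler would then just call `search()` with the submitted string and print links to the matching PDFs. Since speed isn't a concern, a linear scan like this over 10,000 text files is crude but workable.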

Any tips, links, or programs would be of great help.

    I read the documentation on ht://Dig, and pardon my lack of Unix knowledge, but how the hell do I install it?
    I pay a web hosting company for space like most people, so I don't have root access to do what it's asking.
    Can I simply un-gzip it on my Windows machine, upload it into a new folder via FTP, and go from there?
