Hi folks.

I am helping a friend with a PHP script that he uses to search directories for content inside PDF files. I am no programmer, but I am quite good with Linux and he is not too bad with PHP.

Anyway, the script is triggered through a web interface and searches each and every file for specific strings or items. I think it would really suck to do it that way, but he is helping someone else with this, and that person insists on using a Beagle search for this purpose.

So my questions are:

  • Can PHP call beagle or beagle-index to perform this search?
  • How would PHP obtain the results of the search?
  • Is there a simpler way to do this, and where can my friend or I read up on it?

Thanks for any ideas on this.

griz

    I've never used Beagle, so I don't know whether it's better or not. If you can run Beagle from a command line, then you can call it from PHP.
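To illustrate the point about calling a command-line tool from PHP, here is a minimal sketch using PHP's built-in exec(), escapeshellcmd() and escapeshellarg(). The helper name run_search() is made up for this example, and it assumes the search tool prints one result per line and exits with status 0 on success.

```php
<?php
// Hypothetical helper: run a command-line search tool with one query
// argument and return its output as an array of lines.
function run_search(string $binary, string $query): array
{
    // Escape both parts so user input from the web form cannot inject shell commands.
    $command = escapeshellcmd($binary) . ' ' . escapeshellarg($query);

    $lines = [];
    $status = 0;
    exec($command, $lines, $status);

    // Treat a non-zero exit status as a failure (an assumption about the tool).
    if ($status !== 0) {
        throw new RuntimeException("search command exited with status $status");
    }
    return $lines;
}
```

Assuming Beagle's command-line client is installed as beagle-query, a call might look like run_search('beagle-query', 'invoice'). As far as I know, Beagle runs as a per-user desktop daemon, so the web server's user would probably need access to a running daemon and index of its own; that part is worth checking before committing to it.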

    Personally, I use swish-e (http://swish-e.org/). It's lightning fast, can read PDFs, and can handle all the cool Google-style searches [ parentheses | and | or | not ]. It takes about 30 minutes to set it up the first time, and about 3 minutes once you get the hang of creating an index. It runs all its queries against its own index, so you don't have to read every file on your site every time someone does a search. With a little PHP programming, you can make Swish-e queries return ID numbers for rows in your MySQL database, so when you store large quantities of content in your DB, you can search it faster than MySQL can, since Swish-e is just reading its own index.
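For the "how does PHP get the results" part, here is a rough sketch of parsing swish-e's output in PHP. It assumes the default output format, where header lines start with "#", each hit looks like `rank path "title" size`, and a lone "." ends the list. The function name parse_swish_output() is invented for this example, and a customised output format would need a different pattern.

```php
<?php
// Hypothetical parser for swish-e's default result lines.
function parse_swish_output(array $lines): array
{
    $hits = [];
    foreach ($lines as $line) {
        // Only lines shaped like: rank path "title" size are kept.
        // Header lines (#...), error lines and the "." terminator do not match.
        if (preg_match('/^(\d+) (.+) "(.*)" (\d+)$/', $line, $m)) {
            $hits[] = [
                'rank'  => (int) $m[1],
                'path'  => $m[2],
                'title' => $m[3],
                'size'  => (int) $m[4],
            ];
        }
    }
    return $hits;
}
```

The input lines could come from something like exec('swish-e -w ' . escapeshellarg($query) . ' -f ' . escapeshellarg($indexFile), $lines), where $query and $indexFile are placeholders for the search terms and the path to your index file.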

    Oh yeah, and it's OSS.
