I have a document database that is essentially just a tree of directories that holds .htm and .html files. The search script opens up all files in a certain directory and searches each one for keywords, then returns scored results.
I want the script to display each result like this:
X. TITLE
DESCRIPTION
I need help pulling out the title and description from the .htm files.
Here is my current code, which isn't working:
$fp = fopen($file, 'r');
$fread = fread($fp, filesize($file));
fclose($fp);
$fc = preg_replace("/(<\/?)(\w+)([^>]*>)/e", "'\\1'.strtolower('\\2').'\\3'", $fread); // make html tags lowercase
$title = preg_replace("!.*?<title>(.*?)</title>.*?!is", "$1", $fc); // get the title
$raw_body = preg_replace("!.*?<body.*?>(.*?)</body>.*?!is", "$1", $fc); // get body from file
I'm almost positive that the code is not in a logical order. I am teaching myself PHP and am having a bit of trouble with preg_match. Is there a good tutorial somewhere that covers all the switches (pattern modifiers), etc.?
Here is what I'm trying to get the code to do:
1) Open the file, read the entire contents, and make all HTML tags lowercase
2) Extract the text between <title> and </title> into $title
3) Extract the text between <body *> and </body> into $raw_body
4) Remove the HTML tags from $raw_body so it is now just plain text ($body)
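The four steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions: the function name extract_title_and_body is hypothetical, the input is assumed to be the file contents already read into a string (for example with file_get_contents($file)), and the text is assumed to be UTF-8. It uses preg_match with the "i" (case-insensitive) and "s" (dot matches newlines) modifiers, so the separate lowercasing pass is not needed.

```php
<?php
// Hypothetical helper: pulls the title and the plain-text body out of an HTML string.
function extract_title_and_body(string $html): array
{
    // "i" = case-insensitive, so <TITLE> and <title> both match.
    // "s" = "." also matches newlines, so the content may span several lines.
    $title = '';
    if (preg_match('!<title[^>]*>(.*?)</title>!is', $html, $m)) {
        $title = trim(html_entity_decode($m[1], ENT_QUOTES, 'UTF-8'));
    }

    $raw_body = '';
    if (preg_match('!<body[^>]*>(.*?)</body>!is', $html, $m)) {
        $raw_body = $m[1];
    }

    // Drop <script> and <style> blocks; strip_tags() would keep their contents.
    $raw_body = preg_replace('!<(script|style)\b[^>]*>.*?</\1>!is', ' ', $raw_body) ?? $raw_body;

    // Put a space before every tag so that "<p>a</p><p>b</p>" becomes "a b"
    // rather than "ab". Trade-off: inline tags inside a word also gain a space.
    $body = strip_tags(str_replace('<', ' <', $raw_body));

    // Decode entities such as &amp; and collapse runs of whitespace.
    $body = html_entity_decode($body, ENT_QUOTES, 'UTF-8');
    $body = trim(preg_replace('/\s+/', ' ', $body) ?? $body);

    return ['title' => $title, 'body' => $body];
}
```

With something like $doc = extract_title_and_body(file_get_contents($file)); the values $doc['title'] and $doc['body'] would then be available for searching and for the output. Regular expressions are fragile on malformed HTML, so this is a starting point rather than a robust parser.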
After that I go ahead and search everything and format the output.
If possible, I also need to know how to grab 75 characters on either side of a found term (for a description of about 150 characters). For example, if someone searches for "help" and it is found in $body, I need the 75 characters before "help" and the 75 characters after it for the description.
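One possible way to build that description is sketched below. The function name make_snippet and its $radius parameter are hypothetical, and the sketch assumes single-byte text; for multibyte text the mb_stripos, mb_substr, and mb_strlen equivalents would be the assumption to swap in. It finds the first case-insensitive occurrence of the term and cuts up to 75 characters on each side, clamping at the start and end of the string.

```php
<?php
// Hypothetical helper: returns the term plus up to $radius characters on each side,
// or null when the term does not occur in $body.
function make_snippet(string $body, string $term, int $radius = 75): ?string
{
    if ($term === '') {
        return null; // nothing to search for
    }

    $pos = stripos($body, $term); // case-insensitive position of the first match
    if ($pos === false) {
        return null;
    }

    // Clamp the start at 0 when the match is closer than $radius to the beginning.
    $start  = max(0, $pos - $radius);
    $length = ($pos - $start) + strlen($term) + $radius;

    // substr() stops at the end of the string if $length runs past it.
    $snippet = substr($body, $start, $length);

    // Mark the cut points so the reader can see the text was truncated.
    if ($start > 0) {
        $snippet = '...' . $snippet;
    }
    if ($start + $length < strlen($body)) {
        $snippet .= '...';
    }

    return $snippet;
}
```

A call such as make_snippet($body, 'help') would then give the description text for one result. Cutting at a fixed character count can split a word in half, so trimming to the nearest space is a refinement worth considering.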
Thanks for any and all help anyone can provide!
tgmsocal