Hi Justin
I have been implementing something like this for HTML that is purely database-fed. I considered the hybrid approach and decided it was too messy, especially as I wanted the search engine to be able to highlight the search terms in the response.
Anyway, Clay Johnson has an article on one way to do this on the phpbuilder site:
http://phpbuilder.com/columns/clay19990421.php3
If you do want to handle existing HTML, you could kludge together an indexing engine that basically indexes a predefined list of URLs via the web-suck feature of fopen() or file() (both can open a URL as if it were a local file). In pseudocode:
for each $url in $url_list
{
    fopen($url), save the contents in $str
    $array = explode(" ", strip_tags($str))
    for each $word in $array
    {
        if (is_significant($word))
        {
            insert ($url, $word) into db.table
        }
    }
}
Stick this in a cron job (either run the script with the php executable or request its URL with wget) and you're laughing.
... and I just found fgetss(), which does the strip_tags() step for you as it reads each line.
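
For what it's worth, here is a rough PHP sketch of the loop above. It is only one way to do it and makes several assumptions that are not in the pseudocode: is_significant() is a made-up stop-word filter, the table name search_index and its columns url and word are hypothetical, and the database handle is a PDO connection you supply. It uses strip_tags() on the whole page rather than fgetss(), since fgetss() was removed in PHP 8.

```php
<?php
// Hypothetical filter: the pseudocode leaves is_significant() undefined.
// Here a word counts if it is longer than two characters and not a stop word.
function is_significant(string $word): bool
{
    static $stopWords = ['the', 'and', 'for', 'with', 'that', 'this'];
    return strlen($word) > 2 && !in_array($word, $stopWords, true);
}

// Strip the markup from a page and return its distinct significant words.
function significant_words(string $html): array
{
    // Put a space before every tag so "</p><p>" does not glue two words together.
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = strtolower(html_entity_decode($text));
    // Split on anything that is not a lowercase letter or a digit.
    $words = preg_split('/[^a-z0-9]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_unique(array_filter($words, 'is_significant')));
}

// Fetch each URL and store one (url, word) row per distinct word.
// Assumed schema: search_index(url, word).
function index_urls(array $urls, PDO $db): void
{
    $insert = $db->prepare('INSERT INTO search_index (url, word) VALUES (?, ?)');
    foreach ($urls as $url) {
        $html = @file_get_contents($url);   // needs allow_url_fopen enabled
        if ($html === false) {
            continue;                       // skip pages that cannot be fetched
        }
        foreach (significant_words($html) as $word) {
            $insert->execute([$url, $word]);
        }
    }
}
```

To run it from cron you would call index_urls() with your URL list and PDO handle at the bottom of the script. You would probably also want to delete a URL's old rows before re-indexing it, otherwise each run adds duplicates.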