If you are serious about serving relevant search results at high speed, you should look at search indexers. There are open-source solutions that should give you results in a fraction of a second for the amount of data you describe; Lucene and Sphinx are just two options. Google around and read up on the different solutions. Apache Solr (which uses Lucene) might do the trick for you, since you can query it over HTTP using XML or JSON.
Installing and configuring a new system takes some time, though. So you might instead try what I describe below first, since it requires little work to test. I don't know whether it would improve speed at all, so you'd have to try it out and see what happens. Also note that once you've processed all existing texts as outlined below, you'll have to do the same every time a new text is stored. But if it gives fast enough search results, the one-time cost of processing each text should be acceptable.
- Go through all texts that need to be searchable and break each one up into individual words. Store those words in a separate words table, where "all" most likely need not mean every word: for one, you can omit any word of 3 characters or less, which matches what MySQL's full-text index skips by default anyway (ft_min_word_len defaults to 4; check the docs for other databases).
- Create a text_words table with the columns word_id, text_id and word_count.
- Whenever someone performs a search, filter on the words table, join with text_words, and pull out a decent subset of text_ids with their summed word counts; then join that subset against the texts table. That is, along the lines of
SELECT id, title, body, MATCH(title, body) AGAINST ('the words here') AS score
FROM (
    SELECT text_id, SUM(word_count) AS wordcount
    FROM words w
    INNER JOIN text_words tw ON w.id = tw.word_id
    WHERE word IN ('the', 'words', 'here')
    -- AND word_count > ???
    GROUP BY text_id
    ORDER BY wordcount DESC
    LIMIT 50
) tmp
INNER JOIN texts t ON t.id = tmp.text_id
WHERE MATCH(title, body) AGAINST ('the words here')
ORDER BY score DESC;
Do note that there is no weighting of keywords in the inner query, although that may of course be added. Nor does the inner query differentiate between title and body when it comes to weight; this may also be added if necessary. You'd have to try it out yourself, but it's possible this approach gives good enough results from the inner query while sufficiently cutting down the data fed to the full-text search. You may also want to tweak the row limit of the inner query.
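The one-time indexing step described in the bullets above can be sketched as follows. This is a minimal illustration using SQLite in memory rather than MySQL (the SQL is the same in spirit), and the index_text helper is a hypothetical name, not anything built in:

```python
import re
import sqlite3
from collections import Counter

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE texts (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE text_words (word_id INTEGER, text_id INTEGER, word_count INTEGER);
""")

def index_text(conn, text_id, title, body):
    """Tokenize one text and store per-word counts, skipping words of 3 chars or less."""
    tokens = re.findall(r"[a-z0-9]+", (title + " " + body).lower())
    counts = Counter(t for t in tokens if len(t) > 3)
    for word, count in counts.items():
        # reuse the word row if the word was already seen in another text
        conn.execute("INSERT OR IGNORE INTO words (word) VALUES (?)", (word,))
        (word_id,) = conn.execute(
            "SELECT id FROM words WHERE word = ?", (word,)).fetchone()
        conn.execute(
            "INSERT INTO text_words (word_id, text_id, word_count) VALUES (?, ?, ?)",
            (word_id, text_id, count))

conn.execute("INSERT INTO texts VALUES (1, 'Sample title', 'some words, words again')")
index_text(conn, 1, "Sample title", "some words, words again")
```

This is what you'd run once per text when it is stored, in addition to the normal insert into texts.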
Another thing you might do is start by retrieving the maximum word count for each word in the search.
SELECT MAX(word_count) AS wordcount, word_id
FROM words w
INNER JOIN text_words tw ON tw.word_id = w.id
WHERE word IN ('the', 'words', 'here')
GROUP BY word_id;
Then use those max word counts in the inner query, dividing each by 2 or possibly an even lower number. The reason is that if the max word count for some word is 100 in a single text, it's unlikely you need to search all texts where it occurs only 5 times; dividing by 2 means you start checking texts where it is found at least 50 times. The inner query would be modified along these lines, where the cutoffs 50, 10 and 25 are just example values, each half of that word's maximum from the query above:
SELECT text_id, SUM(word_count) AS wordcount
FROM words w
INNER JOIN text_words tw ON w.id = tw.word_id
WHERE (word = 'the' AND word_count >= 50)
   OR (word = 'words' AND word_count >= 10)
   OR (word = 'here' AND word_count >= 25)
GROUP BY text_id
ORDER BY wordcount DESC
LIMIT 50
This will require an index on words (word) as well as a composite INDEX (word_id, word_count) on the text_words table.
If the cutoff is 5 for 'the' and 10 for 'words', then a text containing 'words' 12 times and 'the' 4 times would get a wordcount sum of only 12, since the 'the' row is excluded by its too-low word count.
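Putting the two steps together, here is a minimal sketch of the cutoff logic, again using SQLite with a few rows of toy data (the table names follow the ones above; the halving divisor is the assumption being demonstrated):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT UNIQUE);
    CREATE TABLE text_words (word_id INTEGER, text_id INTEGER, word_count INTEGER);
""")
# toy data: 'the' peaks at 100 occurrences in text 10, 'words' at 20 in text 12
conn.executemany("INSERT INTO words (id, word) VALUES (?, ?)",
                 [(1, "the"), (2, "words")])
conn.executemany("INSERT INTO text_words VALUES (?, ?, ?)",
                 [(1, 10, 100), (1, 11, 4), (2, 11, 12), (2, 12, 20)])

# step 1: max word count per search word, cutoff = half the maximum
cutoffs = {word_id: max_count // 2
           for max_count, word_id in conn.execute(
               "SELECT MAX(word_count), word_id FROM words w "
               "INNER JOIN text_words tw ON tw.word_id = w.id "
               "WHERE word IN ('the', 'words') GROUP BY word_id")}

# step 2: aggregate per text, keeping only rows at or above each word's cutoff
# (interpolating is fine here: the cutoffs are integers we just computed)
where = " OR ".join("(word_id = %d AND word_count >= %d)" % (wid, c)
                    for wid, c in sorted(cutoffs.items()))
rows = conn.execute(
    "SELECT text_id, SUM(word_count) AS wordcount FROM text_words "
    "WHERE " + where + " GROUP BY text_id ORDER BY wordcount DESC").fetchall()
```

With this data the cutoffs come out as 50 for 'the' and 10 for 'words', so text 11's four occurrences of 'the' are excluded from its sum, exactly as in the example above.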
If this approach is fast enough but somewhat lacking in relevance, you could fire off a separate process that performs the same query using nothing but the built-in full-text indexing, and use its result for subsequent queries (2nd page etc.), which might lead to slightly different search results the second time around.