Recently I wrote a search engine for a website whose content is stored in a MySQL database. It worked fine until the database started to grow to a larger size.
The problem is as follows:
I scan all the articles (in HTML format) in the database and extract every word. Each word is then stored in an array, together with references to the pages that contain it.
First I read every article in the database and store its contents in an array, e.g.
$content[2] = 'some very long text words'; // contents of page id 2
Then, from each article, I extract all the words and store them in another array, e.g.
$word['hello'] = array('1', '2', '5', '6'); // word 'hello' is contained in page ids: 1, 2, 5, 6
$word['world'] = array('1', '5', '12'); // and so on...
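Roughly, the indexing loop looks like the sketch below (simplified; the table and column names here are just placeholders, the real ones differ):

// Simplified sketch of the indexing step; 'articles', 'id' and 'body' are placeholder names.
$content = array();
$result = mysql_query("SELECT id, body FROM articles");
while ($row = mysql_fetch_assoc($result)) {
    $content[$row['id']] = strip_tags($row['body']); // strip HTML before extracting words
}

$word = array();
foreach ($content as $id => $text) {
    // split the text on anything that is not a letter or digit
    $tokens = preg_split('/[^a-z0-9]+/i', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    foreach (array_unique($tokens) as $token) {
        $word[$token][] = $id; // inverted index: word => list of page ids
    }
}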
Now that the database contains about 40 articles and every word is stored in the array, the process of scanning the articles and extracting all the words consumes 100% of the CPU and fails to generate the intended 'index files' that are used when users search the website through a form.
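For reference, the index files are meant to be generated and used roughly along these lines (again simplified; the file name is only an example):

// Write the inverted index to disk so the search form can load it later.
file_put_contents('word_index.dat', serialize($word));

// At search time the index is read back instead of rescanning the articles:
$word = unserialize(file_get_contents('word_index.dat'));
$matches = isset($word['hello']) ? $word['hello'] : array(); // page ids containing 'hello'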
Computer specifications: 1.1 GHz CPU, 240 MB of RAM
I would greatly appreciate any comments or solutions to this problem.
Many thanks in advance.