Hello,

Fairly sure this hasn't been asked (recently), so please forgive me if it has.

I am current working on a site which uses SQLite to deliver page content (my hands are tied at this point in regard to db system). I have been asked to implement a search feature "similar to Google". I have read articles on SQlite's FTS capabilities but am still none the wiser - I have found very few examples, and am wondering if anyone could point me in the right direction? I understand I can use LIKE, but it is slow and not recommended.

I have read a few things about FTS1, 2 and 3 with SQLite and this seems the right direction, but does the stock build of SQLite 2.8.17 (PHP 5.2.8) contain them? This is partly rhetorical as I'll find this out quickly when trying it. However, the examples I have seen say that you should create a VIRTUAL TABLE using one of these extensions, populate the table with the data you wish to search, and then preform a MATCH query. Makes sense; I'm guessing the system indexes the data when it is INSERTed/UPDATEd. Now, my question is this: should I add all the searchable data to a FTS2/3 VIRTUAL TABLE when a search is demanded, or should I add/update to the VIRTUAL TABLE when an row is added to the main table (which contains the content to be made searchable)... ?... I am not sure if I understand the idea of VIRTUAL TABLES... If the VIRTUAL TABLE is written to the db file is the same fashion as anyother table, it would make more sense to populate the VIRTUAL TABLE when the main table is added to/updated...?

Has anyone had any experiences of using the sqlite_fulltextsearch class (http://phpclasses.fonant.com/browse/package/2328.html)? Any problems in using it? It seems too simple a system...

Any help or advice would be greatly appreciated,

All the best!

    I've never used sqlite, so I can't give any direct help concerning that. However, I do know that you can setup Sphinx via XML using sphinx's xmlpipe2, so perhaps that's an option. Scroll down to xmlpipe2 on this page for a quick intro on it.

      9 days later

      Thank you for the reply and the suggestion. Unfortunately, my access to the server is limited so I'm having to solve this in stock-build PHP and SQLite only, so no FTS1, 2 or 3, or MySQL, or anything I need to build 🙁. Anyway, I think I've solved it... kinda... Well, it works... ish...

      Here's my approach:

      The table - lets call it 'Pages' - has a 'Title' and 'Body' column which I want to be searchable.

      Rows are added to 'Pages' in the usual way, but also, the 'Body' text is appended to the 'Title' text, all punctuation is removed and the text is split into words. Each word is then added to another table, called 'Search', which has columns referencing the row ID in 'Pages', the word itself and the index the word occurs within the 'Title' and 'Body' columns, ie: Title = 'Hello', Body='Hello everyone my name is dave'; 'my' has an index of 4 as it is the fourth word.

      A select can then be performed on this table, grouped by reference ID and ordered by the count of ids and also ordered by the minimum word index. Basically, it weights the order so that if the search string occurs earlier in the text (effectively, the title), and there are more occurrences of the search string, it will be higher up in the results.

      I have had to limit certain things to make sure it's relatively fast, though I am sure other systems are a LOT faster without the need for these limits. I have limited the indexable content to 5000 words per page - it's for a company website, so I don't think it'll ever need to seriously go beyond that but this is modifiable; just tested it with a massive lorem ipsum and it slows it down slightly ( by 0.1s) but this is more down to how I get snippets and bold the search terms - to fix this, I have had to make sure I only SELECT a certain amount of text from the 'Body' column from 'Pages' and not the whole shebang . It is case-insensitive, and the search is OR based - ie, if the search string is more than one word, it splits and selects WHERE OR. I can easily do something about this, but not weight the results to where it finds words next to each other... Any thoughts? Also, I haven't done anything about adjectives and plurals (-ing, -er and s) but I might be getting too far ahead of myself.

      ta!

        Achhh! Also forgot to say, it ignores words less than four characters in length but there is an exclusion list to this (so TV is still a searchable term).

          Write a Reply...