Hi there everyone!

I've written an e-commerce platform and it's time to make the search function more robust (it's currently just a bunch of LIKE '%...%' searches on various fields). I've got some thoughts on the matter, but I wanted to see if anyone has better ideas concerning its design.

My fuzzy plan:

I would use a table specifically for word matches, with product ID, word, and weight columns. I'd change the product insertion to go through the various fields (part number, description, specifications, etc.), finding the unique strings to insert into the search table; once it finds a duplicate of a string, it would increment the weight of the previously inserted row by 1. Strings like the part number would get an instant weight of 100 or similar, to make sure they go to the top of the list.
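The indexing plan above could be sketched roughly like this (using SQLite for illustration; the table name, column names, and the part-number weight of 100 are assumptions taken from the description, not a tested design):

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE search_index (
        product_id INTEGER,
        word       TEXT,
        weight     INTEGER,
        PRIMARY KEY (product_id, word)
    )
""")

def index_product(product_id, fields):
    """Tokenize each field and upsert word weights for one product.

    fields: dict mapping field name -> text. The 'part_number' field gets
    a fixed high weight so exact part numbers rank first; every other
    occurrence of a word bumps its weight by 1.
    """
    for name, text in fields.items():
        for word in re.findall(r"\w+", text.lower()):
            weight = 100 if name == "part_number" else 1
            # Insert the word, or increment the existing row on duplicate.
            conn.execute("""
                INSERT INTO search_index (product_id, word, weight)
                VALUES (?, ?, ?)
                ON CONFLICT(product_id, word)
                DO UPDATE SET weight = weight + excluded.weight
            """, (product_id, word, weight))

index_product(42, {
    "part_number": "ab-1234",
    "description": "steel widget bracket, widget mount",
})
```

Here "widget" ends up with weight 2 (it appears twice), while the part-number tokens get weight 100, matching the ranking behaviour described.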

On the search side, a search would build an array of product IDs that match the search terms. The array would get sorted by weight DESC and then saved to a database table with a unique ID, so I could paginate through it for the user.
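The search-and-paginate side might look like this (again a SQLite sketch; the saved_search table, the uuid-based search ID, and the scoring-by-summed-weight are all assumptions filled in from the plan):

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE search_index (product_id INTEGER, word TEXT, weight INTEGER);
    CREATE TABLE saved_search (search_id TEXT, position INTEGER, product_id INTEGER);
    INSERT INTO search_index VALUES
        (1, 'widget', 5), (1, 'steel', 1),
        (2, 'widget', 2), (3, 'steel', 4);
""")

def run_search(terms):
    """Rank products by total weight of matched terms; save the ordering."""
    placeholders = ",".join("?" * len(terms))
    rows = conn.execute(f"""
        SELECT product_id, SUM(weight) AS score
        FROM search_index
        WHERE word IN ({placeholders})
        GROUP BY product_id
        ORDER BY score DESC
    """, terms).fetchall()
    # Persist the ranked ID list under a unique ID for later pagination.
    search_id = str(uuid.uuid4())
    conn.executemany(
        "INSERT INTO saved_search VALUES (?, ?, ?)",
        [(search_id, pos, pid) for pos, (pid, _) in enumerate(rows)],
    )
    return search_id

def get_page(search_id, page, per_page=2):
    """Fetch one page of product IDs from a saved result set."""
    rows = conn.execute("""
        SELECT product_id FROM saved_search
        WHERE search_id = ?
        ORDER BY position
        LIMIT ? OFFSET ?
    """, (search_id, per_page, page * per_page)).fetchall()
    return [r[0] for r in rows]

sid = run_search(["widget", "steel"])
print(get_page(sid, 0))  # highest-scoring products first
```

One caveat with saving result sets like this: you'll want some way to expire old saved searches, or the table grows without bound.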

This is my (very) rough idea of how to handle this, and I may be way off base, so I thought I'd ask if there's a more accepted method for something like this. Anyone's thoughts on the matter are most welcome!

    I can't tell you, necessarily, what "standard practice" is. I'll tell you what we did.

    I installed SphinxSearch on the server. SphinxSearch includes a few different programs. We run a "SphinxQL" server on a high port on localhost (127.0.0.1); Sphinx's "indexer" runs every 24 hours against the full product DB (MySQL), and then every 20 minutes on a "delta" dataset (that is, only indexing products with IDs newer than the ones from last night's run). My search classes interact with SphinxQL (using MATCH instead of LIKE). One of the chief advantages was that Sphinx is better than I am at ordering results by relevance (the other real advantage was relative speed of deployment).
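For anyone curious what the MATCH-instead-of-LIKE part looks like: SphinxQL speaks the MySQL wire protocol, so any MySQL client library can talk to searchd on whatever port your config exposes (9306 is the conventional default; the index name "products" is an assumption here). Since running it needs a live searchd, this sketch only builds and escapes the query string:

```python
def sphinxql_search(term, index="products", limit=20):
    """Build a SphinxQL full-text query for the given term.

    MATCH() replaces the LIKE '%...%' scans; Sphinx ranks the results
    itself. The set of characters escaped below is a best-effort guess
    at Sphinx's full-text operators -- check your Sphinx version's docs.
    """
    escaped = "".join(
        "\\" + c if c in '\\()|-!@~"&/^$=' else c
        for c in term
    )
    return f"SELECT id FROM {index} WHERE MATCH('{escaped}') LIMIT {limit}"

print(sphinxql_search("steel widget"))
```

You'd send that string through your MySQL client connected to the SphinxQL port rather than to MySQL itself.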

    My $.02. 🙂
