I have the unfortunate need to deal with a database that could contain l33t-speak subject fields. I realize I could pre-process all my data before it hits the database, but I want to exhaust other possibilities first. Basically, I take a user-supplied search string, break it into letters, and replace each letter with a group of its possible substitutions, giving something in the format "(t|+)(e|3)(s|5|$)(t|+)" (that's "test" in possible l33t-speak, for those who don't know). Then I throw that at the database using the REGEXP extension. A query like this takes about 10 seconds on my system, and it gets worse as more letters are added.
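To make the pattern-building step concrete, here is a minimal Python sketch of it. The `LEET` table and the `leet_pattern` function are hypothetical names used only for illustration, and the table covers just the letters from the "test" example. Note one deliberate difference from the format quoted above: `+` and `$` are regex metacharacters, so the sketch uses character classes such as `[t+]` instead of alternation groups such as `(t|+)`, which keeps those characters literal without any escaping.

```python
import re

# Hypothetical substitution table (assumption): only the letters from the
# "test" example. Avoid adding ']', '^', '-' or '\' here without escaping,
# since those are special inside a character class.
LEET = {
    "t": "t+",
    "e": "e3",
    "s": "s5$",
}


def leet_pattern(term: str) -> str:
    """Build a regex that matches `term` with optional l33t substitutions."""
    parts = []
    for ch in term.lower():
        variants = LEET.get(ch)
        if variants is None:
            # No known substitutions: match the character literally.
            # re.escape follows Python's regex syntax; the database's
            # escaping rules may differ.
            parts.append(re.escape(ch))
        else:
            # Inside [...] both '+' and '$' are plain literal characters.
            parts.append("[" + variants + "]")
    return "".join(parts)


pattern = leet_pattern("test")
print(pattern)
```

The resulting string would then be passed to the database as a bound parameter of the REGEXP comparison rather than concatenated into the SQL text. Whether character classes are any faster than alternation groups on a given database is something to measure, not assume.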
I'll note that a single letter in this regexp format is very fast, so I'm thinking either subqueries or temp tables might help, but I thought I would ask first. Does anyone have any ideas? I currently have about 74K records with subjects of varying lengths.