I'm going to be cleaning up a database. I know how to use LIKE in a query, but I was wondering if there's some nice easy way to find records that have a varchar field similar to some string.

For instance, if my string is 'foobar' then this query might return records where the field called 'name' contains any one of these:
the foobar companies
foobar, the
* foo bar

    You can use LIKE :

    SELECT * FROM table_name WHERE columnname LIKE %foobar%

      I'm familiar with LIKE...I was considering using MySQL's full text search functions which use MATCH and AGAINST.

      It's a bit complicated with LIKE but I have some ideas. I was imagining that if I was trying to match against some $name, i would:

      • explode $name along whitespace
      • trim each chunk
      • drop common useless words like the/and/of and punctuation
      • construct a LIKE query from the remaining words

      thus, 'david geffen company' would break into an array much like this one

      $words = array('david', 'geffen', 'company');

      and i might make my query something like this:

      $sql = "SELECT * FROM table WHERE name LIKE '%" . implode('%', $words) . "%'";
      // SELECT * FROM table WHERE name LIKE '%david%geffen%company%'

      HOWEVER, in that approach order is important and all non-trival words must be matched. That would mean these two potential matches would not be caught:
      Corporation for Public Broadcasting
      Public Broadcasting, Inc.

      or perhaps:
      Universal Music, Inc.
      Vivendi Universal

      It seemed to me that writing all the code to construct every combination and permutation of the words in a name would be time-consuming and result in a horribly inefficient and nasty query. I was really hoping for a better way.

        That kind of fuzzy logic matching is very difficult (but you already know that).

        Maybe instead of using LIKE as the last step try a REGEXP match on each of the individual words, summing up the number of matches (REGEXP returns a 1 or 0) and selecting the row if a majority are present. Something like:

        where (name REGEXP 'david' + 
               name REGEXP 'geffen' + 
               name REGEXP 'company'
              ) >= 2
        

        http://dev.mysql.com/doc/refman/5.0/en/regexp.html

        :eek:

          Write a Reply...