sql optimization using LIKE

olaf · Feb 5, 2006

Hello,

I have a question about performance for a query in mysql database (version 4.1.16)

I have a table with addresses holding more than 1 million records.

this is my structure:
CREATE TABLE bedrijven (
id bigint(20) unsigned NOT NULL auto_increment,
bedrijf varchar(50) collate latin1_bin NOT NULL default '',
adres varchar(50) collate latin1_bin NOT NULL default '',
postcode varchar(7) collate latin1_bin NOT NULL default '',
plaats varchar(50) collate latin1_bin NOT NULL default '',
provincie char(2) collate latin1_bin NOT NULL default '',
telefoon varchar(25) collate latin1_bin NOT NULL default '',
telefax varchar(25) collate latin1_bin NOT NULL default '',
PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_bin COMMENT='tabel met bedrijven data' AUTO_INCREMENT=1298947 ;

The following query takes more then 4 seconds to get the result:

SELECT *
FROM bedrijven
WHERE LOWER( bedrijf ) LIKE '%nederland bv%'
AND provincie = 'NH'

Is there a way of optimization?

Wynder · Feb 5, 2006

The LOWER is probably affecting perfomance -- I'm trying to remember correctly... You could put an index on bedrijf, but if you start a LIKE with a % it doesn't use the index.

So, with that query, I don't think there's much you can do.

Roger_Ramjet · Feb 6, 2006

I don't see why you are using the binary collation for address columns !??!
Whatever your reasons, if you are then going to search using lower() you should have forced the data to lower-case when you inserted it, not when you query for it.

Best bet to speed this up would be with Full Text Search . This would give you an indexed search that can be done across multiple columns if you want, rather than the table-scan you get with like.

olaf · Feb 7, 2006

Hello,

Thank you both for the hints.

I never used the fulltext index before :-(
This feature sounds good but it's at the moment too late because the application is ready. Maybe later if there is the time to rebuild the databasemodel.

But of course I learned this time.

About the "binary collation": This is where I'm not out yet since I'm using mysql 4.1. It sounds to me the right collation for characters like ë, ä, ñ etc. The default collation on my db server is "latin1_swedish_ci", is this better?

A last question about the lower function: I can't use them lowercase in my table because the company names are used also for data presentation. But if I can skip the lower function, does this boost the query speed?

And what if I use a different table type? (f.e. InnoD make this sense?

Thanks again

olaf · Feb 7, 2006

Hello,

I added some indexes and did some table optimization (removing whitespace, city names are already in uppercase...)
this is the table now:

CREATE TABLE bedrijven (
id bigint(20) unsigned NOT NULL auto_increment,
bedrijf varchar(50) collate latin1_general_ci NOT NULL default '',
adres varchar(50) collate latin1_general_ci NOT NULL default '',
postcode varchar(7) collate latin1_general_ci NOT NULL default '',
plaats varchar(50) collate latin1_general_ci NOT NULL default '',
provincie char(2) collate latin1_general_ci NOT NULL default '',
telefoon varchar(25) collate latin1_general_ci NOT NULL default '',
telefax varchar(25) collate latin1_general_ci NOT NULL default '',
PRIMARY KEY (id),
KEY plaats (plaats),
KEY provincie (provincie)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci COMMENT='tabel met bedrijven data' AUTO_INCREMENT=1298947 ;

Roger_Ramjet · Feb 8, 2006

Well, you are using myisam tables so full text search is available. Now, you say you've added indexes etc, so you could add the fts index.
Any query that begins with a wildcard '%' search forces a table scan and ignores indexes anyway, so tweaking indexes is not going to affect this search one bit.
The only changes fts would entail would be rewritting the query - everything else would stay as is. You would now get an indexed search that will be faster by orders of magnitude.
Whole of the rest of the application would be unaffected by this change.

olaf · Feb 8, 2006

Roger Ramjet wrote:
Well, you are using myisam tables so full text search is available. Now, you say you've added indexes etc, so you could add the fts index.
Any query that begins with a wildcard '%' search forces a table scan and ignores indexes anyway, so tweaking indexes is not going to affect this search one bit.
The only changes fts would entail would be rewritting the query - everything else would stay as is. You would now get an indexed search that will be faster by orders of magnitude.
Whole of the rest of the application would be unaffected by this change.

Thanks for your advice, the optimizations I did are enough yet (I think there where bigger problems than the table scan).

The execution time went from 4-5 seconds beneath one second, its fast enough for the customer.

sql optimization using LIKE

Oolaf

WWynder

RRoger_Ramjet

Oolaf

Oolaf

RRoger_Ramjet

Oolaf