I have a set of data which I basically need to "clean"
A user inputs their favorite Fragrance into a form, and on the form I have autocomplete to aid in choice and hopefully cut down on spelling mistakes which is vitally important to the task in hand.
For example, the following is the data I have for Paco Rabanne 1 Million products - however, I want them to choose Paco Rabanne 1 Million but I don't want them to have a choice of various lotions, soaps, Deodorant etc etc it's the brand and choice of fragrance that i'm interested in.
An example of the data I have is shown below.
Paco Rabanne 1 Million Gift Set
Paco Rabanne 1 Million Deodorant Stick 75ml
Paco Rabanne 1 Million Deodorant
Paco Rabanne 1 Million Eau de Toilette
Paco Rabanne 1 Million Gift Set 50ml
Paco Rabanne 1 Million Shower Gel 150ml
Paco Rabanne 1 Million Eau De Toilette Spray 100ml
Paco Rabanne 1 Million After Shave Lotion 100ml
I have a query which takes out certain words within the item title and shows what's left
The query:
select DISTINCT( TRIM( LEFT(title,
IF( LOCATE('Deodorant', title), LOCATE('Deodorant', title) - 1,
IF( LOCATE('Shower', title), LOCATE('Shower', title) - 1,
IF( LOCATE('Refillable', title), LOCATE('Refillable', title) - 1,
IF( LOCATE('Spray', title), LOCATE('Spray', title) - 1,
IF( LOCATE('ml', title), LOCATE('ml', title) - 1,
999
)
)
)
))))) FROM PRprod_FRAGRANCES
This basically selects from the table whilst omitting any duplicate line with Deodorant in the title, or Shower etc etc but I want to permenantly delete the records instead of running this query every time which is expensive as the table has over 200,000 rows
Essentially, I want to turn the query around so it doesn't not show the lines I don't want, but, actually deletes them instead.
Unless there is another way to attack this issue?
This is the table definition:
field, type, NULL, Key, Default, Extra
autoc_id, int(11), NO, PRI, , auto_increment
title, varchar(128), YES, , ,
genre, varchar(128), YES, , ,
author, varchar(128), YES, , ,
actors, varchar(256), YES, , ,
artist, varchar(128), YES, , ,
main_category, varchar(128), NO, , ,
dateadded, timestamp, NO, , CURRENT_TIMESTAMP,
Many thanks
Darren