The basic operation here is to first select all but one row out of each set of duplicates, then use that as a subselect in a delete statement. When you're done, put a unique constraint / index on that column so you don't have to do this again.
I'll be using PostgreSQL syntax; you'll have to translate for other databases.
create table mytable (info text);
insert into mytable values ('abc');
insert into mytable values ('abc');
insert into mytable values ('def');
insert into mytable values ('def');
insert into mytable values ('def');
insert into mytable values ('hij');
select * from mytable;
info
------
abc
abc
def
def
def
hij
(6 rows)
As you can see, we have dupes in the info field. Now we'll modify this table to give each row a unique value in a new column. (If your table already has a unique field, you can obviously skip this part.)
alter table mytable add unid int;
create temp sequence t;
update mytable set unid=nextval('t');
UPDATE 6
select * from mytable;
info | unid
------+------
abc | 1
abc | 2
def | 3
def | 4
def | 5
hij | 6
(6 rows)
select distinct a.unid from mytable a join mytable b on (a.unid > b.unid and a.info=b.info);
unid
------
2
4
5
(3 rows)
Now that we have the "extra" rows listed by that select, we can delete them. I'll do this in a transaction in case things go wrong.
begin;
BEGIN
delete from mytable where unid in (select distinct a.unid from mytable a join mytable b on (a.unid > b.unid and a.info=b.info));
DELETE 3
select * from mytable;
info | unid
------+------
abc | 1
def | 3
hij | 6
(3 rows)
-- Looks good; let's commit:
commit;
COMMIT
Easy, eh?
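One last step, as mentioned at the top: add the unique constraint so the dupes can't come back. A sketch of what that might look like, assuming you also want to drop the scratch unid column we added earlier (the constraint name mytable_info_key is just an example, not anything the original steps created):

alter table mytable drop column unid;
alter table mytable add constraint mytable_info_key unique (info);
-- or equivalently:
-- create unique index mytable_info_key on mytable (info);

With that in place, a duplicate insert such as insert into mytable values ('abc'); will fail with a unique-violation error instead of silently creating another dupe.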