I know this is php but I didn't find a place on the mysql forum for help of this sort and you guys have been great in the past so I'm hoping I'm not overstepping here. Thanks in advance for the help.

First some background I'm trying to locate duplicates of customers in a table cust, but its more difficult than that. There are multiple 'shops' that use the file and one customer can be used by more than one 'shop' or not. There is a second table bonus that attaches the customers to the shops and holds some other info that is specific to that shop and customer. The two tables are related with the customer id. cust.id and bonus.custid the id is a hash of the first three letters of the last name + first four letters of the first name + a number if there is already one These are unique id's but sometimes a person will sign up with two emails or a new address and won't acknowledge that it found a previous version or whatever. At any rate sometimes I end up with two records for the same individual. So the idea I'm trying to do with this select is to pull all of the customers that have more than one record in the cust (first 7 characters the same) but also that are attached to the shop that the query is coming from (shop that is logged in) so I don't share customers with other shops that aren't actually shared. So here is the query made from shop 1

SELECT `cust`.* ,`bonus`.`password` from `cust`,`bonus` 
WHERE `cust`.`id`=`bonus`.`custid` and `bonus`.`shop`='1' and substr(`cust`.`id`,1,7)
IN (SELECT substr(`bonus`.`custid`,1,7) FROM `bonus` GROUP BY substr(`bonus`.`custid`,1,7) 
HAVING COUNT(*)>1) 
ORDER BY `bonus`.`custid`

I thought I was limiting it to shop one in the original where statement, but when the same customer is attached to more than one shop it pulls them up even though there is only one entry in the cust file. What am I doing wrong?

    My first concern is: how do you know that the two different records really do refer to the same individual? Note that even full names are not unique, and it looks like you're only comparing "a hash of the first three letters of the last name + first four letters of the first name" (and it looks more like concatenation than a hash?), so name collisions are even more likely.

    laserlight

    I don't that is for the shop owner to determine. I guess I should say potential duplicates. They could be the same person with different emails or different people with the same name, or the same person that moved. Yes Concantenation I suppose is more accurate. Actually relatively few collisions actually have occurred in over 8-10 years of using it. But I have previously had to manually look for the duplicates and it is harder.

    So I just want to pull the ones that have the same first seven on the id and are attached to this shop if there are more than one in the cust table. Then they are presented to the shop owner to determine if they are the same and they can combine them or edit them. But right now its pulling them if there is only one in cust but are in more than one shop.

      Write a Reply...