I have a large db that looks like this:
CREATE TABLE `cra` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post_id` bigint(13) NOT NULL,
`subject` varchar(180) DEFAULT NULL,
`date_time` datetime DEFAULT NULL,
`street` varchar(100) DEFAULT NULL,
`city` varchar(20) DEFAULT NULL,
`company_name` varchar(150) DEFAULT NULL,
`email1` varchar(40) DEFAULT NULL,
`email2` varchar(40) DEFAULT NULL,
`email3` varchar(40) DEFAULT NULL,
`email4` varchar(40) DEFAULT NULL,
`phone1` varchar(40) DEFAULT NULL,
`phone2` varchar(40) DEFAULT NULL,
`phone3` varchar(40) DEFAULT NULL,
`phone4` varchar(40) DEFAULT NULL,
`link1` varchar(250) DEFAULT NULL,
`link2` varchar(250) DEFAULT NULL,
`link3` varchar(250) DEFAULT NULL,
`link4` varchar(250) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=420 DEFAULT CHARSET=latin1;
I'm omitting a few other fields, but this will suffice for my example.
I do have to create a report (or a subset of the db) with unique data. For example, I have to ignore records that have emails from domains like yahoo, hotmail, gmail. This shouldn't be a problem, what I'm worried about is that I have to ignore records where (email1 OR email2 OR email3 OR email4) are equal to (email1 OR email2 OR email3 OR email4) in any other record in the database. Same thing applies to phone numbers and links (where I have up to 10 links per record).
What would be a good and decently efficient way of doing this?
First thing that comes to my mind is comparing one by one, but that could take a while.
I was wondering if there's a way of doing something like this:
If I have a record
{id=1, ..., phone1=11111111, phone2=22222222, phone3=33333333, phone4='',...}
have mysql create a temporary table with something like
{id=1, ..., phone=11111111,...}
{id=1, ..., phone=22222222,...}
{id=1, ..., phone=33333333,...}
{id=1, ..., phone=,...}
and then just use DISTINCT. Don't know if it's possible, but I'm sure one of you do! 🙂
Btw, I'm not too worried about efficiency as I'm worried about how easy is to implement the solution.
Any ideas?
I will really appreciate your help!