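I am trying to create a SQL query that detects (possible) duplicate customers in my database:
I have two tables:
For example, if I have
So my SQL query should search for customers that have the same firstname, lastname and zip, and detect that the customer with cid = 1 is the same as the customer with cid = 2.
However, it should be possible to state that the customers cid = 1 and cid = 2 are not the same, by storing a new entry in the IgnoreForDuplicateCustomer table with cid1 = 1 and cid2 = 2.
Detecting the duplicate customers works fine with this SQL query: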
SELECT cid, firstname, lastname, zip, COUNT(*) AS NumOccurrences
FROM Customer
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
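My problem is that I cannot integrate the IgnoreForDuplicateCustomer table into it. In the example above, the customers with cid = 1 and cid = 2 should no longer be flagged as the same, because there is an entry/rule for them in the IgnoreForDuplicateCustomer table.
So I tried to extend my previous query by adding a WHERE clause: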
SELECT cid, firstname, lastname, COUNT(*) AS NumOccurrences
FROM Customer
WHERE cid NOT IN (
SELECT cid1 FROM IgnoreForDuplicateCustomer WHERE cid2=cid
UNION
SELECT cid2 FROM IgnoreForDuplicateCustomer WHERE cid1=cid
)
GROUP BY firstname, lastname, zip
HAVING ( COUNT(*) > 1 )
Unfortunately, this additional WHERE clause has no effect on my result at all. Any suggestions?
Answer 0: (score: 1)
Edited based on TPete's comment (not tested):
SELECT
C1.cid, C1.firstname, C1.lastname
FROM
Customer C1,
Customer C2
WHERE
C1.cid < C2.cid AND
C1.firstname = C2.firstname AND
C1.lastname = C2.lastname AND
C1.zip = C2.zip AND
-- NOT EXISTS keeps pairs without an ignore entry; a <> comparison against an
-- empty scalar subquery evaluates to NULL and would drop those pairs as well
NOT EXISTS (SELECT 1 FROM IgnoreForDuplicateCustomer I WHERE I.cid1 = C1.cid AND I.cid2 = C2.cid);
Originally I thought IgnoreForDuplicateCustomer was a field in the Customer table.
Answer 1: (score: 1)
Here you go:
Select a.*
From (
select c1.cid 'CID1', c2.cid 'CID2'
from Customer c1
join Customer c2 on c1.firstname=c2.firstname
and c1.lastname=c2.lastname and c1.zip=c2.zip
and c1.cid < c2.cid) a
Left Join (
Select cid1 'CID1', cid2 'CID2'
From ignoreforduplicatecustomer one
Union
Select cid2 'CID1', cid1 'CID2'
From ignoreforduplicatecustomer two) b on a.cid1 = b.cid1 and a.cid2 = b.cid2
where b.cid1 is null
This gets the IDs of duplicate records from the customer table whose pairs are not listed in the ignoreforduplicatecustomer table.
Tested with:
CREATE TABLE IF NOT EXISTS `customer` (
`CID` int(11) NOT NULL AUTO_INCREMENT,
`Firstname` varchar(50) NOT NULL,
`Lastname` varchar(50) NOT NULL,
`ZIP` varchar(10) NOT NULL,
PRIMARY KEY (`CID`))
ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=100 ;
INSERT INTO `customer` (`CID`, `Firstname`, `Lastname`, `ZIP`) VALUES
(1, 'John', 'Smith', '1234'),
(2, 'John', 'Smith', '1234'),
(3, 'John', 'Smith', '1234'),
(4, 'Jane', 'Doe', '1234');
and
CREATE TABLE IF NOT EXISTS `ignoreforduplicatecustomer` (
`CID1` int(11) NOT NULL,
`CID2` int(11) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `ignoreforduplicatecustomer` (`CID1`, `CID2`) VALUES
(1, 2);
The result for my test setup is:
CID1 CID2
1 3
2 3
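The same exclusion can also be expressed with NOT EXISTS instead of a left join against the mirrored union. What follows is only a minimal sketch and not the query from this answer: it assumes Python's standard sqlite3 module with an in-memory database (chosen purely to keep the example self-contained), simplified column types, and the test rows shown above. The name NOT_EXISTS_QUERY is made up for illustration.

```python
import sqlite3

# Throwaway in-memory database (an assumption for this demo, not the MySQL setup above).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, zip TEXT);
CREATE TABLE ignoreforduplicatecustomer (cid1 INTEGER NOT NULL, cid2 INTEGER NOT NULL);
INSERT INTO customer VALUES
  (1, 'John', 'Smith', '1234'),
  (2, 'John', 'Smith', '1234'),
  (3, 'John', 'Smith', '1234'),
  (4, 'Jane', 'Doe',   '1234');
INSERT INTO ignoreforduplicatecustomer VALUES (1, 2);
""")

# Pair every customer with each later customer that has the same name and zip,
# then drop the pairs that have an ignore entry in either column order.
NOT_EXISTS_QUERY = """
SELECT c1.cid AS cid1, c2.cid AS cid2
FROM customer c1
JOIN customer c2
  ON  c1.firstname = c2.firstname
  AND c1.lastname  = c2.lastname
  AND c1.zip       = c2.zip
  AND c1.cid < c2.cid
WHERE NOT EXISTS (
  SELECT 1
  FROM ignoreforduplicatecustomer i
  WHERE (i.cid1 = c1.cid AND i.cid2 = c2.cid)
     OR (i.cid1 = c2.cid AND i.cid2 = c1.cid)
)
ORDER BY cid1, cid2
"""

pairs = conn.execute(NOT_EXISTS_QUERY).fetchall()
print(pairs)  # [(1, 3), (2, 3)]
```

On this data the sketch returns the same two pairs as the result table above. Which form performs better depends on the database engine and its indexes.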
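Answer 2: (score: 0)
Crazy, but I think it works :)
First I join the Customer table to itself on the name and zip fields to get the duplicates. Then I exclude the keys listed in the IgnoreForDuplicateCustomer table (the union is there because the first query returns both cid1,cid2 and cid2,cid1).
The results will be duplicated, but I think you can get the information you need.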
select c1.cid, c2.cid
from Customer c1
join Customer c2 on c1.firstname=c2.firstname
and c1.lastname=c2.lastname and c1.zip=c2.zip
and c1.cid!=c2.cid
except
(
select cid1,cid2 from IgnoreForDuplicateCustomer
UNION
select cid2,cid1 from IgnoreForDuplicateCustomer
)
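To illustrate the duplicated results mentioned above, here is a small sketch, assuming Python's standard sqlite3 module and the test rows from the other answer (customers 1, 2 and 3 are all 'John Smith', and the pair 1/2 is ignored). The helper remaining_pairs is hypothetical. SQLite does not accept a parenthesised UNION directly after EXCEPT, so the sketch wraps it in a subquery, which is a deviation from the query as written.

```python
import sqlite3

# In-memory database with the test rows from the other answer (assumed setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, zip TEXT);
CREATE TABLE ignoreforduplicatecustomer (cid1 INTEGER NOT NULL, cid2 INTEGER NOT NULL);
INSERT INTO customer VALUES
  (1, 'John', 'Smith', '1234'),
  (2, 'John', 'Smith', '1234'),
  (3, 'John', 'Smith', '1234'),
  (4, 'Jane', 'Doe',   '1234');
INSERT INTO ignoreforduplicatecustomer VALUES (1, 2);
""")

def remaining_pairs(comparison):
    """Run the self-join/EXCEPT query; comparison is '!=' or '<'."""
    sql = f"""
    SELECT c1.cid, c2.cid
    FROM customer c1
    JOIN customer c2
      ON  c1.firstname = c2.firstname
      AND c1.lastname  = c2.lastname
      AND c1.zip       = c2.zip
      AND c1.cid {comparison} c2.cid
    EXCEPT
    SELECT * FROM (
      SELECT cid1, cid2 FROM ignoreforduplicatecustomer
      UNION
      SELECT cid2, cid1 FROM ignoreforduplicatecustomer
    )
    """
    return sorted(conn.execute(sql).fetchall())

mirrored = remaining_pairs("!=")  # every remaining pair appears in both orders
ordered = remaining_pairs("<")    # each remaining pair appears once
print(mirrored)  # [(1, 3), (2, 3), (3, 1), (3, 2)]
print(ordered)   # [(1, 3), (2, 3)]
```

If the mirrored rows are unwanted, comparing with c1.cid < c2.cid instead of != keeps one row per pair; the union of both column orders is then still useful in case an ignore entry was stored with the larger cid first.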
Second shot:
select firstname,lastname,zip from Customer
group by firstname,lastname,zip
having (count(*)>1)
except
select c1.firstname, c1.lastname, c1.zip
from Customer c1 join IgnoreForDuplicateCustomer IG on c1.cid=ig.cid1 join Customer c2 on ig.cid2=c2.cid
Third:
select firstname,lastname,zip from (
select firstname,lastname,zip from Customer
group by firstname,lastname,zip
having (count(*)>1)
) X
where firstname not in (
select c1.firstname
from Customer c1 join IgnoreForDuplicateCustomer IG on c1.cid=ig.cid1 join Customer c2 on ig.cid2=c2.cid
)
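One thing worth checking with the second and third shots is that they work on whole name groups rather than on pairs of cids. The sketch below is an illustration under assumptions, not part of the original answer: it uses Python's standard sqlite3 module and the test rows from the other answer, and runs the second shot unchanged apart from lower-cased table names.

```python
import sqlite3

# In-memory database with the test rows from the other answer (assumed setup).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (cid INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, zip TEXT);
CREATE TABLE ignoreforduplicatecustomer (cid1 INTEGER NOT NULL, cid2 INTEGER NOT NULL);
INSERT INTO customer VALUES
  (1, 'John', 'Smith', '1234'),
  (2, 'John', 'Smith', '1234'),
  (3, 'John', 'Smith', '1234'),
  (4, 'Jane', 'Doe',   '1234');
INSERT INTO ignoreforduplicatecustomer VALUES (1, 2);
""")

# The "second shot": duplicate name groups, minus every group that contains
# a customer referenced by an ignore entry.
SECOND_SHOT = """
SELECT firstname, lastname, zip FROM customer
GROUP BY firstname, lastname, zip
HAVING COUNT(*) > 1
EXCEPT
SELECT c1.firstname, c1.lastname, c1.zip
FROM customer c1
JOIN ignoreforduplicatecustomer ig ON c1.cid = ig.cid1
JOIN customer c2 ON ig.cid2 = c2.cid
"""

groups = conn.execute(SECOND_SHOT).fetchall()
print(groups)  # []
```

With this data the 'John Smith' group disappears completely, although customer 3 is still an unresolved duplicate of customers 1 and 2. A single ignore entry therefore hides the whole group, so the pair-based queries are the safer choice when a group can hold more than two customers.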