So I'm trying to clean up some phone records in a database table.
I've worked out how to find exact matches across 2 fields using the following:
/* DUPLICATE first & last names */
SELECT
`First Name`,
`Last Name`,
COUNT(*) c
FROM phone.contacts
GROUP BY
`Last Name`,
`First Name`
HAVING c > 1;
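Wow, great.
I'd like to expand on it to look at multiple fields, to see whether a phone number in 1 of the 3 phone fields is a duplicate.
So I want to check the 3 fields (general mobile, general phone, business phone) to:
1. see that they are not empty ('')
2. see whether the data (number) in any of them appears in the other 2 phone fields anywhere in the table.
So, pushing my limited SQL past its limits, I came up with the following, which seems to return records whose 3 phone fields are all empty, and also records that have no duplicate phone numbers.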
/* DUPLICATE general & business phone nos */
SELECT
id,
`first name`,
`last name`,
`general mobile`,
`general phone`,
`general email`,
`business phone`,
COUNT(CASE WHEN `general mobile` <> '' THEN 1 ELSE NULL END) as gen_mob,
COUNT(CASE WHEN `general phone` <> '' THEN 1 ELSE NULL END) as gen_phone,
COUNT(CASE WHEN `business phone` <> '' THEN 1 ELSE NULL END) as bus_phone
FROM phone.contacts
GROUP BY
`general mobile`,
`general phone`,
`business phone`
HAVING gen_mob > 1 OR gen_phone > 1 OR bus_phone > 1;
Obviously my logic is flawed, and I'm wondering if someone could point me in the right direction, take pity, etc.
Many thanks
Answer 0 (score: 5)
The first thing you should do is shoot the person who named columns with spaces in them.
Now, try this:
SELECT DISTINCT
c.id,
c.`first name`,
c.`last name`,
c.`general mobile`,
c.`general phone`,
c.`business phone`
from contacts_test c
join contacts_test c2
on (c.`general mobile`!= '' and c.`general mobile` in (c2.`general phone`, c2.`business phone`))
or (c.`general phone` != '' and c.`general phone` in (c2.`general mobile`, c2.`business phone`))
or (c.`business phone`!= '' and c.`business phone` in (c2.`general mobile`, c2.`general phone`))
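See a live demo of this query in SQLFiddle.
Note the extra check for phone != '', which is required because the phone columns are not nullable, so their "unknown" value is blank. Without this check, false matches are returned because, of course, blank equals blank.
The DISTINCT keyword was added in case more than one other row matches, which would otherwise produce an n x n result set.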
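To see the effect of the blank check, here is a minimal sketch that runs the same join with and without it. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, and the sample rows and the matching_ids helper are made up for illustration.

```python
import sqlite3

COLUMNS = ["general mobile", "general phone", "business phone"]

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts_test (
        id INTEGER PRIMARY KEY,
        `general mobile` TEXT NOT NULL DEFAULT '',
        `general phone` TEXT NOT NULL DEFAULT '',
        `business phone` TEXT NOT NULL DEFAULT ''
    )
""")
conn.executemany(
    "INSERT INTO contacts_test VALUES (?, ?, ?, ?)",
    [
        (1, "555-0100", "", ""),          # mobile reused by row 2
        (2, "", "555-0200", "555-0100"),  # business phone equals row 1's mobile
        (3, "", "", ""),                  # all three fields blank
        (4, "555-0400", "", ""),          # unique number
    ],
)

def matching_ids(conn, skip_blanks):
    """Return ids whose number appears in one of the other two fields of any row."""
    clauses = []
    for col in COLUMNS:
        others = ", ".join(f"c2.`{o}`" for o in COLUMNS if o != col)
        guard = f"c.`{col}` != '' AND " if skip_blanks else ""
        clauses.append(f"({guard}c.`{col}` IN ({others}))")
    sql = (
        "SELECT DISTINCT c.id FROM contacts_test c "
        "JOIN contacts_test c2 ON " + " OR ".join(clauses) + " ORDER BY c.id"
    )
    return [row[0] for row in conn.execute(sql)]

with_check = matching_ids(conn, skip_blanks=True)
without_check = matching_ids(conn, skip_blanks=False)
print(with_check)     # only the rows that really share a number
print(without_check)  # every row, because blank matches blank
```

With the check, only rows 1 and 2 come back. Without it, every row comes back, because each row has at least one blank field that matches a blank somewhere else.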
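Answer 1 (score: 1)
In my experience, when cleaning up data it is much better to have an understandable view of the data and a simple way to manage it than one big, bulky query that does all the analysis at once.
You can also (more or less) renormalize the database, using something like: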
Create view VContactsWithPhones
as
Select id,
`Last Name` as LastName,
`First Name` as FirstName,
`General Mobile` as Phone,
'General Mobile' as PhoneType
From phone.contacts c
UNION
Select id,
`Last Name`,
`First Name`,
`General Phone`,
'General Phone'
From phone.contacts c
UNION
Select id,
`Last Name`,
`First Name`,
`Business Phone`,
'Business Phone'
From phone.contacts c
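This produces a view with three times as many rows as the original table, but with a single Phone column whose PhoneType is one of the three types.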
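As a quick check of the row count, here is a small sketch that builds the same kind of view over two made-up contacts. It assumes SQLite via Python's sqlite3 module rather than MySQL, and it drops the phone schema prefix; the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical two-row table; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        `Last Name` TEXT, `First Name` TEXT,
        `General Mobile` TEXT, `General Phone` TEXT, `Business Phone` TEXT
    )
""")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Smith", "Ann", "555-0100", "", "555-0199"),
        (2, "Jones", "Bob", "", "555-0100", ""),
    ],
)

# One output row per (contact, phone field); PhoneType is a string literal.
conn.execute("""
    CREATE VIEW VContactsWithPhones AS
    SELECT id, `Last Name` AS LastName, `First Name` AS FirstName,
           `General Mobile` AS Phone, 'General Mobile' AS PhoneType
    FROM contacts
    UNION
    SELECT id, `Last Name`, `First Name`, `General Phone`, 'General Phone'
    FROM contacts
    UNION
    SELECT id, `Last Name`, `First Name`, `Business Phone`, 'Business Phone'
    FROM contacts
""")

base_rows = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
view_rows = conn.execute("SELECT COUNT(*) FROM VContactsWithPhones").fetchone()[0]
phone_types = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT PhoneType FROM VContactsWithPhones ORDER BY PhoneType"
    )
]
print(base_rows, view_rows, phone_types)
```

UNION removes duplicate rows, but since every row carries a unique id and a distinct PhoneType literal, nothing collapses here and the view has exactly three rows per contact.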
You can then easily select from this view:
-- empty phones
SELECT *
FROM VContactsWithPhones
Where Phone is null or Phone = '';
-- duplicate phones
Select Phone, Count(*)
from VContactsWithPhones
where (Phone is not null and Phone <> '') -- exclude empty values
group by Phone
having count(*) > 1;
-- duplicate phones belonging to the same ID (double entries)
Select Phone, ID, Count(*)
from VContactsWithPhones
where (Phone is not null and Phone <> '') -- exclude empty values
group by Phone, ID
having count(*) > 1;
-- duplicate phones belonging to different IDs (duplicate entries)
Select v1.Phone, v1.ID, v1.PhoneType, v2.ID, v2.PhoneType
from VContactsWithPhones v1
inner join VContactsWithPhones v2
on v1.Phone=v2.Phone and v1.ID<>v2.ID -- different records sharing a number
where v1.Phone is not null and v1.Phone <> '';
And so on...
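For the case of a number shared between different records, the join has to compare rows with different ids. The sketch below is one possible variant, assuming SQLite via Python's sqlite3 module and made-up sample rows; it uses v1.ID < v2.ID instead of a plain inequality so that each pair is listed only once.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        `General Mobile` TEXT, `General Phone` TEXT, `Business Phone` TEXT
    );
    INSERT INTO contacts VALUES
        (1, '555-0100', '', '555-0199'),
        (2, '', '555-0100', ''),
        (3, '555-0300', '', '');
    CREATE VIEW VContactsWithPhones AS
        SELECT id, `General Mobile` AS Phone, 'General Mobile' AS PhoneType
        FROM contacts
        UNION
        SELECT id, `General Phone`, 'General Phone' FROM contacts
        UNION
        SELECT id, `Business Phone`, 'Business Phone' FROM contacts;
""")

# v1.ID < v2.ID keeps one row per pair of records that share a number.
pairs = conn.execute("""
    SELECT v1.Phone, v1.ID, v1.PhoneType, v2.ID, v2.PhoneType
    FROM VContactsWithPhones v1
    JOIN VContactsWithPhones v2
      ON v1.Phone = v2.Phone AND v1.ID < v2.ID
    WHERE v1.Phone IS NOT NULL AND v1.Phone <> ''
""").fetchall()
print(pairs)
```

Here record 1's mobile and record 2's general phone are the only shared number, so a single pair is returned and the blank fields are ignored.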
Answer 2 (score: 0)
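You could try something like this:
SELECT *
FROM phone.contacts p
WHERE `general mobile` <> '' -- skip blanks, which would otherwise match each other
AND `general mobile` IN (
    SELECT `general mobile` FROM phone.contacts WHERE id != p.id
    UNION
    SELECT `general phone` FROM phone.contacts WHERE id != p.id
    UNION
    SELECT `business phone` FROM phone.contacts WHERE id != p.id
);
Repeat it three times, once for each of general mobile, general phone and business phone. It could all be put into a single query, but that would be less readable.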
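One way to do the three repetitions without hand-copying the query is to generate them in a loop. This is only a sketch under assumptions: it uses SQLite via Python's sqlite3 module instead of MySQL, drops the phone schema prefix, and the ids_with_shared_numbers helper and sample rows are invented for illustration.

```python
import sqlite3

PHONE_COLUMNS = ["general mobile", "general phone", "business phone"]

# Hypothetical sample data; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (
        id INTEGER PRIMARY KEY,
        `general mobile` TEXT, `general phone` TEXT, `business phone` TEXT
    );
    INSERT INTO contacts VALUES
        (1, '555-0100', '', '555-0199'),
        (2, '', '555-0100', ''),
        (3, '555-0300', '', '');
""")

def ids_with_shared_numbers(conn):
    """Run the per-column query once per phone column and merge the ids."""
    # Every phone value held by any *other* row (correlated on p.id).
    pool = " UNION ".join(
        f"SELECT `{col}` FROM contacts WHERE id != p.id" for col in PHONE_COLUMNS
    )
    found = set()
    for col in PHONE_COLUMNS:
        sql = (
            f"SELECT id FROM contacts p "
            f"WHERE `{col}` <> '' AND `{col}` IN ({pool})"
        )
        found.update(row[0] for row in conn.execute(sql))
    return sorted(found)

shared = ids_with_shared_numbers(conn)
print(shared)
```

Because the pool includes the same column from other rows, this also flags two records with the same mobile number, not only matches across different fields.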