I have a table like this:
id  first_name  last_name  address    city_state_zip
----------------------------------------------------------------
1   Bob         Smith      123 Place  Georgetown, TN 38119
2   Bob         Smith      123 Place  Georgetown, TN 38119
3   Bobby       Smith      123 Place  Georgetown, TN 38119
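I want a query that retrieves all rows that share, for example, the same first 3 characters of first_name, the same first 3 characters of last_name, the same full address, and the same full city_state_zip. Here is my query, but when I run it I get zero rows back: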
SELECT
    P1.id,
    P1.first_name,
    P1.last_name,
    P1.address,
    P1.city_state_zip
FROM person P1
JOIN (SELECT
          id,
          first_name,
          last_name
      FROM person
      GROUP BY id,
               first_name,
               last_name,
               address,
               city_state_zip
      HAVING (count(left(first_name, 3)) > 1
          AND count(left(last_name, 3)) > 1
          AND count(address + city_state_zip) > 1)) P2 ON P2.id = P1.id
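The empty result can be traced to the GROUP BY: it includes id, which is unique per row, so every group contains exactly one row and every HAVING count(...) > 1 condition is false. A minimal sketch, assuming an in-memory SQLite database as a stand-in for SQL Server (SQLite has no LEFT(), so substr(x, 1, 3) is used instead), reproduces this with the three sample rows:

```python
import sqlite3

# In-memory copy of the sample table (SQLite stands in for SQL Server here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " address TEXT, city_state_zip TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (2, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (3, "Bobby", "Smith", "123 Place", "Georgetown, TN 38119"),
    ],
)

# Size of each group when the unique id column is part of GROUP BY.
group_sizes = [
    row[0]
    for row in conn.execute(
        "SELECT COUNT(*) FROM person"
        " GROUP BY id, first_name, last_name, address, city_state_zip"
    )
]
print(group_sizes)  # [1, 1, 1]

# Because every group has one row, a HAVING ... > 1 filter removes all of them.
survivors = conn.execute(
    "SELECT id FROM person"
    " GROUP BY id, first_name, last_name, address, city_state_zip"
    " HAVING COUNT(substr(first_name, 1, 3)) > 1"
).fetchall()
print(survivors)  # []
```

Dropping id from the GROUP BY and grouping on the truncated names plus address and city_state_zip is what lets rows 1, 2 and 3 fall into the same group.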
Answer 0 (score: 1)
You don't need a subquery. Just join the table to itself and put the matching conditions in the ON clause.
Something like:
select *
from person p1
JOIN person p2
    ON p1.ID != p2.ID -- because you don't want a row to match itself
    AND left(p1.first_name, 3) = left(p2.first_name, 3)
    AND left(p1.last_name, 3) = left(p2.last_name, 3)
    AND p1.address = p2.address
    AND p1.city_state_zip = p2.city_state_zip
...Oh, and make sure you have indexes on all (or most) of these columns, otherwise this will be very slow on large tables.
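The self-join idea can be checked against the sample rows with a small sketch. It assumes an in-memory SQLite database standing in for SQL Server, with substr(x, 1, 3) in place of LEFT(x, 3) and <> in place of !=; the table and column names come from the question.

```python
import sqlite3

# In-memory copy of the sample table (SQLite stands in for SQL Server here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " address TEXT, city_state_zip TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (2, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (3, "Bobby", "Smith", "123 Place", "Georgetown, TN 38119"),
    ],
)

# Pair each row with every *other* row that matches on the chosen keys.
pairs = conn.execute(
    """
    SELECT p1.id, p2.id
    FROM person p1
    JOIN person p2
      ON p1.id <> p2.id
     AND substr(p1.first_name, 1, 3) = substr(p2.first_name, 1, 3)
     AND substr(p1.last_name, 1, 3) = substr(p2.last_name, 1, 3)
     AND p1.address = p2.address
     AND p1.city_state_zip = p2.city_state_zip
    ORDER BY p1.id, p2.id
    """
).fetchall()
print(pairs)  # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

# Collapse the pairs to the set of row ids that have at least one duplicate.
dup_ids = sorted({a for a, _ in pairs})
print(dup_ids)  # [1, 2, 3]
```

Because the join is symmetric, every matching pair appears twice, once in each direction. Selecting DISTINCT p1.* is one way to get each duplicated row once; using p1.id < p2.id instead of != returns each pair once, if pairs are what you need.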
Answer 1 (score: 0)
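What I mean is: join the table to an aggregated version of itself. The aggregated version holds only the key information of the rows that have duplicates.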
SELECT *
FROM person AS P1
INNER JOIN (
    --#region
    SELECT
          first_name = SUBSTRING(first_name, 1, 3)
        , last_name = SUBSTRING(last_name, 1, 3)
        , address
        , city_state_zip
    FROM person
    GROUP BY
          SUBSTRING(first_name, 1, 3)
        , SUBSTRING(last_name, 1, 3)
        , address
        , city_state_zip
    HAVING COUNT(*) > 1
    --#endregion
) AS P2
    ON P2.first_name = SUBSTRING(P1.first_name, 1, 3)
    AND P2.last_name = SUBSTRING(P1.last_name, 1, 3)
    AND P2.address = P1.address
    AND P2.city_state_zip = P1.city_state_zip
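Here is a sketch of the join-to-aggregate approach run against the sample rows, plus one hypothetical non-duplicate row (id 4, invented for illustration) to show that rows whose key group has only one member are filtered out. As in the other sketches, an in-memory SQLite database stands in for SQL Server, and substr() replaces SUBSTRING() and the T-SQL alias = expression syntax is written as expression AS alias.

```python
import sqlite3

# In-memory copy of the sample table (SQLite stands in for SQL Server here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " address TEXT, city_state_zip TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (2, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (3, "Bobby", "Smith", "123 Place", "Georgetown, TN 38119"),
        # Hypothetical non-duplicate row, added only for illustration.
        (4, "Alice", "Jones", "9 Elm St", "Memphis, TN 38103"),
    ],
)

# Derived table P2 keeps only key combinations shared by more than one row;
# joining it back to P1 returns every row that carries one of those keys.
rows = conn.execute(
    """
    SELECT P1.id, P1.first_name
    FROM person AS P1
    INNER JOIN (
        SELECT substr(first_name, 1, 3) AS first_name,
               substr(last_name, 1, 3) AS last_name,
               address,
               city_state_zip
        FROM person
        GROUP BY substr(first_name, 1, 3), substr(last_name, 1, 3),
                 address, city_state_zip
        HAVING COUNT(*) > 1
    ) AS P2
      ON P2.first_name = substr(P1.first_name, 1, 3)
     AND P2.last_name = substr(P1.last_name, 1, 3)
     AND P2.address = P1.address
     AND P2.city_state_zip = P1.city_state_zip
    ORDER BY P1.id
    """
).fetchall()
print(rows)  # [(1, 'Bob'), (2, 'Bob'), (3, 'Bobby')]
```

Unlike the self-join, each duplicated row comes back exactly once, because P2 has one row per duplicated key rather than one row per matching partner.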
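If it performs poorly, try running the aggregation on its own, storing the result in a @table variable or a temporary #table, and then running the join against that.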
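The materialize-then-join suggestion could look roughly like this. It is a sketch under assumptions: SQLite's CREATE TEMP TABLE ... AS SELECT stands in for a T-SQL #table, and the table name dup_keys and the column aliases fn3/ln3 are hypothetical. Row 4 is again an invented non-duplicate row.

```python
import sqlite3

# In-memory copy of the sample table (SQLite stands in for SQL Server here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,"
    " address TEXT, city_state_zip TEXT)"
)
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (2, "Bob", "Smith", "123 Place", "Georgetown, TN 38119"),
        (3, "Bobby", "Smith", "123 Place", "Georgetown, TN 38119"),
        # Hypothetical non-duplicate row, added only for illustration.
        (4, "Alice", "Jones", "9 Elm St", "Memphis, TN 38103"),
    ],
)

# Step 1: run the aggregation once and store only the duplicated keys.
# (TEMP TABLE stands in for a T-SQL #table; dup_keys is a hypothetical name.)
conn.execute(
    """
    CREATE TEMP TABLE dup_keys AS
    SELECT substr(first_name, 1, 3) AS fn3,
           substr(last_name, 1, 3) AS ln3,
           address,
           city_state_zip
    FROM person
    GROUP BY substr(first_name, 1, 3), substr(last_name, 1, 3),
             address, city_state_zip
    HAVING COUNT(*) > 1
    """
)
key_count = conn.execute("SELECT COUNT(*) FROM dup_keys").fetchone()[0]
print(key_count)  # 1

# Step 2: join the (usually small) key table back to the full table.
dup_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.id
        FROM person AS p
        JOIN dup_keys AS d
          ON d.fn3 = substr(p.first_name, 1, 3)
         AND d.ln3 = substr(p.last_name, 1, 3)
         AND d.address = p.address
         AND d.city_state_zip = p.city_state_zip
        ORDER BY p.id
        """
    )
]
print(dup_ids)  # [1, 2, 3]
```

The benefit, if any, comes from computing the GROUP BY once and joining against a small table; whether that beats the single-statement version depends on the data and the query planner.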
Answer 2 (score: 0)
Try this:
SELECT *
FROM person p1
INNER JOIN (
    SELECT SUBSTRING(first_name, 1, 3) AS first_name,
           SUBSTRING(last_name, 1, 3) AS last_name,
           [address],
           city_state_zip
    FROM person
    GROUP BY SUBSTRING(first_name, 1, 3), SUBSTRING(last_name, 1, 3), [address], city_state_zip
    HAVING COUNT(*) > 1 -- keep only key combinations shared by more than one row
) p2
    ON SUBSTRING(p1.first_name, 1, 3) = p2.first_name
    AND SUBSTRING(p1.last_name, 1, 3) = p2.last_name
    AND p1.[address] = p2.[address]
    AND p1.city_state_zip = p2.city_state_zip