Question

I have a table like this:

id     first_name     last_name     address     city_state_zip
-------------------------------------------------------------------
1      Bob            Smith         123 Place   Georgetown, TN  38119
2      Bob            Smith         123 Place   Georgetown, TN  38119
3      Bobby          Smith         123 Place   Georgetown, TN  38119

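I want a query that retrieves all rows that share the same first 3 characters of the first name, the same first 3 characters of the last name, the same full address, and the same full city/state/zip. Here is my query, but when I run it, it returns zero rows: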

SELECT
  P1.id,
  P1.first_name,
  P1.last_name,
  P1.address,
  P1.city_state_zip
FROM person P1
JOIN (SELECT
    id,
    first_name,
    last_name
  FROM person
  GROUP BY id, 
    first_name,
    last_name,
    address,
    city_state_zip
  HAVING (count(left(first_name, 3)) > 1
    AND count(left(last_name, 3)) > 1
    AND count(address + city_state_zip) > 1)) P2 ON P2.id = P1.id

Answer 1

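You don't need a subquery. You just need to join the table to itself and spell out the matching conditions in the ON clause.

Something like this: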

select *
from person p1
JOIN person p2
ON
    p1.ID != p2.ID  -- because you don't want the line to match to itself
    AND left(p1.first_name,3) = left(p2.first_name,3)
    AND left(p1.last_name,3) = left(p2.last_name,3)
    AND ... etc, etc

...Oh, and make sure you have indexes on all/most of those columns, otherwise this will be very slow on large tables.
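As a rough sketch of the indexing advice, assuming SQL Server and assuming the elided conditions in the join above compare address and city_state_zip for equality: those two columns are the ones an index can seek on directly, since comparisons wrapped in left(...) generally cannot use an index seek on the underlying column. The index name here is hypothetical, and the column types are not shown in the question, so this may need adjusting.

```sql
-- Hypothetical index name. Assumes address and city_state_zip are short
-- enough to serve as index key columns (SQL Server caps total key size).
CREATE NONCLUSTERED INDEX IX_person_address_city_state_zip
    ON person (address, city_state_zip)
    -- The name columns are only INCLUDEd so the left(...) comparisons
    -- can be evaluated from the index without extra lookups.
    INCLUDE (first_name, last_name);
```

Whether this actually helps depends on the table size and data distribution, so it is worth checking the execution plan before and after.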

Answer 2

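My approach would be to join the table to an aggregated version of itself. The aggregated version contains only the duplicates' information.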

SELECT *
FROM person AS P1
INNER JOIN (
    --#region
    SELECT
        first_name = SUBSTRING(first_name, 1, 3)
      , last_name = SUBSTRING(last_name, 1, 3)
      , address
      , city_state_zip
    FROM person
    GROUP BY
        SUBSTRING(first_name, 1, 3)
      , SUBSTRING(last_name, 1, 3)
      , address
      , city_state_zip
    HAVING COUNT(*) > 1
    --#endregion
) AS P2
    ON P2.first_name = SUBSTRING(P1.first_name, 1, 3)
   AND P2.last_name = SUBSTRING(P1.last_name, 1, 3)
   AND P2.address = P1.address
   AND P2.city_state_zip = P1.city_state_zip

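If it performs poorly, try running the aggregation separately and storing the result in a @table variable or a temporary #table, then run the join against that.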
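A minimal sketch of that temp-table variant, assuming SQL Server. The table name #dupes and the column aliases first_name3 and last_name3 are made up for illustration; the grouping logic is the same as in the query above.

```sql
-- Step 1: materialize the duplicate groups once.
-- #dupes, first_name3 and last_name3 are hypothetical names.
SELECT
    SUBSTRING(first_name, 1, 3) AS first_name3
  , SUBSTRING(last_name, 1, 3)  AS last_name3
  , address
  , city_state_zip
INTO #dupes
FROM person
GROUP BY
    SUBSTRING(first_name, 1, 3)
  , SUBSTRING(last_name, 1, 3)
  , address
  , city_state_zip
HAVING COUNT(*) > 1;

-- Step 2: join the base table against the staged groups.
SELECT P1.*
FROM person AS P1
INNER JOIN #dupes AS D
    ON D.first_name3 = SUBSTRING(P1.first_name, 1, 3)
   AND D.last_name3 = SUBSTRING(P1.last_name, 1, 3)
   AND D.address = P1.address
   AND D.city_state_zip = P1.city_state_zip;

-- Step 3: clean up the temp table.
DROP TABLE #dupes;
```

With the sample data from the question, all three rows should come back, since they fall into a single group ('Bob', 'Smi', same address, same city/state/zip) with a count of 3.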

Answer 3

Do this:

SELECT *
FROM person p1
INNER JOIN (
    SELECT SUBSTRING(first_name, 1, 3) AS first_name,
           SUBSTRING(last_name, 1, 3) AS last_name,
           [address],
           city_state_zip
    FROM person
    GROUP BY SUBSTRING(first_name, 1, 3), SUBSTRING(last_name, 1, 3), [address], city_state_zip
    HAVING COUNT(*) > 1  -- keep only groups that contain more than one row
) p2
    ON SUBSTRING(p1.first_name, 1, 3) = p2.first_name  -- alias goes on the column, not on SUBSTRING
   AND SUBSTRING(p1.last_name, 1, 3) = p2.last_name
   AND p1.[address] = p2.[address]
   AND p1.city_state_zip = p2.city_state_zip

Get all rows where duplicate values exist across several different columns
