I have a table with a lot of data. One column in that table, theid, is not a unique ID, so it can contain duplicates. I found them with this query:
SELECT theid FROM thetable
GROUP BY theid
HAVING COUNT(*) > 1
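The table also has columns such as street1, street2, city1 and city2.
For the rows returned by the first query (the duplicates), I need to check whether street1, street2, city1 and city2 differ between any of the duplicates for a given id. Does that make sense?
So say we have two rows with the same ID: I need to check whether street1 in one row differs from street1 in any other row with that same id.
Any hints or pointers on how to do this? I'm staring myself blind at this problem and can't seem to find the right query.
Thanks a lot.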
Answer 0 (score: 0)
Using a CTE will help:
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID) -- order within an ID is arbitrary here; order by a date or key column if you have one
FROM thetable
-- add joins if necessary
)
SELECT oldestID = c1.theID,
oldestStreet1 = c1.Street1,
newestStreet1 = c2.Street1,
newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2 ON c2.theID = c1.theID AND c2.rn = c1.rn + 1 -- match on theID too, otherwise rows from different IDs get paired
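To try the pairing idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The sample rows are made up, the column list is shortened, and rowid stands in for a real ordering column, so treat it as an illustration rather than the exact query above.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (theid INTEGER, street1 TEXT, city1 TEXT)")
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?, ?)",
    [
        (1, "1337 Test St.", "Springfield"),
        (1, "1337 Test Street", "Springfield"),
        (2, "5 Main St", "Shelbyville"),
        (2, "5 Main St", "Shelbyville"),
        (3, "9 Oak Ave", "Ogdenville"),  # not duplicated, so it has no pair
    ],
)

# Number the rows inside each theid, then pair row n with row n + 1
# of the SAME theid (rowid is an assumption used only for a stable order).
pairs = conn.execute(
    """
    WITH cte AS (
        SELECT theid, street1,
               ROW_NUMBER() OVER (PARTITION BY theid ORDER BY rowid) AS rn
        FROM thetable
    )
    SELECT c1.theid, c1.street1, c2.street1
    FROM cte c1
    INNER JOIN cte c2 ON c2.theid = c1.theid AND c2.rn = c1.rn + 1
    ORDER BY c1.theid
    """
).fetchall()

for row in pairs:
    print(row)
```

Each output row holds an ID followed by the street1 values of two consecutive duplicates, so differences can be read off side by side.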
You can also add a CASE expression to show matches versus mismatches. This helps when identifying typos by hand (1337 Test St. vs 1337 Test Street):
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
SELECT oldestID = c1.theID,
oldestStreet1 = CASE WHEN c1.Street1 = c2.Street1 THEN 'Match' ELSE c1.Street1 END,
newestStreet1 = CASE WHEN c1.Street1 = c2.Street1 THEN 'Match' ELSE c2.Street1 END,
newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2 ON c2.theID = c1.theID AND c2.rn = c1.rn + 1
Or you can return only the non-matching rows by adding the comparison to the INNER JOIN clause:
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
SELECT oldestID = c1.theID,
oldestStreet1 = c1.Street1,
newestStreet1 = c2.Street1,
newestID = c2.theID
FROM CTE c1
INNER JOIN CTE c2 ON c2.theID = c1.theID AND c2.rn = c1.rn + 1 AND c1.Street1 <> c2.Street1 -- add more comparisons as needed (use OR inside parentheses to flag a difference in any column)
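Joining on rn + 1 compares each row only with the next one, so with three or more duplicates some pairs are never compared directly. If all you need is the list of IDs whose duplicates disagree, a different technique (not the one used in this answer) is to group by the ID and count distinct values. A rough sketch with Python's sqlite3 and made-up rows; the SQL itself is plain GROUP BY / HAVING, and note that COUNT(DISTINCT ...) ignores NULLs:

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (theid INTEGER, street1 TEXT, city1 TEXT)")
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?, ?)",
    [
        (1, "1337 Test St.", "Springfield"),
        (1, "1337 Test Street", "Springfield"),
        (1, "1337 Test St.", "Springfield"),
        (2, "5 Main St", "Shelbyville"),
        (2, "5 Main St", "Capital City"),
        (3, "9 Oak Ave", "Ogdenville"),
        (3, "9 Oak Ave", "Ogdenville"),  # duplicate ID, but identical values
    ],
)

# Keep only duplicated IDs where at least one column has more than one
# distinct value. NULLs are not counted by COUNT(DISTINCT ...).
conflicts = conn.execute(
    """
    SELECT theid,
           COUNT(DISTINCT street1) AS street1_variants,
           COUNT(DISTINCT city1)   AS city1_variants
    FROM thetable
    GROUP BY theid
    HAVING COUNT(*) > 1
       AND (COUNT(DISTINCT street1) > 1 OR COUNT(DISTINCT city1) > 1)
    ORDER BY theid
    """
).fetchall()

for row in conflicts:
    print(row)
```

The result can then be joined back to the table to inspect the offending rows in full.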
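Note that these are exact matches. You can implement simple fuzzy logic such as LEFT(Zip, 5) to compare only the first 5 digits of the zip code (in case some zip codes include the ZIP+4 suffix and some don't).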
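As a small illustration of the prefix comparison, the sketch below uses Python's sqlite3 with invented zip codes. SQLite has no LEFT function, so substr(zip, 1, 5) stands in for T-SQL's LEFT(Zip, 5), and rowid is used only to avoid listing each pair twice.

```python
import sqlite3

# Hypothetical sample data: ID 1 differs only by the ZIP+4 suffix,
# ID 2 differs in the first five digits.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thetable (theid INTEGER, zip TEXT)")
conn.executemany(
    "INSERT INTO thetable VALUES (?, ?)",
    [
        (1, "90210"),
        (1, "90210-1234"),
        (2, "10001"),
        (2, "10002-0001"),
    ],
)

# Self-join rows that share an ID and compare only the 5-digit prefix.
# substr(zip, 1, 5) plays the role of LEFT(Zip, 5) in SQL Server.
mismatches = conn.execute(
    """
    SELECT a.theid, a.zip, b.zip
    FROM thetable a
    INNER JOIN thetable b
            ON b.theid = a.theid
           AND b.rowid > a.rowid
           AND substr(a.zip, 1, 5) <> substr(b.zip, 1, 5)
    ORDER BY a.theid
    """
).fetchall()

for row in mismatches:
    print(row)
```

Only ID 2 is reported, because the two zip codes for ID 1 share the same first five digits.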
Answer 1 (score: 0)
You can also analyze it like this:
;WITH CTE AS
(
SELECT theID,
Street1,
Street2,
Street3,
City,
State,
Zip,
rn = ROW_NUMBER() OVER(PARTITION BY theID ORDER BY theID)
FROM thetable
-- add joins if necessary
)
,
CTE1 as
(
select *, ROW_NUMBER()
OVER(PARTITION BY theID, Street1, Street2, City, State, Zip
ORDER BY theID) rn2 from cte where rn > 2
)
select * from cte1