SQL: Count duplicate records based on a condition

Time: 2015-01-23 16:47:25

Tags: sql

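Below is a small part of a query I need help with. This part produces a count of records in which both EmailAddress and DateOfBirth are duplicated.

The commented-out line should produce a count of records where EmailAddress is duplicated but DateOfBirth differs, i.e. it should identify users who share an email address (on the assumption that two distinct users have different dates of birth).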

SELECT     
u.EmailAddress,
u.DateOfBirth,
COUNT(*) over (partition by u.EmailAddress, DateOfBirth) AS EmailAndDoBDup,

--COUNT(*) where EmailAddress is duplicate but DateOfBirth is unique (in the aggregated results)

FROM [User] AS u 

Thanks.

2 answers:

Answer 0 (score: 0)

You can do this with a subquery. I don't think there is a way to do it with a single window function:

SELECT u.EmailAddress, u.DateOfBirth,
       EmailAndDoBDup,
       SUM(CASE WHEN EmailAndDoBDup = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY EmailAddress) as YourCol
FROM (SELECT u.*,
             COUNT(*) OVER (partition by u.EmailAddress, DateOfBirth) as EmailAndDoBDup
      FROM [User] u
     ) u;
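
To see what the two columns contain, here is a minimal sketch that runs the same query against a handful of made-up rows. The `SampleUser` CTE and its six rows are hypothetical stand-ins for `[User]`, and the dates are kept as ISO strings for brevity; only the outer query shape comes from the answer above.

```sql
-- Hypothetical sample data standing in for [User]; not from the original question.
WITH SampleUser (EmailAddress, DateOfBirth) AS (
    SELECT v.EmailAddress, v.DateOfBirth
    FROM (VALUES
        ('a@example.com', '1980-01-01'),
        ('a@example.com', '1980-01-01'),
        ('a@example.com', '1990-05-05'),
        ('b@example.com', '1975-03-03'),
        ('b@example.com', '1985-07-07'),
        ('c@example.com', '2000-02-02')
    ) AS v (EmailAddress, DateOfBirth)
)
SELECT u.EmailAddress, u.DateOfBirth, u.EmailAndDoBDup,
       -- Per email address: number of rows whose (email, date of birth) pair occurs exactly once
       SUM(CASE WHEN u.EmailAndDoBDup = 1 THEN 1 ELSE 0 END)
           OVER (PARTITION BY u.EmailAddress) AS YourCol
FROM (SELECT s.*,
             -- Number of rows sharing both the email address and the date of birth
             COUNT(*) OVER (PARTITION BY s.EmailAddress, s.DateOfBirth) AS EmailAndDoBDup
      FROM SampleUser AS s
     ) AS u
ORDER BY u.EmailAddress, u.DateOfBirth;

-- Result:
-- EmailAddress   DateOfBirth  EmailAndDoBDup  YourCol
-- a@example.com  1980-01-01   2               1
-- a@example.com  1980-01-01   2               1
-- a@example.com  1990-05-05   1               1
-- b@example.com  1975-03-03   1               2
-- b@example.com  1985-07-07   1               2
-- c@example.com  2000-02-02   1               1
```

Note that `c@example.com`, which belongs to a single user, still gets `YourCol = 1`, so a value of 1 alone does not show that an address is shared. Depending on the goal, the column may need to be read together with the total number of rows for that email address.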

Edit:

If you want one row per email address and date of birth, you can write this as an aggregation query:

SELECT u.EmailAddress, u.DateOfBirth, COUNT(*) as EmailAndDoBDup,
       SUM(CASE WHEN COUNT(*) = 1 THEN 1 ELSE 0 END) OVER (PARTITION BY EmailAddress) as YourCol
FROM [User] u
GROUP BY u.EmailAddress, u.DateOfBirth;

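This doesn't need a subquery, but it may not fit into your more complex query.

Answer 1 (score: 0)

Rather than doing this in the SELECT clause, I would left outer join to two grouped subqueries, like this: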

LEFT OUTER join
(SELECT     
    EmailAddress,
    DateOfBirth
FROM
    [User]
GROUP BY
    EmailAddress,
    DateOfBirth
HAVING
    COUNT(DISTINCT ID) > 1) dupEmailDOB
...
LEFT OUTER JOIN
(SELECT     
    EmailAddress
FROM
    [User]
GROUP BY
    EmailAddress
HAVING
    COUNT(DISTINCT DateOfBirth) > 1) emailMultipleDOBs

This is easier to maintain if you need to add other conditions.
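
The fragment above leaves out the outer query and the join conditions. One way the pieces might fit together is sketched below. The `SampleUser` CTE with its rows, the `ON` clauses, and the two output column names are all assumptions added for illustration; the `ID` column is assumed only because the answer's `COUNT(DISTINCT ID)` implies it.

```sql
-- Hypothetical sample data standing in for [User]; the ID column is assumed.
WITH SampleUser (ID, EmailAddress, DateOfBirth) AS (
    SELECT v.ID, v.EmailAddress, v.DateOfBirth
    FROM (VALUES
        (1, 'a@example.com', '1980-01-01'),
        (2, 'a@example.com', '1980-01-01'),
        (3, 'a@example.com', '1990-05-05'),
        (4, 'b@example.com', '1975-03-03'),
        (5, 'b@example.com', '1985-07-07'),
        (6, 'c@example.com', '2000-02-02')
    ) AS v (ID, EmailAddress, DateOfBirth)
)
SELECT
    u.ID,
    u.EmailAddress,
    u.DateOfBirth,
    -- 1 when another user has the same email address and the same date of birth
    CASE WHEN dupEmailDOB.EmailAddress IS NOT NULL THEN 1 ELSE 0 END AS HasEmailAndDoBDup,
    -- 1 when the email address appears with more than one date of birth
    CASE WHEN emailMultipleDOBs.EmailAddress IS NOT NULL THEN 1 ELSE 0 END AS EmailHasMultipleDoBs
FROM SampleUser AS u
LEFT OUTER JOIN
    (SELECT EmailAddress, DateOfBirth
     FROM SampleUser
     GROUP BY EmailAddress, DateOfBirth
     HAVING COUNT(DISTINCT ID) > 1) AS dupEmailDOB
    -- Assumed join condition; rows with a NULL DateOfBirth would never match here
    ON  dupEmailDOB.EmailAddress = u.EmailAddress
    AND dupEmailDOB.DateOfBirth  = u.DateOfBirth
LEFT OUTER JOIN
    (SELECT EmailAddress
     FROM SampleUser
     GROUP BY EmailAddress
     HAVING COUNT(DISTINCT DateOfBirth) > 1) AS emailMultipleDOBs
    -- Assumed join condition
    ON emailMultipleDOBs.EmailAddress = u.EmailAddress
ORDER BY u.ID;

-- Result:
-- ID  EmailAddress   DateOfBirth  HasEmailAndDoBDup  EmailHasMultipleDoBs
-- 1   a@example.com  1980-01-01   1                  1
-- 2   a@example.com  1980-01-01   1                  1
-- 3   a@example.com  1990-05-05   0                  1
-- 4   b@example.com  1975-03-03   0                  1
-- 5   b@example.com  1985-07-07   0                  1
-- 6   c@example.com  2000-02-02   0                  0
```

Because each derived table returns at most one row per join key, the left joins do not multiply rows of the outer query, and a further condition can be added as another derived table without touching the existing ones.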