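SQL Server: deactivate duplicates and keep the latest record

Time: 2017-01-23 18:07:03

Tags: sql-server

Could you please help me use a CTE to find the duplicate records in the data set below and mark them inactive, so that only the record with the latest date stays active?

  1. If Fname, Lname and Email are the same: duplicate
  2. If Fname, Lname and Phone are the same: duplicate
  3. If cust A and cust B have the same phone number, and cust B and cust C have the same email, then cust C is a duplicate of cust A (A = B = C)
  4. If Fname, Lname, Email and Phone are the same: duplicate
  5. For the data below, only cust 100 should remain active in the output. Thanks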

    CustID  Fname   Lname   Phone       Email     Date      Active
    100     John      Doe   1234567890  NULL      10-Jan      1
    200     John      Doe   1234567890  a@a.com   2-Jan       1
    300     John      Doe   NULL        a@a.com   1-Jan       1
    

1 answer:

Answer 0 (score: 1)

This query returns the rows that should stay active:

WITH 
  t AS (
    SELECT *
    FROM (
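      -- Phone is quoted so that it has a character type like Email; with an integer
      -- Phone, COALESCE(MAX(Email), Phone) would try to convert 'a@a.com' to int
      VALUES 
        (100, 'John', 'Doe', '1234567890', NULL,      CAST('2017-01-10' AS DATE)),
        (200, 'John', 'Doe', '1234567890', 'a@a.com', CAST('2017-01-02' AS DATE)),
        (300, 'John', 'Doe', NULL,         'a@a.com', CAST('2017-01-01' AS DATE))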
    ) t(CustID, Fname, Lname, Phone, Email, Date)
  ),
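  u AS (
    -- Latest date per (Fname, Lname, Phone) group; the group's Email is
    -- MAX(Email), falling back to Phone when the group has no Email
    SELECT Fname, Lname, Phone, COALESCE(MAX(Email), Phone) AS Email, MAX(Date) AS Date
    FROM t
    GROUP BY Fname, Lname, Phone 
  ),
  v AS (
    -- Same step in the other direction: regroup u by (Fname, Lname, Email)
    SELECT Fname, Lname, COALESCE(MAX(Phone), Email) AS Phone, Email, MAX(Date) AS Date
    FROM u
    GROUP BY Fname, Lname, Email 
  )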
SELECT *
FROM t
WHERE EXISTS (
  SELECT 1
  FROM v
  WHERE t.Fname = v.Fname
  AND t.Lname = v.Lname
  AND t.Date = v.Date
)

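Explanation:

  • t is just your data.
  • u finds the latest date, MAX(Date), for each (Fname, Lname, Phone) group. It also assumes that if at least one record in the group has an Email, then that Email, MAX(Email), belongs to the whole group. According to your description there is only one Email per group, so we can use MAX().
  • v does the same thing the other way round: it groups u by (Fname, Lname, Email) and finds the latest date in each group.
  • Finally, we keep only the records of t (the original data) whose Fname, Lname and Date values match the latest record of a group in v.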
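
The intermediate steps described above can be inspected by selecting from u and v directly. The following is a minimal sketch for the sample data, not part of the original answer: the Step label column and the src alias are hypothetical names added for illustration, and Phone is written as a string (an assumption) so that COALESCE only mixes character types.

```sql
-- Sketch: show what the CTEs u and v contain for the sample data.
WITH
  t AS (
    SELECT *
    FROM (
      VALUES
        (100, 'John', 'Doe', '1234567890', NULL,      CAST('2017-01-10' AS DATE)),
        (200, 'John', 'Doe', '1234567890', 'a@a.com', CAST('2017-01-02' AS DATE)),
        (300, 'John', 'Doe', NULL,         'a@a.com', CAST('2017-01-01' AS DATE))
    ) src(CustID, Fname, Lname, Phone, Email, Date)
  ),
  u AS (
    -- One row per (Fname, Lname, Phone) group
    SELECT Fname, Lname, Phone, COALESCE(MAX(Email), Phone) AS Email, MAX(Date) AS Date
    FROM t
    GROUP BY Fname, Lname, Phone
  ),
  v AS (
    -- One row per (Fname, Lname, Email) group of u
    SELECT Fname, Lname, COALESCE(MAX(Phone), Email) AS Phone, Email, MAX(Date) AS Date
    FROM u
    GROUP BY Fname, Lname, Email
  )
SELECT 'u' AS Step, Fname, Lname, Phone, Email, Date FROM u
UNION ALL
SELECT 'v' AS Step, Fname, Lname, Phone, Email, Date FROM v
ORDER BY Step, Date DESC;

-- Result for the sample data:
--   u  John  Doe  1234567890  a@a.com  2017-01-10
--   u  John  Doe  NULL        a@a.com  2017-01-01
--   v  John  Doe  1234567890  a@a.com  2017-01-10
```

The two u rows share the Email a@a.com, which is what lets v collapse them into a single group whose latest date, 2017-01-10, belongs to cust 100.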

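Updating the Active flag

The query is almost the same, except that t now has to be your actual table instead of a CTE: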

WITH 
  u AS (
    SELECT Fname, Lname, Phone, COALESCE(MAX(Email), Phone) AS Email, MAX(Date) AS Date
    FROM t
    GROUP BY Fname, Lname, Phone 
  ),
  v AS (
    SELECT Fname, Lname, COALESCE(MAX(Phone), Email) AS Phone, Email, MAX(Date) AS Date
    FROM u
    GROUP BY Fname, Lname, Email 
  )
UPDATE t
SET active = 0
OUTPUT INSERTED.*
WHERE NOT EXISTS (
  SELECT 1
  FROM v
  WHERE t.Fname = v.Fname
  AND t.Lname = v.Lname
  AND t.Date = v.Date
)

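The OUTPUT INSERTED.* clause makes the UPDATE statement return the modified rows. For the sample data these are the rows with CustID 200 and 300, now with Active = 0.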
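
To try the UPDATE end to end, the sample data first has to exist in a real table. The script below is a minimal sketch under stated assumptions: the table name t and the Active column follow the answer, while the column types, lengths and the default on Active are guesses, and the script is meant for a scratch database where no table named t exists yet.

```sql
-- Hypothetical setup: column types and lengths are assumptions.
CREATE TABLE t (
  CustID INT          NOT NULL PRIMARY KEY,
  Fname  VARCHAR(50)  NOT NULL,
  Lname  VARCHAR(50)  NOT NULL,
  Phone  VARCHAR(20)  NULL,
  Email  VARCHAR(100) NULL,
  [Date] DATE         NOT NULL,
  Active BIT          NOT NULL DEFAULT 1
);

INSERT INTO t (CustID, Fname, Lname, Phone, Email, [Date])
VALUES
  (100, 'John', 'Doe', '1234567890', NULL,      '2017-01-10'),
  (200, 'John', 'Doe', '1234567890', 'a@a.com', '2017-01-02'),
  (300, 'John', 'Doe', NULL,         'a@a.com', '2017-01-01');

-- The UPDATE from the answer, run against the table created above.
WITH
  u AS (
    SELECT Fname, Lname, Phone, COALESCE(MAX(Email), Phone) AS Email, MAX([Date]) AS [Date]
    FROM t
    GROUP BY Fname, Lname, Phone
  ),
  v AS (
    SELECT Fname, Lname, COALESCE(MAX(Phone), Email) AS Phone, Email, MAX([Date]) AS [Date]
    FROM u
    GROUP BY Fname, Lname, Email
  )
UPDATE t
SET Active = 0
OUTPUT INSERTED.CustID, INSERTED.Active   -- returns the rows that were deactivated
WHERE NOT EXISTS (
  SELECT 1
  FROM v
  WHERE t.Fname = v.Fname
  AND t.Lname = v.Lname
  AND t.[Date] = v.[Date]
);

-- Check the final state of the flag.
SELECT CustID, Active FROM t ORDER BY CustID;
-- For the sample data: 100 -> 1, 200 -> 0, 300 -> 0
```

Storing Phone as a character column keeps both COALESCE expressions within character types, so no implicit conversion to a number is attempted.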