Delete specific duplicate records in SQL Server 2008

Date: 2016-12-12 14:14:13

Tags: sql-server

My table contains duplicate records. Here are three scenarios:

record      Adddate
22344222    2016-04-22 00:00:00.000
22344222    2016-05-06 00:00:00.000
22344222    2016-06-06 00:00:00.000
22344222    2016-06-20 00:00:00.000
22344222    2016-07-25 00:00:00.000
22344222    2016-09-26 00:00:00.000
22344222    2016-10-03 00:00:00.000
22344222    2016-10-26 00:00:00.000
22344222    2016-10-27 00:00:00.000
22344222    2016-10-28 00:00:00.000

22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000
22344223    2016-04-22 00:00:00.000

22344224    2016-04-22 00:00:00.000
22344224    2016-04-23 00:00:00.000
22344224    2016-04-24 00:00:00.000
22344224    2016-04-25 00:00:00.000
22344224    2016-04-26 00:00:00.000
22344224    2016-06-10 00:00:00.000

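For each record I want to delete all duplicates except 2 rows: the 1st row should be the one with the earliest Adddate, and the 2nd row should be the one whose Adddate is 45 days after it.

For the three scenarios above, only the following data should remain: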

record      Adddate
22344222    2016-04-22 00:00:00.000
22344222    2016-05-06 00:00:00.000

22344223    2016-04-22 00:00:00.000

22344224    2016-04-22 00:00:00.000
22344224    2016-06-06 00:00:00.000

2 answers:

Answer 0: (score: 5)

Try this:

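-- Earliest Adddate per record
WITH mad (record, minDat) AS
   (SELECT record, MIN(Adddate)
    FROM myTable
    GROUP BY record)
-- Delete every row that is neither the earliest one
-- nor dated exactly 45 days after it
DELETE t
FROM myTable t
JOIN mad m
   ON m.record = t.record
WHERE t.Adddate NOT IN
   (m.minDat, DATEADD(day, 45, m.minDat));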
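
Before running the DELETE, it can help to preview which rows it would remove. The following is a minimal sketch, assuming the same `myTable` with `record` and `Adddate` columns used in the statement above; it only swaps the `DELETE` for a `SELECT` and changes no data.

```sql
-- Preview only: lists the rows the DELETE above would remove
WITH mad (record, minDat) AS
   (SELECT record, MIN(Adddate)
    FROM myTable
    GROUP BY record)
SELECT t.record, t.Adddate
FROM myTable t
JOIN mad m
   ON m.record = t.record
WHERE t.Adddate NOT IN
   (m.minDat, DATEADD(day, 45, m.minDat))
ORDER BY t.record, t.Adddate;
```

If the listed rows match what you expect to lose, the DELETE can then be run with the same CTE and WHERE clause.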

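The catch is that your source data contains fully identical rows for record 22344223 (the sample above shows 10 of them). The DELETE above leaves all of them in place, because every one of them carries the minimum Adddate.

If you want to keep only one copy of those duplicates, then after deleting the records, run: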

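-- Copy one instance of each (record, addDate) pair into a new table
-- (use datetime instead of date if the original column stores a time part)
CREATE TABLE dbo.temp (record integer, addDate date);
INSERT dbo.temp (record, addDate)
SELECT DISTINCT record, addDate
FROM myTable;
-- ------------------------
DROP TABLE myTable;
-- ------------------------
-- The new name must not include the schema prefix;
-- otherwise the table would literally be named [dbo.mytable]
EXEC sp_rename 'dbo.temp', 'mytable';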
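
As an alternative to copying, dropping, and renaming the table, the identical rows could also be removed in place. This is not part of the original answer; it is a sketch of the common ROW_NUMBER-based technique, assuming `myTable` has only the `record` and `Adddate` columns. The names `numbered` and `rn` are made up for illustration.

```sql
-- Number the rows inside each group of identical (record, Adddate) pairs;
-- which copy gets rn = 1 is arbitrary, since the copies are identical
WITH numbered AS
   (SELECT record,
           Adddate,
           ROW_NUMBER() OVER (PARTITION BY record, Adddate
                              ORDER BY Adddate) AS rn
    FROM myTable)
-- Deleting through the CTE removes the underlying rows from myTable
DELETE FROM numbered
WHERE rn > 1;
```

This keeps the original table, along with any indexes or permissions defined on it, which the drop-and-rename approach would not carry over.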

Answer 1: (score: 0)

I used Test as the table name, since none was provided.

-- Number the rows of each record by Adddate (NumSeq = 1 is the earliest)
WITH cte AS
(
    SELECT *,
           NumSeq = ROW_NUMBER() OVER (PARTITION BY record ORDER BY Adddate)
    FROM dbo.Test
)
-- Delete every non-first row unless it is dated exactly 45 days after the first row
DELETE FROM cte
WHERE cte.NumSeq > 1
    AND NOT EXISTS ( SELECT 1
                     FROM cte AS A
                     INNER JOIN cte AS B
                             ON B.NumSeq > 1
                            AND DATEDIFF(DAY, A.Adddate, B.Adddate) = 45
                            AND A.record = B.record
                     WHERE A.NumSeq = 1
                       AND cte.record = B.record
                       AND cte.Adddate = B.Adddate
                   );

SELECT  * FROM  dbo.Test;

This returns 4 rows:

22344222 2016-04-22 00:00:00.000
22344222 2016-06-06 00:00:00.000
22344223 2016-04-22 00:00:00.000
22344224 2016-04-22 00:00:00.000

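Edit: Note that the desired result shows 2016-05-06 as the 2nd row for 22344222, but that date is not 45 days after 2016-04-22. My result returns the 2016-06-06 row instead. Also, if I add a 2016-06-06 row for 22344224 to the source data, my query returns 5 rows.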
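
The day differences behind this note can be checked directly with DATEDIFF. The query below is a small sketch that compares the dates from the question against 2016-04-22; the column aliases are made up for illustration, and the literals use the unseparated `yyyymmdd` format so they do not depend on the session's date settings.

```sql
SELECT DATEDIFF(DAY, '20160422', '20160506') AS diff_to_0506,  -- 14
       DATEDIFF(DAY, '20160422', '20160606') AS diff_to_0606,  -- 45
       DATEDIFF(DAY, '20160422', '20160610') AS diff_to_0610;  -- 49
```

Only 2016-06-06 is exactly 45 days after 2016-04-22, which is why the 2016-05-06 row of 22344222 and the 2016-06-10 row of 22344224 are removed by a strict 45-day rule.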