Delete duplicates, but keep one row of each

Time: 2018-08-22 11:12:25

Tags: sql sql-server

I have the following table:

ONBackup table:

Contract    FromDate    Invoice     Data
232         12/12/2017  123
232         14/02/2018  123
232         15/07/2018  123
232         14/02/2017  676
311         12/12/2017  881

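There are a lot of "duplicate" rows. For me, a duplicate is a row with the same invoice number; the other fields may differ.

The table has 1.4 million rows (roughly 1 million of them duplicates), so I am not sure whether the following would work: I got bored after waiting 3 hours and started counting by hand, and it is certainly using more CPU than I would.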

DELETE FROM ONBackup
WHERE Invoice NOT IN
(
    SELECT MIN(Invoice)
    FROM ONBackup
    GROUP BY Invoice
)

Is there a faster way that would work?

3 answers:

Answer 0: (score: 5)

Use the row_number() function:

delete b
from (select b.*, row_number() over (partition by b.invoice order by b.fromdate desc) as seq 
      from ONBackup b
     ) b
where seq > 1;

This keeps, for each invoice, the row with the latest fromdate.
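Before running a delete of this size, it may help to count the rows it would remove. The following is only a sketch built from the same row_number() expression; it assumes FromDate is stored as a date type (if it is a string in dd/mm/yyyy form, the ordering would be textual rather than chronological). The alias rows_to_delete is made up for illustration.

```sql
-- Preview only: counts the rows the DELETE above would remove (seq > 1).
-- Nothing is deleted by this statement.
SELECT COUNT(*) AS rows_to_delete   -- hypothetical column alias
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY b.Invoice
                                ORDER BY b.FromDate DESC) AS seq
      FROM ONBackup b
     ) d
WHERE d.seq > 1;
```

If the count is close to the expected number of duplicates (about 1 million in the question), the delete is targeting the intended rows.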

Answer 1: (score: 4)

I think a CTE is a good option here (note that the statement preceding it must be terminated with a semicolon).

WITH CTE AS
(
  -- ORDER BY (SELECT 1) means no particular order: an arbitrary row per invoice is kept
  SELECT Invoice, ROW_NUMBER() OVER (PARTITION BY Invoice ORDER BY (SELECT 1)) AS RowNumb
  FROM ONBackup
)
DELETE FROM CTE WHERE RowNumb > 1;
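On a table with about a million rows to remove, a single DELETE runs as one large transaction. One possible variation, shown here as a sketch rather than a tested solution, is to delete in batches with DELETE TOP inside a loop. The batch size of 50000 and the variable name @batch are assumptions for illustration, and each pass recomputes the row numbers, so this is not guaranteed to be faster without a suitable index on Invoice.

```sql
DECLARE @batch INT = 50000;  -- hypothetical batch size; tune for your system

WHILE 1 = 1
BEGIN
    WITH CTE AS
    (
        -- Row 1 per invoice is the row to keep (latest FromDate, assuming a date type)
        SELECT ROW_NUMBER() OVER (PARTITION BY Invoice ORDER BY FromDate DESC) AS RowNumb
        FROM ONBackup
    )
    DELETE TOP (@batch) FROM CTE WHERE RowNumb > 1;

    -- Stop once a pass finds no more duplicates
    IF @@ROWCOUNT = 0 BREAK;
END;
```

Smaller batches keep each transaction short, at the cost of more passes over the table.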

Answer 2: (score: 3)

DELETE A
FROM
(
  -- ordering by the partition column is arbitrary: any one row per invoice is kept
  SELECT *, ROW_NUMBER() OVER (PARTITION BY invoice ORDER BY invoice) AS rn
  FROM ONBackup
) A
WHERE A.rn > 1