Loop through a SQL result set and remove [n] duplicates

Asked: 2015-11-20 17:06:12

Tags: sql sql-server database tsql

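I have a SQL Server database with a lot of dupes in it. Removing the dupes by hand isn't going to be fun, so I was wondering whether there is some kind of SQL programming or scripting I can use to automate it.

Below is my query, which returns the ID and Code of the duplicated rows.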

select a.ID, a.Code
from Table1 a
inner join (
SELECT Code
FROM Table1 GROUP BY Code HAVING COUNT(Code)>1)
x on x.Code= a.Code

I get results like this, for example:

5163    51727
5164    51727
5165    51727
5166    51728
5167    51728
5168    51728

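This snippet shows three rows returned for each Code (so the main "good" record and two dupes). However, that isn't always the case. While 2-3 seems to be the norm, there can be up to [n] dupes.

I just want to somehow loop through this result set and delete all but one record for each Code. Which records get deleted is arbitrary, since any one of them can be "kept".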

4 Answers:

Answer 0: (score: 3)

You can use row_number to drive the delete, i.e.

CREATE TABLE #table1
(id INT,
code int
);

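WITH cte AS
(
    -- Number the rows within each Code; the lowest ID gets rn = 1
    SELECT a.ID, a.Code, ROW_NUMBER() OVER (PARTITION BY a.Code ORDER BY a.ID) AS rn
    FROM #table1 a
)
-- Delete every row except the first one for each Code
DELETE x
FROM #table1 x
JOIN cte ON x.id = cte.id
WHERE cte.rn > 1;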

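But... if you are going to be doing a lot of deletes from a very large table, you might be better off selecting the rows you need into a temp table, then truncating your table and re-inserting the rows you need. That keeps the transaction log from getting hammered and your clustered index (CI) from getting fragmented, and it should be quicker too!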
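
The select-out, truncate, and re-insert approach could be sketched roughly as follows. This is only an illustration under assumptions not stated in the thread: it uses the question's Table1(ID, Code), assumes ID is not an IDENTITY column (otherwise SET IDENTITY_INSERT would be needed around the insert), assumes no foreign keys reference Table1 (TRUNCATE TABLE would fail otherwise), and uses a hypothetical temp table name, #keep.

```sql
-- Sketch only: Table1(ID, Code) is taken from the question; #keep is a hypothetical name.
-- Assumes ID is not an IDENTITY column and no foreign keys reference Table1.
BEGIN TRANSACTION;

-- 1. Copy one row per Code (the lowest ID) into a temp table.
SELECT ID, Code
INTO #keep
FROM (
    SELECT ID, Code,
           ROW_NUMBER() OVER (PARTITION BY Code ORDER BY ID) AS rn
    FROM Table1
) AS ranked
WHERE rn = 1;

-- 2. Empty the original table.
TRUNCATE TABLE Table1;

-- 3. Put the surviving rows back.
INSERT INTO Table1 (ID, Code)
SELECT ID, Code
FROM #keep;

COMMIT TRANSACTION;

DROP TABLE #keep;
```

Wrapping the steps in a transaction means a failure between the truncate and the re-insert can be rolled back instead of leaving the table empty.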

Answer 1: (score: 1)

It's actually quite simple:

DELETE FROM Table1
WHERE ID NOT IN
         (SELECT MAX(ID)
          FROM Table1
          GROUP BY CODE)
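
A cautious way to run a delete like the one above is inside a transaction, checking that no duplicated Code values remain before committing. A minimal sketch, assuming the question's Table1(ID, Code); the commit or rollback is left as a manual step:

```sql
-- Sketch only: Table1(ID, Code) is taken from the question.
BEGIN TRANSACTION;

-- Keep the highest ID for each Code and delete the rest.
DELETE FROM Table1
WHERE ID NOT IN
         (SELECT MAX(ID)
          FROM Table1
          GROUP BY Code);

-- Should return no rows if every Code now appears only once.
SELECT Code, COUNT(*) AS cnt
FROM Table1
GROUP BY Code
HAVING COUNT(*) > 1;

-- After inspecting the result above, run exactly one of:
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;
```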

Answer 2: (score: 0)

A self-join solution, performance-tested against the CTE.

create table codes(
    id int IDENTITY(1,1) NOT NULL,
    code int null,
    CONSTRAINT [PK_codes_id] PRIMARY KEY CLUSTERED (id ASC)
)

-- Fill the table with 1,000,000 rows of random codes in the range 0-999
declare @counter int
set @counter = 1
while (@counter <= 1000000)
begin
    insert into codes(code) select ABS(Checksum(NewID()) % 1000)
    set @counter = @counter + 1
end
GO

set statistics time on;
-- Delete every row whose id is not the minimum id for its code
delete a
from codes a
left join (
    select MIN(id) as id
    from codes
    group by code
) b on a.id = b.id
where b.id is null
set statistics time off;

--set statistics time on;
--  WITH cte AS 
--  (select a.id, a.code, ROW_NUMBER() OVER(PARTITION by code ORDER BY id) AS rn
--  from codes a
--  )
--  delete x
--  FROM codes x
--  JOIN cte ON x.id = cte.id
--  WHERE cte.rn > 1
--set statistics time off;

Performance test results. With the join:

 SQL Server Execution Times:
   CPU time = 3198 ms,  elapsed time = 3200 ms.

(999000 row(s) affected)

With the CTE:

 SQL Server Execution Times:
   CPU time = 4197 ms,  elapsed time = 4229 ms.

(999000 row(s) affected)

Answer 3: (score: 0)

It basically comes down to this:

WITH CTE_Dup AS
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY SalesOrderno, ItemNo ORDER BY SalesOrderno, ItemNo) AS ROW_NO
    FROM dbo.SalesOrderDetails
)
DELETE FROM CTE_Dup WHERE ROW_NO > 1;

Note: all of the fields that make a row a duplicate must be included in the PARTITION BY!!

Here is another example:

CREATE TABLE #Table (C1 INT,C2 VARCHAR(10))

INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (2,'Oracle')

SELECT * FROM #Table

;WITH Delete_Duplicate_Row_cte
     AS (SELECT ROW_NUMBER()OVER(PARTITION BY C1, C2 ORDER BY C1,C2) ROW_NUM,*
         FROM   #Table )
DELETE FROM Delete_Duplicate_Row_cte WHERE  ROW_NUM > 1

SELECT * FROM #Table