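I have a SQL Server database with a lot of dupes in it. Removing the dupes by hand is not going to be fun, so I was wondering whether there is some kind of SQL programming or scripting I can use to automate it.
Below is my query, which returns the ID and the Code of the duplicated rows.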
select a.ID, a.Code
from Table1 a
inner join (
SELECT Code
FROM Table1 GROUP BY Code HAVING COUNT(Code)>1)
x on x.Code= a.Code
I get a result like this, for example:
5163 51727
5164 51727
5165 51727
5166 51728
5167 51728
5168 51728
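This snippet shows three rows for each Code (so one primary "good" record and two dupes). That is not always the case, however. Two or three seems to be the norm, but there can be up to [n] dupes.
I just want to somehow loop through this result set and delete everything except one record per Code. Which records get deleted is arbitrary, since any one of them can be the one that is "kept".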
Answer 0: (score: 3)
You can use row_number to drive the delete, i.e.:
CREATE TABLE #table1
(id INT,
code int
);
WITH cte AS
(select a.ID, a.Code, ROW_NUMBER() OVER(PARTITION BY Code ORDER BY ID) AS rn
from #Table1 a
)
DELETE x
FROM #table1 x
JOIN cte ON x.id = cte.id
WHERE cte.rn > 1
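But... if you are going to delete a lot of rows from a very large table, you may be better off selecting the rows you want to keep into a temp table, truncating the original table, and then re-inserting those rows. That keeps the transaction log from getting hammered and the clustered index (CI) from getting fragmented, and it should be quicker too!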
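The keep-truncate-reinsert idea above could look roughly like the following. This is only a sketch under assumptions that are not in the original answer: that Table1 has just the two columns ID and Code, that ID is not an IDENTITY column, and that no foreign keys reference the table (TRUNCATE TABLE is rejected if they do). The temp table name #keep is made up for illustration.

```sql
-- Sketch only. Assumptions (not from the original answer):
--   * Table1 has exactly the columns ID and Code
--   * ID is not an IDENTITY column
--   * no foreign keys reference Table1
BEGIN TRANSACTION;

-- 1. Copy one row per Code (here: the one with the lowest ID) into a temp table.
SELECT MIN(ID) AS ID, Code
INTO #keep
FROM Table1
GROUP BY Code;

-- 2. Empty the original table. TRUNCATE logs page deallocations
--    rather than one log record per deleted row.
TRUNCATE TABLE Table1;

-- 3. Put the surviving rows back.
INSERT INTO Table1 (ID, Code)
SELECT ID, Code
FROM #keep;

COMMIT TRANSACTION;

DROP TABLE #keep;
```

If the real table has more columns than ID and Code, MIN(ID) with GROUP BY will not carry them along; in that case the rows to keep would have to be picked some other way, for example with the ROW_NUMBER query and rn = 1.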
Answer 1: (score: 1)
It's actually very simple:
DELETE FROM Table1
WHERE ID NOT IN
(SELECT MAX(ID)
FROM Table1
GROUP BY CODE)
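Since any delete of this kind is destructive, one cautious way to try it is inside a transaction, followed by a check that no Code still appears more than once. This is a sketch of that habit, not part of the original answer; it uses the same Table1, ID and Code names as the question.

```sql
BEGIN TRANSACTION;

-- Keep the row with the highest ID for each Code, delete the rest.
DELETE FROM Table1
WHERE ID NOT IN
    (SELECT MAX(ID)
     FROM Table1
     GROUP BY Code);

-- Should return no rows if every Code now appears only once.
SELECT Code, COUNT(*) AS cnt
FROM Table1
GROUP BY Code
HAVING COUNT(*) > 1;

-- After inspecting the result, run exactly one of:
-- COMMIT TRANSACTION;
-- ROLLBACK TRANSACTION;
```

Until one of the two final statements is run, the transaction stays open and holds its locks, so this is better suited to a test copy or a quiet period than to a busy production table.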
Answer 2: (score: 0)
A self-join solution, with a performance test against the CTE approach.
create table codes(
id int IDENTITY(1,1) NOT NULL,
code int null,
CONSTRAINT [PK_codes_id] PRIMARY KEY CLUSTERED
(
id ASC
))
-- Fill the table with 1,000,000 rows whose code is a random value from 0 to 999.
declare @counter int
set @counter = 1
while (@counter <= 1000000)
begin
insert into codes(code) select ABS(Checksum(NewID()) % 1000)
set @counter = @counter + 1
end
GO
set statistics time on;
delete a
from codes a left join(
select MIN(id) as id from codes
group by code) b
on a.id = b.id
where b.id is null
set statistics time off;
--set statistics time on;
-- WITH cte AS
-- (select a.id, a.code, ROW_NUMBER() OVER(PARTITION by code ORDER BY id) AS rn
-- from codes a
-- )
-- delete x
-- FROM codes x
-- JOIN cte ON x.id = cte.id
-- WHERE cte.rn > 1
--set statistics time off;
Performance test results:
Join:
SQL Server Execution Times:
CPU time = 3198 ms, elapsed time = 3200 ms.
(999000 row(s) affected)
Using the CTE:
SQL Server Execution Times:
CPU time = 4197 ms, elapsed time = 4229 ms.
(999000 row(s) affected)
Answer 3: (score: 0)
It basically goes like this:
WITH CTE_Dup AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY SalesOrderno, ItemNo ORDER BY SalesOrderno, ItemNo)
AS ROW_NO
FROM dbo.SalesOrderDetails
)
DELETE FROM CTE_Dup WHERE ROW_NO > 1;
Note: all of the fields must be included!!
Here is another example:
CREATE TABLE #Table (C1 INT,C2 VARCHAR(10))
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (1,'SQL Server')
INSERT INTO #Table VALUES (2,'Oracle')
SELECT * FROM #Table
;WITH Delete_Duplicate_Row_cte
AS (SELECT ROW_NUMBER()OVER(PARTITION BY C1, C2 ORDER BY C1,C2) ROW_NUM,*
FROM #Table )
DELETE FROM Delete_Duplicate_Row_cte WHERE ROW_NUM > 1
SELECT * FROM #Table