I have a table in a test database, and whoever ran the INSERT scripts to set it up apparently got a little overexcited. The schema looks like this:
ID UNIQUEIDENTIFIER
TYPE_INT SMALLINT
SYSTEM_VALUE SMALLINT
NAME VARCHAR
MAPPED_VALUE VARCHAR
It should have a few dozen rows. Instead it has around 200,000, most of which are duplicates: TYPE_INT, SYSTEM_VALUE, NAME and MAPPED_VALUE are all identical and only ID differs.
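Now, I could write a script to clean this up: create a temporary table in memory, use INSERT .. SELECT DISTINCT to grab all the unique values, TRUNCATE the original table, and then copy everything back. But is there a simpler way, such as a DELETE query with something special in the WHERE clause?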
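Answer 0 (score: 4)
You didn't give your table name, but I think something like this should work. From each set of duplicates it keeps only the record that happens to have the highest ID, because a row is deleted whenever a matching row with a greater ID exists. You might want to test it with the ROLLBACK in place first!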
BEGIN TRAN
DELETE T1
FROM <table_name> T1
WHERE EXISTS(
SELECT * FROM <table_name> T2
WHERE
T1.TYPE_INT = T2.TYPE_INT AND
T1.SYSTEM_VALUE = T2.SYSTEM_VALUE AND
T1.NAME = T2.NAME AND
T1.MAPPED_VALUE = T2.MAPPED_VALUE AND
T2.ID > T1.ID
)
SELECT * FROM <table_name>
ROLLBACK
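To see which row survives an EXISTS-based delete of this shape, here is a minimal sketch that runs it on a few sample rows. It uses Python's built-in sqlite3 module instead of SQL Server so it can be tried anywhere; the table name `mapping`, the text IDs and the sample data are all made up for illustration, and SQLite's DELETE syntax differs slightly from T-SQL (the outer table is referenced by name instead of through an alias).

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapping ("
    " ID TEXT PRIMARY KEY, TYPE_INT INTEGER, SYSTEM_VALUE INTEGER,"
    " NAME TEXT, MAPPED_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?, ?, ?)",
    [
        ("a1", 1, 10, "red", "R"),
        ("a2", 1, 10, "red", "R"),
        ("a3", 1, 10, "red", "R"),
        ("b1", 2, 20, "blue", "B"),
    ],
)

# Delete every row for which a matching row with a greater ID exists,
# so only the greatest ID in each group of duplicates survives.
conn.execute(
    """
    DELETE FROM mapping
    WHERE EXISTS (
        SELECT 1 FROM mapping AS T2
        WHERE T2.TYPE_INT = mapping.TYPE_INT
          AND T2.SYSTEM_VALUE = mapping.SYSTEM_VALUE
          AND T2.NAME = mapping.NAME
          AND T2.MAPPED_VALUE = mapping.MAPPED_VALUE
          AND T2.ID > mapping.ID
    )
    """
)

remaining = [row[0] for row in conn.execute("SELECT ID FROM mapping ORDER BY ID")]
print(remaining)  # ['a3', 'b1']
```

Flipping the last comparison to `T2.ID < mapping.ID` would keep the lowest ID instead. Note that the equality comparisons never match NULLs, so rows with NULL in any of the compared columns would not be treated as duplicates of each other.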
Answer 1 (score: 3)
Here is a great article, Deleting duplicates, which basically uses this pattern:
WITH q AS
(
SELECT d.*,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY value) AS rn
FROM t_duplicate d
)
DELETE
FROM q
WHERE rn > 1
SELECT *
FROM t_duplicate
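The pattern above uses the article's own table and columns. Adapted to the table in the question, the partition would presumably be the four columns that define a duplicate, with ID only deciding which copy is kept. The sketch below tries that idea with Python's sqlite3 module; the table name `mapping` and the sample rows are assumptions, and because SQLite cannot delete through a CTE the way SQL Server can, the row numbers are computed in a subquery instead. Window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapping ("
    " ID TEXT PRIMARY KEY, TYPE_INT INTEGER, SYSTEM_VALUE INTEGER,"
    " NAME TEXT, MAPPED_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?, ?, ?)",
    [
        ("a1", 1, 10, "red", "R"),
        ("a2", 1, 10, "red", "R"),
        ("b1", 2, 20, "blue", "B"),
        ("b2", 2, 20, "blue", "B"),
        ("c1", 3, 30, "green", "G"),
    ],
)

# Number the rows inside each group of identical values, ordered by ID,
# then delete everything except the first row of each group.
DEDUPE_SQL = """
DELETE FROM mapping
WHERE ID IN (
    SELECT ID FROM (
        SELECT ID,
               ROW_NUMBER() OVER (
                   PARTITION BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
                   ORDER BY ID
               ) AS rn
        FROM mapping
    ) AS ranked
    WHERE rn > 1
)
"""
conn.execute(DEDUPE_SQL)

survivors = [row[0] for row in conn.execute("SELECT ID FROM mapping ORDER BY ID")]
print(survivors)  # ['a1', 'b1', 'c1']
```

Because the rows are ordered by ID ascending, the copy with the lowest ID in each group gets `rn = 1` and is the one kept.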
Answer 2 (score: 2)
WITH Duplicates(ID , TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE )
AS
(
SELECT Min(Id) AS ID, TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
FROM T1
GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
HAVING Count(Id) > 1
)
DELETE FROM T1
WHERE ID IN (
SELECT T1.Id
FROM T1
INNER JOIN Duplicates
ON T1.TYPE_INT = Duplicates.TYPE_INT
AND T1.SYSTEM_VALUE = Duplicates.SYSTEM_VALUE
AND T1.NAME = Duplicates.NAME
AND T1.MAPPED_VALUE = Duplicates.MAPPED_VALUE
AND T1.Id <> Duplicates.ID
)
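Whichever delete is used, the same GROUP BY / HAVING idea can double as a check that no duplicates are left afterwards. A small sketch, again using Python's sqlite3 module with a made-up `mapping` table and sample rows (the helper name `count_duplicate_groups` is hypothetical):

```python
import sqlite3

def count_duplicate_groups(conn):
    """Return how many value combinations occur more than once in mapping."""
    sql = """
        SELECT COUNT(*) FROM (
            SELECT 1 FROM mapping
            GROUP BY TYPE_INT, SYSTEM_VALUE, NAME, MAPPED_VALUE
            HAVING COUNT(*) > 1
        ) AS dup_groups
    """
    return conn.execute(sql).fetchone()[0]

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapping ("
    " ID TEXT PRIMARY KEY, TYPE_INT INTEGER, SYSTEM_VALUE INTEGER,"
    " NAME TEXT, MAPPED_VALUE TEXT)"
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?, ?, ?, ?)",
    [
        ("a1", 1, 10, "red", "R"),
        ("a2", 1, 10, "red", "R"),
        ("b1", 2, 20, "blue", "B"),
    ],
)

before = count_duplicate_groups(conn)  # one group ("red") is duplicated
conn.execute("DELETE FROM mapping WHERE ID = 'a2'")  # stand-in for the real cleanup
after = count_duplicate_groups(conn)
print(before, after)  # 1 0
```

Running the check inside the same transaction as the delete, before committing, gives a chance to roll back if the count is not zero.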