Find duplicates and delete all but one

Time: 2013-08-21 06:52:08

Tags: sql sql-server

I currently have a URL redirect table in my database with about 8,000 rows, roughly 6,000 of which are duplicates.

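I would like to know whether there is a way to delete these duplicates based on a column value. I want to use my "old_url" column to find the duplicates, and so far I have used: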

SELECT old_url
    ,DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1  -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates
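However, I don't know what to do next to delete them, because I don't want to lose every row, only the duplicates. The rows match almost exactly: new_url sometimes differs, and url_id (a GUID) is different every time.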

3 Answers:

Answer 0: (score: 2)

I think a CTE with ranking functions is the simplest way:

WITH CTE AS
(
    SELECT old_url
          ,Num = ROW_NUMBER()OVER(PARTITION BY old_url ORDER BY DateColumn ASC)
    FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1

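Change ORDER BY DateColumn ASC to control which records are deleted and which one is kept. As written, the oldest row in each old_url group gets Num = 1 and is kept, so all newer duplicates are deleted.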
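
To illustrate how the ORDER BY choice decides the survivor, here is a minimal sketch that keeps the newest row per old_url instead of the oldest. The temp table #redirect_demo and its created_at column are hypothetical stand-ins (the question does not say whether the real table has a date column), and the transaction is rolled back so nothing is changed permanently.

```sql
-- Hypothetical demo table; created_at is an assumed column, not from the question.
CREATE TABLE #redirect_demo
(
    url_id     UNIQUEIDENTIFIER NOT NULL DEFAULT NEWID(),
    old_url    VARCHAR(200)     NOT NULL,
    new_url    VARCHAR(200)     NOT NULL,
    created_at DATETIME         NOT NULL
);

INSERT INTO #redirect_demo (old_url, new_url, created_at) VALUES
    ('/a', '/a-v1', '20130101'),
    ('/a', '/a-v2', '20130201'),
    ('/b', '/b-v1', '20130115');

BEGIN TRAN;

WITH CTE AS
(
    SELECT old_url
          ,new_url
          ,Num = ROW_NUMBER() OVER (PARTITION BY old_url ORDER BY created_at DESC)
    FROM #redirect_demo
)
DELETE FROM CTE
OUTPUT deleted.old_url, deleted.new_url   -- report the rows that were removed
WHERE Num > 1;

-- Remaining rows: ('/a', '/a-v2') and ('/b', '/b-v1')
SELECT old_url, new_url FROM #redirect_demo;

ROLLBACK;   -- undo the delete; switch to COMMIT once the result looks right

DROP TABLE #redirect_demo;
```

Running the DELETE inside a transaction with an OUTPUT clause gives a chance to inspect exactly which rows would go before committing.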

Answer 1: (score: 0)

If your table has a primary key, this is easy:

BEGIN TRAN
CREATE TABLE #T(Id INT, OldUrl VARCHAR(MAX))

INSERT INTO #T VALUES 
    (1, 'foo'),
    (2, 'moo'),
    (3, 'foo'),
    (4, 'moo'),
    (5, 'foo'),
    (6, 'zoo'),
    (7, 'foo')

-- Keep the row with the lowest Id for each OldUrl; delete every other row.
DELETE FROM #T WHERE Id NOT IN (
    SELECT MIN(Id) 
    FROM #T 
    GROUP BY OldUrl)

SELECT * FROM #T

DROP TABLE #T

ROLLBACK

Answer 2: (score: 0)

Here is an example of deleting duplicate records that are keyed by a GUID. I hope it helps.

DECLARE @t1 TABLE
(
DupID UNIQUEIDENTIFIER,
DupRecords NVARCHAR(255)
)

INSERT INTO @t1 VALUES 
(NEWID(),'A1'),
(NEWID(),'A1'),
(NEWID(),'A2'),
(NEWID(),'A1'),
(NEWID(),'A3')

Now @t1 contains duplicate records, each with its own GUID.

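-- Number the rows within each DupRecords group, then delete all but row 1.
-- The ORDER BY repeats the partition column, so which row is kept is arbitrary.
;WITH CTE AS(
SELECT DupID,DupRecords, Rn = ROW_NUMBER()
OVER (PARTITION BY DupRecords ORDER BY DupRecords)
FROM @t1 
)
DELETE FROM @t1 WHERE DupID IN (SELECT DupID FROM CTE WHERE Rn>1)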

The query above deletes the duplicate records from @t1. I use ROW_NUMBER() to tell the rows in each group apart.

SELECT * FROM @t1
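
As a small variation, the sketch below orders each group by DupID rather than by the partition column, so the surviving row is chosen by a stable rule instead of arbitrarily. This is only one possible tie-breaker, picked here for illustration; any column that is unique within a group would do. It reuses the @t1 sample data from this answer and ends with a check that no value appears more than once.

```sql
DECLARE @t1 TABLE
(
    DupID      UNIQUEIDENTIFIER,
    DupRecords NVARCHAR(255)
);

INSERT INTO @t1 VALUES
    (NEWID(), 'A1'),
    (NEWID(), 'A1'),
    (NEWID(), 'A2'),
    (NEWID(), 'A1'),
    (NEWID(), 'A3');

-- Ordering by DupID (unique per row) makes the choice of survivor deterministic.
;WITH CTE AS
(
    SELECT DupID
          ,Rn = ROW_NUMBER() OVER (PARTITION BY DupRecords ORDER BY DupID)
    FROM @t1
)
DELETE FROM @t1
WHERE DupID IN (SELECT DupID FROM CTE WHERE Rn > 1);

-- Sanity check: this returns no rows when every DupRecords value is unique.
SELECT DupRecords, COUNT(*) AS Cnt
FROM @t1
GROUP BY DupRecords
HAVING COUNT(*) > 1;
```

After the delete, @t1 holds one row each for A1, A2, and A3.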