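I currently have a URL redirect table in my database with about 8,000 rows, of which about 6,000 are duplicates.
I would like to know whether there is a way to delete these duplicates based on the value of one column. I want to use my old_url column to find the duplicates, and so far I have used: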
SELECT old_url
,DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1 -- more than one value
ORDER BY COUNT(1) DESC -- sort by most duplicates
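However, I don't know what to do next to delete them, because I don't want to lose every row, only the duplicates. The rows match almost exactly: new_url sometimes differs, and url_id (a GUID) is different every time.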
Answer 0 (score: 2)
I think ranking functions and a CTE are the simplest way:
WITH CTE AS
(
SELECT old_url
,Num = ROW_NUMBER()OVER(PARTITION BY old_url ORDER BY DateColumn ASC)
FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1
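Change ORDER BY DateColumn ASC as needed to decide which records are deleted and which one is kept. The row numbered 1 in each old_url group survives, so in this case I keep the oldest row and delete all newer duplicates.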
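Before running the delete for real, it can help to preview what would be removed. The following is a minimal sketch, not part of the answer above. It assumes the key column is url_id (the GUID mentioned in the question) and, since the question does not mention a date column, uses url_id as an arbitrary but deterministic tie-breaker. The transaction wrapper and the final check are additions for safety.

```sql
BEGIN TRAN;

-- Preview: rows that would be deleted (every row after the first per old_url).
WITH CTE AS
(
    SELECT url_id, old_url, new_url
          ,Num = ROW_NUMBER() OVER (PARTITION BY old_url ORDER BY url_id)
    FROM tbl_ecom_url_redirect
)
SELECT url_id, old_url, new_url
FROM CTE
WHERE Num > 1;

-- Same numbering, this time deleting through the CTE.
WITH CTE AS
(
    SELECT Num = ROW_NUMBER() OVER (PARTITION BY old_url ORDER BY url_id)
    FROM tbl_ecom_url_redirect
)
DELETE FROM CTE WHERE Num > 1;

-- Check: the duplicate query from the question should now return no rows.
SELECT old_url, DuplicateCount = COUNT(1)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(1) > 1;

ROLLBACK;  -- change to COMMIT once the result looks right
```

Because both statements order by the same unique column, the rows shown in the preview are the rows the DELETE removes.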
Answer 1 (score: 0)
If your table has a primary key, this is easy:
BEGIN TRAN
CREATE TABLE #T(Id INT, OldUrl VARCHAR(MAX))
INSERT INTO #T VALUES
(1, 'foo'),
(2, 'moo'),
(3, 'foo'),
(4, 'moo'),
(5, 'foo'),
(6, 'zoo'),
(7, 'foo')
-- Keep the lowest Id for each OldUrl and delete every other row.
-- A single GROUP BY covers both unique and duplicated URLs, so no UNION is needed.
DELETE FROM #T WHERE Id NOT IN (
SELECT MIN(Id)
FROM #T
GROUP BY OldUrl)
SELECT * FROM #T
DROP TABLE #T
ROLLBACK
Answer 2 (score: 0)
Here is an example of deleting duplicate records that are keyed by a GUID; I hope it helps.
DECLARE @t1 TABLE
(
DupID UNIQUEIDENTIFIER,
DupRecords NVARCHAR(255)
)
INSERT INTO @t1 VALUES
(NEWID(),'A1'),
(NEWID(),'A1'),
(NEWID(),'A2'),
(NEWID(),'A1'),
(NEWID(),'A3')
Now @t1 contains duplicate records, each with its own GUID:
;WITH CTE AS(
SELECT DupID,DupRecords, Rn = ROW_NUMBER()
OVER (PARTITION BY DupRecords ORDER BY DupRecords)
FROM @t1
)
DELETE FROM @t1 WHERE DupID IN (SELECT DupID FROM CTE WHERE Rn > 1)
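The query above deletes the duplicate records from @t1. I use ROW_NUMBER() to number the rows within each group of duplicates so that they can be told apart. Because the ORDER BY uses the same column as the PARTITION BY, which row in each group is kept is arbitrary.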
SELECT * FROM @t1
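Since the question notes that new_url sometimes differs between duplicates, it may be worth listing those conflicts before deleting anything, because any of the approaches here will silently keep just one target per old_url. This is a rough sketch against the table and column names given in the question; the DistinctTargets alias is made up for illustration.

```sql
-- List old_url values that redirect to more than one distinct new_url.
-- COUNT(DISTINCT ...) ignores NULLs, so rows with a NULL new_url are not counted.
SELECT old_url
      ,DistinctTargets = COUNT(DISTINCT new_url)
FROM tbl_ecom_url_redirect
GROUP BY old_url
HAVING COUNT(DISTINCT new_url) > 1
ORDER BY COUNT(DISTINCT new_url) DESC;
```

If this returns rows, the ORDER BY inside ROW_NUMBER() decides which new_url survives, so it is worth choosing that ordering deliberately.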