I have to clean up a table that contains duplicate rows:
id: serial id
gid: group id
url: string <- this is the column that I have to clean up
A gid can have multiple url values:
id gid url
---- ---- ------------
1 12 www.gmail.com
2 12 www.some.com
3 12 www.some.com <-- duplicate
4 13 www.other.com
5 13 www.milfsome.com <-- not a duplicate
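I want to run a single query against the whole table that deletes every row duplicating both gid and url, keeping one row per combination. In the example above, after the delete I want only rows 1, 2, 4 and 5 to remain.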
Answer 0 (score: 13)
;WITH x AS
(
SELECT id, gid, url, rn = ROW_NUMBER() OVER
(PARTITION BY gid, url ORDER BY id)
FROM dbo.table
)
SELECT id,gid,url FROM x WHERE rn = 1 -- the rows you'll keep
-- SELECT id,gid,url FROM x WHERE rn > 1 -- the rows you'll delete
-- DELETE x WHERE rn > 1; -- do the delete
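If you are happy with the first SELECT, which shows the rows you will keep, remove it and uncomment the second SELECT. Once you are happy with that one, which shows the rows you will delete, remove it and uncomment the DELETE.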
If you don't actually want to delete the data, just ignore the commented-out lines under the first SELECT...
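The answer is SQL Server syntax: `DELETE x` deletes through the CTE. As a rough, portable check of the same ROW_NUMBER() idea, here is a sketch that loads the sample rows into an in-memory SQLite database through Python's `sqlite3` module. Assumptions: the table name `links` and the helper `dedupe_row_number` are hypothetical stand-ins for `dbo.table`; SQLite does not allow deleting through a CTE, so the ranked rows are computed in a subquery and deleted by `id`; window functions need SQLite 3.25 or later.

```python
import sqlite3


def dedupe_row_number(rows):
    """Delete rows whose (gid, url) pair already appeared with a lower id.

    Hypothetical helper: runs against an in-memory SQLite database
    (SQLite >= 3.25 is needed for ROW_NUMBER()), not SQL Server.
    Returns the ids that remain, in ascending order.
    """
    conn = sqlite3.connect(":memory:")
    # "links" is a hypothetical table name standing in for dbo.table.
    conn.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany("INSERT INTO links (id, gid, url) VALUES (?, ?, ?)", rows)

    # SQLite cannot DELETE through a CTE, so rank rows in a subquery and
    # delete every id whose rank within its (gid, url) partition is > 1.
    conn.execute(
        """
        DELETE FROM links
        WHERE id IN (
            SELECT id FROM (
                SELECT id,
                       ROW_NUMBER() OVER (PARTITION BY gid, url ORDER BY id) AS rn
                FROM links
            )
            WHERE rn > 1
        )
        """
    )
    remaining = [r[0] for r in conn.execute("SELECT id FROM links ORDER BY id")]
    conn.close()
    return remaining


# The sample data from the question.
SAMPLE = [
    (1, 12, "www.gmail.com"),
    (2, 12, "www.some.com"),
    (3, 12, "www.some.com"),
    (4, 13, "www.other.com"),
    (5, 13, "www.milfsome.com"),
]

print(dedupe_row_number(SAMPLE))  # [1, 2, 4, 5]
```

`ORDER BY id` inside the window decides which copy survives: the lowest id in each group gets `rn = 1` and is kept.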
Answer 1 (score: 1)
SELECT
MIN(id) AS id,
gid,
url
FROM yourTable
GROUP BY gid, url
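This query only returns the rows to keep; it does not delete anything. A minimal sketch of turning it into a DELETE, again assuming an in-memory SQLite database through Python's `sqlite3` and a hypothetical helper `dedupe_keep_min_id`: every id that is not the MIN(id) of its (gid, url) group is removed.

```python
import sqlite3


def dedupe_keep_min_id(rows):
    """Keep the lowest id per (gid, url) and delete the rest.

    Hypothetical helper: wraps the MIN(id) ... GROUP BY gid, url query
    in a DELETE, run against an in-memory SQLite database.
    Returns the ids that remain, in ascending order.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE yourTable (id INTEGER PRIMARY KEY, gid INTEGER, url TEXT)")
    conn.executemany("INSERT INTO yourTable (id, gid, url) VALUES (?, ?, ?)", rows)

    # The GROUP BY yields one id to keep per (gid, url) pair;
    # every other id is a duplicate and gets deleted.
    conn.execute(
        """
        DELETE FROM yourTable
        WHERE id NOT IN (
            SELECT MIN(id) FROM yourTable GROUP BY gid, url
        )
        """
    )
    remaining = [r[0] for r in conn.execute("SELECT id FROM yourTable ORDER BY id")]
    conn.close()
    return remaining


# The sample data from the question.
SAMPLE = [
    (1, 12, "www.gmail.com"),
    (2, 12, "www.some.com"),
    (3, 12, "www.some.com"),
    (4, 13, "www.other.com"),
    (5, 13, "www.milfsome.com"),
]

print(dedupe_keep_min_id(SAMPLE))  # [1, 2, 4, 5]
```

`NOT IN` is safe here only because `id` is never NULL; if the subquery could return a NULL, `NOT IN` would match no rows and nothing would be deleted.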