Question

I need to get a list of the ids of duplicate rows in a table so that I can delete them with where id in. Here is my table:

    id|col1|col2
    1 |22  | text
    2 |22  | text
    3 |23  | text
    4 |22  | text2

Here ids 1 and 2 are duplicates of each other and the other rows are not. I know how to find them using group by and having count(*) > 1.
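For reference, that lookup might be sketched as follows. The table name table_name and the output aliases are placeholders, since the question does not name the table.

```sql
-- One row per duplicated (col1, col2) pair.
-- table_name, copies and keep_id are illustrative names only.
select col1, col2, count(*) as copies, min(id) as keep_id
from table_name
group by col1, col2
having count(*) > 1;
```

On the sample table this should return a single row for (22, text) with two copies. It identifies the duplicated group but does not by itself say which ids to delete.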

But I want to keep one row and delete the others. This is what the table should look like after the duplicates are removed:

    id|col1|col2
    1 |22  | text
    3 |23  | text
    4 |22  | text2

Or:

    id|col1|col2
    2 |22  | text
    3 |23  | text
    4 |22  | text2

Either result is fine. How can I do this? I want to get rid of the duplicates but keep one copy of each, so that it is no longer duplicated.

My next goal is to add an index on these fields so that this cannot happen again.
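A minimal sketch of such an index, assuming the table is called table_name (a placeholder) and that the duplicates have already been removed:

```sql
-- Creating the index fails while duplicate (col1, col2) pairs still exist,
-- so run this only after the cleanup.
-- The table name and index name are placeholders.
create unique index table_name_col1_col2_uniq
    on table_name (col1, col2);
```

Once it exists, an insert or update that would produce a second row with the same col1 and col2 is rejected. Note that in PostgreSQL, by default, rows with NULL in either column are not treated as equal to each other by a unique index.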

Answer 1

Try something like:

    delete from table_name
    where id not in (select min(id)
                     from table_name
                     group by col1, col2);

It deletes every row whose id is not the minimum id within its group of col1, col2.
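Before running the delete, it may be worth previewing what it would remove. A sketch that reuses the same where clause in a select:

```sql
-- Preview only: lists the rows the delete above would remove.
select *
from table_name
where id not in (select min(id)
                 from table_name
                 group by col1, col2)
order by id;
```

With the sample data from the question this should list only the row with id 2, since ids 1, 3 and 4 are each the smallest id in their group.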

An alternative query:

    delete from table_name t1
    where exists (select *
                  from table_name t2
                  where t1.col1 = t2.col1
                    and t1.col2 = t2.col2
                    and t1.id > t2.id);  -- a duplicate with a smaller id exists

It does the same thing in a different way.

Answer 2

Igor Romanchenko gave a good solution. Another possibility is:

    with cte as (
        select id, row_number() over(partition by col1, col2 order by id) as rn
        from Table1
    )
    -- PostgreSQL form: using joins the target table to the cte
    delete from Table1 as t
    using cte as c
    where c.id = t.id and c.rn > 1;
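To see why filtering on rn > 1 keeps exactly one row per group, the numbering can be inspected on its own. This sketch assumes Table1 holds the four sample rows from the question:

```sql
-- row_number() restarts at 1 for every (col1, col2) group,
-- counting upward in id order.
select id, col1, col2,
       row_number() over (partition by col1, col2 order by id) as rn
from Table1
order by col1, col2, id;

-- Expected for the question's sample rows:
--  id | col1 | col2  | rn
--   1 |   22 | text  |  1
--   2 |   22 | text  |  2
--   4 |   22 | text2 |  1
--   3 |   23 | text  |  1
```

Only id 2 gets rn = 2, so it is the only row the delete removes.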

Answer 3

I think this one might be a bit slower:

    DELETE FROM tab
    USING
    (
       SELECT DISTINCT ON (col1, col2) id AS target, col1, col2
       FROM tab
       ORDER BY col1, col2 /* can add order by id if you care which is kept */
    ) AS subq
    WHERE tab.col1 = subq.col1
      AND tab.col2 = subq.col2
      AND tab.id <> subq.target;

But I would try it with sample data to check.
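One way to set up such sample data is a scratch table holding the rows from the question. This is only a sketch: the column types are assumptions, and it uses the question's column names (id, col1, col2).

```sql
-- Temporary copy of the question's table; the column types are assumed.
create temporary table tab (
    id   integer primary key,
    col1 integer,
    col2 text
);

insert into tab (id, col1, col2) values
    (1, 22, 'text'),
    (2, 22, 'text'),
    (3, 23, 'text'),
    (4, 22, 'text2');

-- Run the delete under test here, then check which rows are left.
select id, col1, col2 from tab order by id;
```

After a correct delete, three rows should remain: either id 1 or id 2, plus ids 3 and 4.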

Filter out duplicates in the database but keep the original
