I have a table with the following schema:
+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| system_one_id | int(11)     | NO   | MUL | NULL    |                |
| system_two_id | int(11)     | NO   | MUL | NULL    |                |
| type          | smallint(6) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+
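I want to delete duplicates, where a "duplicate" is either a row with the same system_one_id and system_two_id values as another row, or a row where row1.system_one_id = row2.system_two_id and row1.system_two_id = row2.system_one_id (the same pair, swapped).
Is there a way to delete both kinds of duplicates in a single query?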
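To make the duplicate definition concrete, here is a minimal Python sketch. It is not SQL and not part of the question. It normalizes each pair to (smaller, larger) so that swapped pairs compare equal, then flags every row whose pair already appeared with a lower id. The function names and sample rows are hypothetical. The type column is ignored because the definition above does not mention it.

```python
def canonical_pair(one, two):
    """Order-independent key: (10, 20) and (20, 10) map to the same key."""
    return (min(one, two), max(one, two))


def ids_to_delete(rows):
    """Return ids of rows whose pair already appeared with a lower id.

    rows are (id, system_one_id, system_two_id, type) tuples; type is ignored
    because the duplicate definition only involves the two system ids.
    """
    seen = set()
    doomed = []
    for row_id, one, two, _type in sorted(rows):  # ascending id order
        key = canonical_pair(one, two)
        if key in seen:
            doomed.append(row_id)
        else:
            seen.add(key)
    return doomed


# Hypothetical sample data.
rows = [
    (1, 10, 20, 0),
    (2, 20, 10, 0),  # same pair as row 1, swapped
    (3, 10, 20, 1),  # same pair as row 1
    (4, 30, 40, 0),
]
print(ids_to_delete(rows))  # [2, 3]
```

Keeping the lowest id per normalized pair is one reasonable choice of "survivor". The answers below make the same choice.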
Answer 0 (score: 2)
MySQL supports multi-table DELETE, so you can use a simple self-join:
delete t1
from mytable t1
join mytable t2 on t1.id > t2.id
    and ((t1.system_one_id = t2.system_one_id
          and t1.system_two_id = t2.system_two_id)
      or (t1.system_one_id = t2.system_two_id
          and t1.system_two_id = t2.system_one_id))
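The join condition t1.id > t2.id prevents a row from joining to itself. It also makes the later-added row of each duplicate pair the one that gets deleted.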
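Below is a minimal sketch of the same logic that can be run locally. It assumes Python's built-in sqlite3 module as a stand-in for MySQL. SQLite has no multi-table DELETE, so the self-join is rewritten as a correlated EXISTS with the same t1.id > t2.id condition. The table contents are hypothetical sample data.

```python
import sqlite3

# SQLite has no multi-table DELETE, so "delete t1 ... join t2" becomes a
# correlated EXISTS: delete a row if an older row (t2) has the same pair,
# in the same or swapped order. Inside the subquery, "mytable" refers to the
# outer DELETE target because the inner table is aliased as t2.
DELETE_LATER_DUPLICATES = """
DELETE FROM mytable
WHERE EXISTS (
    SELECT 1
    FROM mytable AS t2
    WHERE mytable.id > t2.id
      AND ((mytable.system_one_id = t2.system_one_id
            AND mytable.system_two_id = t2.system_two_id)
        OR (mytable.system_one_id = t2.system_two_id
            AND mytable.system_two_id = t2.system_one_id))
)
"""


def dedupe(conn):
    """Run the DELETE and return the ids that remain."""
    conn.execute(DELETE_LATER_DUPLICATES)
    return [r[0] for r in conn.execute("SELECT id FROM mytable ORDER BY id")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " system_one_id INTEGER NOT NULL, system_two_id INTEGER NOT NULL,"
    " type SMALLINT NOT NULL)"
)
# Hypothetical rows; ids 2 and 3 repeat the pair of id 1.
conn.executemany(
    "INSERT INTO mytable (system_one_id, system_two_id, type) VALUES (?, ?, ?)",
    [(10, 20, 0), (20, 10, 0), (10, 20, 1), (30, 40, 0)],
)
print(dedupe(conn))  # [1, 4]
```

The lowest id of each group never has an older match, so it always survives regardless of the order in which rows are processed.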
FYI, PostgreSQL has a similar feature, but the syntax is different:
delete from mytable t1
using mytable t2
where t1.id > t2.id
  and ((t1.system_one_id = t2.system_one_id
        and t1.system_two_id = t2.system_two_id)
    or (t1.system_one_id = t2.system_two_id
        and t1.system_two_id = t2.system_one_id))
Answer 1 (score: 1)
Here is a statement that (hopefully) selects the IDs of the duplicate records, i.e. every row that has an older copy. You just need to wrap it in a DELETE (that's your part). ;-)
select distinct A.ID from MYTABLE A
join MYTABLE B on
(
    (A.SYSTEM_ONE_ID = B.SYSTEM_ONE_ID and A.SYSTEM_TWO_ID = B.SYSTEM_TWO_ID)
    or
    (A.SYSTEM_ONE_ID = B.SYSTEM_TWO_ID and A.SYSTEM_TWO_ID = B.SYSTEM_ONE_ID)
)
where A.ID > B.ID;
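Wrapping the SELECT in a DELETE could look like the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL. The extra derived table (the hypothetical alias dup) is not needed by SQLite. It mirrors the common MySQL workaround for error 1093, which MySQL raises when a DELETE's subquery reads the target table directly. The comparison operator is a parameter so the two variants can be compared. With A.ID > B.ID only the newer copies are selected. With A.ID <> B.ID every row that has a duplicate is selected, including the one that should be kept.

```python
import sqlite3

# Answer 1's SELECT wrapped in a DELETE. The derived table "dup" mirrors the
# MySQL workaround for error 1093 ("can't specify target table for update in
# FROM clause"); SQLite itself would accept the subquery without it.
DELETE_TEMPLATE = """
DELETE FROM MYTABLE
WHERE ID IN (
    SELECT ID FROM (
        SELECT A.ID
        FROM MYTABLE A
        JOIN MYTABLE B ON
            ((A.SYSTEM_ONE_ID = B.SYSTEM_ONE_ID AND A.SYSTEM_TWO_ID = B.SYSTEM_TWO_ID)
             OR (A.SYSTEM_ONE_ID = B.SYSTEM_TWO_ID AND A.SYSTEM_TWO_ID = B.SYSTEM_ONE_ID))
        WHERE A.ID {op} B.ID
    ) AS dup
)
"""

# Hypothetical sample rows: (ID, SYSTEM_ONE_ID, SYSTEM_TWO_ID, TYPE).
ROWS = [(1, 10, 20, 0), (2, 20, 10, 0), (3, 10, 20, 1), (4, 30, 40, 0)]


def remaining_ids(op):
    """Load ROWS into a fresh in-memory table, run the DELETE, return kept IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MYTABLE (ID INTEGER PRIMARY KEY, SYSTEM_ONE_ID INTEGER,"
        " SYSTEM_TWO_ID INTEGER, TYPE INTEGER)"
    )
    conn.executemany("INSERT INTO MYTABLE VALUES (?, ?, ?, ?)", ROWS)
    conn.execute(DELETE_TEMPLATE.format(op=op))
    return [r[0] for r in conn.execute("SELECT ID FROM MYTABLE ORDER BY ID")]


print(remaining_ids(">"))   # [1, 4]: only the newer copies are deleted
print(remaining_ids("<>"))  # [4]: every row that has a duplicate is deleted
```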
Answer 2 (score: 0)
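You can group by least(system_one_id, system_two_id) and greatest(system_one_id, system_two_id), keep the minimum id of each group, and delete the rows with any other id: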
delete from mytable
where id not in (
    select * from (
        select min(id)
        from mytable
        group by greatest(system_one_id, system_two_id),
                 least(system_one_id, system_two_id)
    ) t1
)
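Here is a runnable sketch of the grouping approach. It assumes SQLite (via Python's sqlite3) in place of MySQL. SQLite has no GREATEST()/LEAST(). Its two-argument max()/min() are scalar functions with the same meaning, so the grouping key is unchanged. The single-argument min(id) remains the usual aggregate. The sample rows are hypothetical.

```python
import sqlite3

# Answer 2 adapted to SQLite: max(a, b) / min(a, b) with two arguments are
# scalar functions, standing in for MySQL's GREATEST / LEAST, so swapped
# pairs such as (10, 20) and (20, 10) land in the same group.
DELETE_SQL = """
DELETE FROM mytable
WHERE id NOT IN (
    SELECT * FROM (
        SELECT min(id)
        FROM mytable
        GROUP BY max(system_one_id, system_two_id),
                 min(system_one_id, system_two_id)
    ) AS t1
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY,"
    " system_one_id INTEGER NOT NULL, system_two_id INTEGER NOT NULL,"
    " type INTEGER NOT NULL)"
)
# Hypothetical sample rows: ids 2 and 3 repeat the pair {10, 20} of id 1.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [(1, 10, 20, 0), (2, 20, 10, 0), (3, 10, 20, 1), (4, 30, 40, 0)],
)
conn.execute(DELETE_SQL)
kept = [r[0] for r in conn.execute("SELECT id FROM mytable ORDER BY id")]
print(kept)  # [1, 4]
```

Since id is NOT NULL, min(id) is never NULL, so the NOT IN comparison cannot silently match nothing.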
Answer 3 (score: 0)
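This query keeps, for each pair of system IDs, the row with the minimum id. It selects only the rows that have no earlier row (t.id > t2.id) with the same system_one_id/system_two_id values in either order, and deletes all other rows.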
delete from your_table
where id not in (select id from
    (select distinct t.id
     from your_table t
     where (
         select count(*)
         from your_table t2
         where t.id > t2.id
           and ((t.system_one_id = t2.system_one_id
                 and t.system_two_id = t2.system_two_id)
             or (t.system_one_id = t2.system_two_id
                 and t.system_two_id = t2.system_one_id))
     ) = 0
    ) tbl
)
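To gain some confidence that the count-based query keeps exactly the lowest id per unordered pair, the sketch below runs it on SQLite (via Python's sqlite3, standing in for MySQL). It compares the result against a small pure-Python reference on randomly generated tables. The outer table is written without an alias because nothing references it. The helper names and the random data are hypothetical, and the seed is fixed so the run is repeatable.

```python
import random
import sqlite3

# Answer 3's DELETE, with no alias on the outer table.
COUNT_BASED = """
DELETE FROM your_table
WHERE id NOT IN (SELECT id FROM
    (SELECT DISTINCT t.id
     FROM your_table t
     WHERE (
         SELECT count(*)
         FROM your_table t2
         WHERE t.id > t2.id
           AND ((t.system_one_id = t2.system_one_id
                 AND t.system_two_id = t2.system_two_id)
             OR (t.system_one_id = t2.system_two_id
                 AND t.system_two_id = t2.system_one_id))
     ) = 0
    ) tbl
)
"""


def survivors(rows):
    """Run COUNT_BASED on a fresh in-memory table and return the kept ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE your_table (id INTEGER PRIMARY KEY,"
        " system_one_id INTEGER NOT NULL, system_two_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)
    conn.execute(COUNT_BASED)
    return [r[0] for r in conn.execute("SELECT id FROM your_table ORDER BY id")]


def expected(rows):
    """Reference: lowest id per unordered (system_one_id, system_two_id) pair."""
    first = {}
    for row_id, one, two in sorted(rows):  # ascending id, so first seen wins
        first.setdefault((min(one, two), max(one, two)), row_id)
    return sorted(first.values())


rng = random.Random(0)  # fixed seed so the check is repeatable
for _ in range(50):
    rows = [(i, rng.randint(1, 4), rng.randint(1, 4)) for i in range(1, 21)]
    assert survivors(rows) == expected(rows)
print("count-based DELETE matches the reference on 50 random tables")
```

The correlated count(*) is evaluated once per row, so on large tables the join-based or grouping-based queries above are likely to be cheaper. This is worth measuring rather than assuming.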