Removing duplicates across two columns

Time: 2015-03-29 21:18:12

Tags: mysql sql

I have a table with the following schema:

+---------------+-------------+------+-----+---------+----------------+
| Field         | Type        | Null | Key | Default | Extra          |
+---------------+-------------+------+-----+---------+----------------+
| id            | int(11)     | NO   | PRI | NULL    | auto_increment |
| system_one_id | int(11)     | NO   | MUL | NULL    |                |
| system_two_id | int(11)     | NO   | MUL | NULL    |                |
| type          | smallint(6) | NO   |     | NULL    |                |
+---------------+-------------+------+-----+---------+----------------+

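I want to remove duplicates, where a "duplicate" is defined as either:

  1. the values of system_one_id and system_two_id match between two rows, or
  2. the values "cross-match", i.e. row1.system_one_id = row2.system_two_id and row1.system_two_id = row2.system_one_id.

Is there a way to remove both kinds of duplicates in a single query?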
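To make the definition above concrete, here is a minimal Python sketch of the same rule outside the database. It assumes that the row with the lowest id in each group should be kept, which the question does not state explicitly; the helper names pair_key and ids_to_delete and the sample rows are invented for illustration.

```python
def pair_key(system_one_id, system_two_id):
    """Order-insensitive key: (10, 20) and (20, 10) map to the same tuple."""
    return (min(system_one_id, system_two_id), max(system_one_id, system_two_id))


def ids_to_delete(rows):
    """rows: iterable of (id, system_one_id, system_two_id).

    Returns the ids of every row except the lowest-id row of each key
    (assumption: the earliest row is the one worth keeping).
    """
    seen = set()
    doomed = []
    for row_id, one, two in sorted(rows):  # sorted by id, lowest first
        key = pair_key(one, two)
        if key in seen:
            doomed.append(row_id)  # a direct or cross match was seen earlier
        else:
            seen.add(key)
    return doomed


# Hypothetical sample data: rows 2 (direct match) and 3 (cross match) duplicate row 1.
sample_rows = [(1, 10, 20), (2, 10, 20), (3, 20, 10), (4, 30, 40), (5, 10, 30)]
print(ids_to_delete(sample_rows))  # prints [2, 3]
```

Both kinds of duplicate collapse to one key once the pair is put in a fixed order, which is why a single query can handle them together.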

4 answers:

Answer 0 (score: 2)

MySQL supports multi-table delete, so a simple join can be used:

delete t1
from mytable t1
join mytable t2 on t1.id > t2.id
  and ((t1.system_one_id = t2.system_one_id
    and t1.system_two_id = t2.system_two_id)
    or (t1.system_one_id = t2.system_two_id
    and t1.system_two_id = t2.system_one_id))

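The join condition t1.id > t2.id prevents a row from joining to itself and, for each duplicate pair, picks the later-added row (the one with the higher id) as the row to delete.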
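The effect of the t1.id > t2.id condition can be checked on a small in-memory table. The sketch below uses Python's sqlite3 module purely as a test harness; SQLite has no multi-table delete, so the join is rewritten as a correlated EXISTS with the same matching logic. That rewrite and the sample rows are assumptions for illustration, not the MySQL statement itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table mytable (
           id integer primary key,
           system_one_id integer not null,
           system_two_id integer not null,
           type integer not null)"""
)
# Hypothetical data: row 2 repeats row 1, row 3 is row 1 with the columns swapped.
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(1, 10, 20, 0), (2, 10, 20, 0), (3, 20, 10, 0), (4, 30, 40, 0)],
)

# SQLite stand-in for the self-join delete: remove a row when an
# earlier row (smaller id) matches it directly or crosswise.
conn.execute(
    """delete from mytable
       where exists (
           select 1 from mytable t2
           where mytable.id > t2.id
             and ((mytable.system_one_id = t2.system_one_id
               and mytable.system_two_id = t2.system_two_id)
               or (mytable.system_one_id = t2.system_two_id
               and mytable.system_two_id = t2.system_one_id)))"""
)

remaining_ids = [row[0] for row in conn.execute("select id from mytable order by id")]
print(remaining_ids)  # prints [1, 4]
```

The lowest id in each group never has an earlier match, so it always survives.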


FYI, Postgres has similar functionality, but with a different syntax:

delete from mytable t1
using mytable t2
where t1.id > t2.id
  and ((t1.system_one_id = t2.system_one_id
    and t1.system_two_id = t2.system_two_id)
    or (t1.system_one_id = t2.system_two_id
    and t1.system_two_id = t2.system_one_id))

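Answer 1 (score: 1)

Here is a statement that (hopefully) selects the IDs of all duplicate records; you just need to wrap it in a delete command (that part is up to you). ;-)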

select A.ID from MYTABLE A
left join MYTABLE B on 
(
    (A.SYSTEM_ONE_ID = B.SYSTEM_ONE_ID and A.SYSTEM_TWO_ID = B.SYSTEM_TWO_ID) 
    or 
    (A.SYSTEM_ONE_ID = B.SYSTEM_TWO_ID AND A.SYSTEM_TWO_ID = B.SYSTEM_ONE_ID)
)
where B.ID is not null and A.ID <> B.ID;
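One thing to check before wrapping this select in a delete: with A.ID <> B.ID, both members of a duplicate pair are returned, so deleting on that result would remove the original along with its copy. The sketch below runs the query on SQLite through Python's sqlite3 module (an assumed stand-in for MySQL, with invented sample rows and a hypothetical duplicate_ids helper) and compares it with a variant that uses A.ID > B.ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table MYTABLE (
           ID integer primary key,
           SYSTEM_ONE_ID integer not null,
           SYSTEM_TWO_ID integer not null,
           TYPE integer not null)"""
)
# Hypothetical data: rows 1 and 2 cross-match, row 3 is unique.
conn.executemany(
    "insert into MYTABLE values (?, ?, ?, ?)",
    [(1, 10, 20, 0), (2, 20, 10, 0), (3, 30, 40, 0)],
)


def duplicate_ids(id_filter):
    """Run the answer's select with a configurable comparison between A.ID and B.ID."""
    sql = f"""
        select distinct A.ID from MYTABLE A
        left join MYTABLE B on
        (
            (A.SYSTEM_ONE_ID = B.SYSTEM_ONE_ID and A.SYSTEM_TWO_ID = B.SYSTEM_TWO_ID)
            or
            (A.SYSTEM_ONE_ID = B.SYSTEM_TWO_ID and A.SYSTEM_TWO_ID = B.SYSTEM_ONE_ID)
        )
        where B.ID is not null and {id_filter}
        order by A.ID
    """
    return [row[0] for row in conn.execute(sql)]


both_sides = duplicate_ids("A.ID <> B.ID")  # the filter used in the answer
later_only = duplicate_ids("A.ID > B.ID")   # keeps the lowest id of each pair

print(both_sides)  # prints [1, 2]
print(later_only)  # prints [2]
```

If the goal is to keep one row per pair, the A.ID > B.ID form is the one to feed into the delete.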

Answer 2 (score: 0)

You can group by least and greatest to find the minimum id of each group, then delete the rows with any other id.

delete from mytable
where id not in (
    select * from (
        select min(id)
        from mytable 
        group by greatest(system_one_id, system_two_id),
        least(system_one_id, system_two_id)
    ) t1
)
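The grouping idea can be tried out on SQLite through Python's sqlite3 module. This is a sketch under assumptions: SQLite has no least() or greatest(), so the two-argument scalar forms of min() and max() stand in for them, and the sample rows are invented. SQLite also accepts the subquery without the extra derived table, which MySQL needs because it refuses to select from the table being deleted from in a plain subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """create table mytable (
           id integer primary key,
           system_one_id integer not null,
           system_two_id integer not null,
           type integer not null)"""
)
# Hypothetical data: two groups, each containing direct or crossed duplicates.
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(1, 10, 20, 0), (2, 20, 10, 0), (3, 10, 20, 0), (4, 30, 40, 0), (5, 40, 30, 0)],
)

# max(a, b) / min(a, b) with two arguments are scalar in SQLite and play the
# role of greatest / least; min(id) with one argument is the usual aggregate.
conn.execute(
    """delete from mytable
       where id not in (
           select min(id)
           from mytable
           group by max(system_one_id, system_two_id),
                    min(system_one_id, system_two_id))"""
)

remaining_ids = [row[0] for row in conn.execute("select id from mytable order by id")]
print(remaining_ids)  # prints [1, 4]
```

Ordering each pair before grouping puts (10, 20) and (20, 10) in the same group, so one pass covers both kinds of duplicate.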

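Answer 3 (score: 0)

This query keeps the row with the smallest id in each group of duplicates: a row is selected only if no row with a smaller id (t.id > t2.id) has the same pair of system ids, in either order. Every row that is not selected is deleted.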

delete from your_table
where id not in (select id from 
                (select distinct t.id
                from your_table t
                where 
                (
                      select count(*)
                      from your_table t2
                      where t.id > t2.id
                            and ((t.system_one_id=t2.system_one_id
                                 and t.system_two_id=t2.system_two_id)
                                 or (t.system_one_id=t2.system_two_id
                                 and t.system_two_id=t2.system_one_id))
                ) =0
              ) tbl
            )