Find differences between rows that have duplicates in one field

Time: 2011-02-21 20:47:36

Tags: mysql sql-delete

I am about to remove duplicates from my database using:

delete from table 
  where id not in (
    select min(id) 
      from table 
      group by foreign_key);
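
One caveat worth checking before running the statement above: MySQL normally rejects a DELETE whose subquery reads from the table being deleted from (error 1093, "You can't specify target table ... for update in FROM clause"). A common workaround is to wrap the grouped subquery in a derived table. The following is only a sketch; it uses the table name `t1` from the answer below as a stand-in for the real table, and `keepers` and `keep_id` are hypothetical names.

```sql
-- Sketch only: `t1` stands in for the real table name.
-- The inner SELECT is wrapped in a derived table (`keepers`) so that
-- MySQL materializes the list of ids to keep before deleting.
DELETE FROM t1
 WHERE id NOT IN (
       SELECT keep_id
         FROM (SELECT MIN(id) AS keep_id
                 FROM t1
                GROUP BY foreign_key) AS keepers);
```

This still keeps only the lowest id per foreign_key and ignores fieldA and fieldB entirely, so it does not yet satisfy the conditions listed next.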

However, I would like to do this under the following conditions:

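  • If any of the duplicate rows has a value in fieldA or fieldB:
    • if there is only one unique value in each field, keep that value
    • if a field has more than one unique value across the rows, report this along with the id and foreign_key so that it can be fixed manually.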

By default, fieldA and fieldB are NULL, but in some cases data has been entered into these fields.

Here is some example data:

| id | foreign_key | fieldA | fieldB |
|----+-------------+--------+--------|
|  1 |           1 | NULL   | NULL   |
|  2 |           1 | A1     | B1     |
|  3 |           1 | NULL   | NULL   |
|  4 |           2 | A2     | B2     |
|  5 |           2 | A3     | B2     |
|  6 |           3 | NULL   | NULL   |
|  7 |           4 | A4     | B4     |
|  8 |           5 | A5     | NULL   |
|  9 |           5 | NULL   | B5     |
| 10 |           6 | A6     | B6     |
| 11 |           6 | A7     | B6     |
| 12 |           7 | NULL   | B7     |
| 13 |           7 | NULL   | B7     |
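
To experiment with this data, it can be loaded with a script along the following lines. This is a sketch under assumptions: the table name `t1` is borrowed from the answer below, and the column types are guesses, since the question does not state them.

```sql
-- Assumed schema: the question gives no column types, so INT and
-- VARCHAR(10) are placeholders chosen to fit the sample values.
CREATE TABLE t1 (
  id          INT PRIMARY KEY,
  foreign_key INT NOT NULL,
  fieldA      VARCHAR(10) NULL,
  fieldB      VARCHAR(10) NULL
);

-- The 13 sample rows, exactly as in the table above.
INSERT INTO t1 (id, foreign_key, fieldA, fieldB) VALUES
  ( 1, 1, NULL, NULL),
  ( 2, 1, 'A1', 'B1'),
  ( 3, 1, NULL, NULL),
  ( 4, 2, 'A2', 'B2'),
  ( 5, 2, 'A3', 'B2'),
  ( 6, 3, NULL, NULL),
  ( 7, 4, 'A4', 'B4'),
  ( 8, 5, 'A5', NULL),
  ( 9, 5, NULL, 'B5'),
  (10, 6, 'A6', 'B6'),
  (11, 6, 'A7', 'B6'),
  (12, 7, NULL, 'B7'),
  (13, 7, NULL, 'B7');
```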

What I would like to keep is:

| id | foreign_key | fieldA | fieldB |
|----+-------------+--------+--------|
|  2 |           1 | A1     | B1     |
|  4 |           2 | NULL   | B2     |
|  6 |           3 | NULL   | NULL   |
|  7 |           4 | A4     | B4     |
|  8 |           5 | A5     | B5     |
| 10 |           6 | NULL   | B6     |
| 12 |           7 | NULL   | B7     |

And I would like this information returned:

foreign_key 2 has two distinct values of fieldA: A2 and A3

1 Answer:

Answer 0 (score: 1)

I have to run right now, but here is a query to start with:

-- MIN(id) instead of a bare id: deterministic, and accepted under ONLY_FULL_GROUP_BY
SELECT MIN(id) AS id, foreign_key, 
    group_concat(DISTINCT fieldA) as A, count(DISTINCT fieldA) as `#A`,
    group_concat(DISTINCT fieldB) as B, count(DISTINCT fieldB) as `#B`
  FROM t1
  GROUP BY foreign_key
;

On the test data, this returns:

| id | foreign_key | A     | #A | B    | #B |
|----+-------------+-------+----+------+----|
|  1 |           1 | A1    |  1 | B1   |  1 |
|  4 |           2 | A2,A3 |  2 | B2   |  1 |
|  6 |           3 | NULL  |  0 | NULL |  0 |
|  7 |           4 | A4    |  1 | B4   |  1 |
|  8 |           5 | A5    |  1 | B5   |  1 |
| 10 |           6 | A6,A7 |  2 | B6   |  1 |
| 12 |           7 | NULL  |  0 | B7   |  1 |

Query for the rows to keep:

-- Groups with at most one distinct non-NULL value in each field
SELECT MIN(id) AS id, foreign_key, 
    group_concat(DISTINCT fieldA) as A, count(DISTINCT fieldA) as `#A`, 
    group_concat(DISTINCT fieldB) as B, count(DISTINCT fieldB) as `#B`
  FROM t1
  GROUP BY foreign_key
  HAVING `#A` < 2 AND `#B` < 2
;

Query for the rows that need operator intervention:

-- Groups where either field holds two or more distinct values
SELECT MIN(id) AS id, foreign_key, 
    group_concat(DISTINCT fieldA) as A, count(DISTINCT fieldA) as `#A`, 
    group_concat(DISTINCT fieldB) as B, count(DISTINCT fieldB) as `#B`
  FROM t1
  GROUP BY foreign_key
  HAVING `#A` >= 2 OR `#B` >= 2
;
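
To get output closer to the message format shown in the question, the same grouping can be combined with CONCAT. This is a rough sketch rather than a finished report: it covers only fieldA (fieldB would need an analogous query, for example joined with UNION ALL), it prints the count as a digit rather than a word, and the column aliases `ids` and `report` are made up for illustration.

```sql
-- One row per foreign_key whose fieldA holds conflicting values.
-- `ids` lists every row id in the group so the rows can be fixed by hand.
SELECT foreign_key,
       GROUP_CONCAT(id ORDER BY id) AS ids,
       CONCAT('foreign_key ', foreign_key,
              ' has ', COUNT(DISTINCT fieldA),
              ' distinct values of fieldA: ',
              GROUP_CONCAT(DISTINCT fieldA
                           ORDER BY fieldA
                           SEPARATOR ' and ')) AS report
  FROM t1
 GROUP BY foreign_key
HAVING COUNT(DISTINCT fieldA) >= 2;
```

On the sample data this should flag foreign_key 6 (A6 and A7) as well as foreign_key 2, since both groups hold two distinct fieldA values.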

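GROUP_CONCAT may not be appropriate, depending on the format of the data stored in the columns. However, when combined with #A and #B, you can detect when it isn't appropriate, so it shouldn't be a big problem. It may also have too much of an impact on performance, but I can't think of another aggregate function that could be used in the same way (a GROUP_COALESCE would be nice).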
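
For groups where both counts are below 2, MAX() (or MIN()) may be able to play the role of the wished-for GROUP_COALESCE: aggregate functions skip NULLs, and such a group has at most one non-NULL value per field to pick from. The following is a sketch of a merge-then-delete built on that idea, not a tested solution. It assumes the table is `t1`, keeps the lowest id per group, and leaves groups with conflicting values untouched for manual review; `merged`, `clean` and `keep_id` are hypothetical names.

```sql
-- Step 1: copy the single non-NULL value of each field (if any)
-- onto the row that will be kept, for conflict-free groups only.
UPDATE t1
  JOIN (SELECT MIN(id)     AS keep_id,
               MAX(fieldA) AS A,   -- MAX ignores NULLs
               MAX(fieldB) AS B
          FROM t1
         GROUP BY foreign_key
        HAVING COUNT(DISTINCT fieldA) < 2
           AND COUNT(DISTINCT fieldB) < 2) AS merged
    ON t1.id = merged.keep_id
   SET t1.fieldA = merged.A,
       t1.fieldB = merged.B;

-- Step 2: delete the remaining rows of those same groups.
DELETE t1
  FROM t1
  JOIN (SELECT foreign_key, MIN(id) AS keep_id
          FROM t1
         GROUP BY foreign_key
        HAVING COUNT(DISTINCT fieldA) < 2
           AND COUNT(DISTINCT fieldB) < 2) AS clean
    ON t1.foreign_key = clean.foreign_key
 WHERE t1.id <> clean.keep_id;
```

Because the lowest id is kept, foreign_key 1 would survive as id 1 carrying A1 and B1, rather than as id 2 as in the question's desired output. Running both statements inside a transaction, after taking a backup, seems prudent.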