MySQL: Merge entries in a pivot table based on duplicates in the joined table

Date: 2019-09-27 10:33:42

Tags: mysql

I have 2 tables. One holds the participants:

+----+------------+-----------+
| id | First Name | Last Name |
+----+------------+-----------+
|  0 | John       | Snow      |
|  1 | John       | Snow      |
|  2 | Michael    | Jackson   |
+----+------------+-----------+

and the other is a pivot table that links participants to events:

+----+----------------+----------+
| id | participant_id | event_id |
+----+----------------+----------+
|  0 |              0 |       12 |
|  1 |              1 |       35 |
|  2 |              2 |       35 |
+----+----------------+----------+

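By mistake, the participants table contains duplicate entries.

How can I delete the duplicate entries in the participants table and update the pivot table accordingly? The expected result would be: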

Participants:

+----+------------+-----------+
| id | First Name | Last Name |
+----+------------+-----------+
|  0 | John       | Snow      |
|    |            |           | //deleted
|  2 | Michael    | Jackson   |
+----+------------+-----------+

Pivot table:

+----+----------------+----------+
| id | participant_id | event_id |
+----+----------------+----------+
|  0 |              0 |       12 |
|  1 |              0 |       35 | //participant_id changed from 1 to 0
|  2 |              2 |       35 |
+----+----------------+----------+

1 Answer:

Answer 0: (score: 0)

This will be a multi-step process:

  • The first step is to update the mapping table pivot. The following query gives you every duplicated name together with its first (lowest) id:
SELECT first_name, last_name, MIN(id) AS first_id 
FROM participants 
GROUP BY first_name, last_name 
HAVING COUNT(*) > 1 -- more than one row means duplicates exist

You can use the query above as a subquery to update the pivot table through a series of joins:

UPDATE pivot AS m 
JOIN participants AS p1 
  ON p1.id = m.participant_id 
JOIN (
       SELECT first_name, last_name, MIN(id) AS first_id 
       FROM participants 
       GROUP BY first_name, last_name 
       HAVING COUNT(*) > 1
     ) AS p2 ON p2.first_name = p1.first_name 
                AND p2.last_name = p1.last_name 
                AND p2.first_id <> p1.id  -- avoid the original row
SET m.participant_id = p2.first_id;  -- update the duplicate row's id to the first id
  • Now you can use the same subquery (the one that finds the duplicates) to DELETE the duplicate rows:
DELETE p1 FROM participants AS p1
JOIN (
       SELECT first_name, last_name, MIN(id) AS first_id
       FROM participants
       GROUP BY first_name, last_name
       HAVING COUNT(*) > 1
     ) AS p2 ON p2.first_name = p1.first_name
                AND p2.last_name = p1.last_name
                AND p2.first_id <> p1.id;  -- avoid the original row

  • Finally, fix the problem at the data-definition level by defining a UNIQUE constraint on (first_name, last_name), so that it cannot happen again:
ALTER TABLE participants ADD CONSTRAINT unq_idx_name UNIQUE(first_name, last_name);
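
As a hedged addition to the steps above (not part of the original answer): since the UPDATE and the DELETE both modify data, it may be worth running them inside a transaction with a preview and a sanity check around them. The sketch below assumes InnoDB tables named participants and pivot, and columns named first_name and last_name as in the queries above. The ALTER TABLE is left out of the transaction on purpose, because DDL statements cause an implicit commit in MySQL.

```sql
-- Assumption: InnoDB tables, so the statements can be rolled back.
START TRANSACTION;

-- Preview: pivot rows that currently point at a duplicate participant,
-- together with the id they would be remapped to.
SELECT m.id,
       m.participant_id AS old_participant_id,
       p2.first_id      AS new_participant_id,
       m.event_id
FROM pivot AS m
JOIN participants AS p1
  ON p1.id = m.participant_id
JOIN (
       SELECT first_name, last_name, MIN(id) AS first_id
       FROM participants
       GROUP BY first_name, last_name
       HAVING COUNT(*) > 1
     ) AS p2 ON p2.first_name = p1.first_name
            AND p2.last_name = p1.last_name
            AND p2.first_id <> p1.id;

-- ... run the UPDATE and then the DELETE from the steps above here ...

-- Sanity check: this should return no rows, i.e. no pivot row
-- references a participant that no longer exists.
SELECT m.*
FROM pivot AS m
LEFT JOIN participants AS p
  ON p.id = m.participant_id
WHERE p.id IS NULL;

COMMIT;  -- or ROLLBACK; if the preview or the check looks wrong
```

With the sample data from the question, the preview returns only the pivot row with id 1, remapped from participant 1 to participant 0. Note that if two duplicate participants were linked to the same event, the UPDATE leaves two pivot rows with the same (participant_id, event_id) pair, which you may want to clean up separately.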