MySQL: consolidating duplicate data records via UPDATE / DELETE

Asked: 2013-01-18 17:23:17

Tags: mysql sql duplicates union

I have a table that looks like this:

mysql> SELECT * FROM Colors;
╔════╦══════════╦════════╦════════╦════════╦════════╦════════╦════════╗
║ ID ║ USERNAME ║  RED   ║ GREEN  ║ YELLOW ║  BLUE  ║ ORANGE ║ PURPLE ║
╠════╬══════════╬════════╬════════╬════════╬════════╬════════╬════════╣
║  1 ║ joe      ║ 1      ║ (null) ║ 1      ║ (null) ║ (null) ║ (null) ║
║  2 ║ joe      ║ 1      ║ (null) ║ (null) ║ (null) ║ 1      ║ (null) ║
║  3 ║ bill     ║ 1      ║ 1      ║ 1      ║ (null) ║ (null) ║ 1      ║
║  4 ║ bill     ║ (null) ║ 1      ║ (null) ║ 1      ║ (null) ║ (null) ║
║  5 ║ bill     ║ (null) ║ 1      ║ (null) ║ (null) ║ (null) ║ (null) ║
║  6 ║ bob      ║ (null) ║ (null) ║ (null) ║ 1      ║ (null) ║ (null) ║
║  7 ║ bob      ║ (null) ║ (null) ║ (null) ║ (null) ║ (null) ║ 1      ║
║  8 ║ bob      ║ 1      ║ (null) ║ (null) ║ (null) ║ (null) ║ (null) ║
╚════╩══════════╩════════╩════════╩════════╩════════╩════════╩════════╝
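For anyone who wants to reproduce the table, here is a minimal setup sketch. The question does not show the table definition, so the column types (an auto-increment integer ID, a short VARCHAR username, nullable TINYINT flags) are assumptions; only the names and values come from the output above.

```sql
-- Assumed schema: the types are a guess, the column names come from the question.
CREATE TABLE Colors (
    ID       INT         NOT NULL AUTO_INCREMENT PRIMARY KEY,
    Username VARCHAR(32) NOT NULL,
    Red      TINYINT     NULL,
    Green    TINYINT     NULL,
    Yellow   TINYINT     NULL,
    Blue     TINYINT     NULL,
    Orange   TINYINT     NULL,
    Purple   TINYINT     NULL
);

-- Sample rows exactly as shown in the question.
INSERT INTO Colors (ID, Username, Red, Green, Yellow, Blue, Orange, Purple) VALUES
    (1, 'joe',  1,    NULL, 1,    NULL, NULL, NULL),
    (2, 'joe',  1,    NULL, NULL, NULL, 1,    NULL),
    (3, 'bill', 1,    1,    1,    NULL, NULL, 1),
    (4, 'bill', NULL, 1,    NULL, 1,    NULL, NULL),
    (5, 'bill', NULL, 1,    NULL, NULL, NULL, NULL),
    (6, 'bob',  NULL, NULL, NULL, 1,    NULL, NULL),
    (7, 'bob',  NULL, NULL, NULL, NULL, NULL, 1),
    (8, 'bob',  1,    NULL, NULL, NULL, NULL, NULL);
```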

I want to run an UPDATE / DELETE that finds the duplicates, merges their records, and removes the extra rows, so that we end up with this:

mysql> SELECT * FROM Colors;
╔════╦══════════╦═════╦════════╦════════╦════════╦════════╦════════╗
║ ID ║ USERNAME ║ RED ║ GREEN  ║ YELLOW ║  BLUE  ║ ORANGE ║ PURPLE ║
╠════╬══════════╬═════╬════════╬════════╬════════╬════════╬════════╣
║  1 ║ joe      ║   1 ║ (null) ║ 1      ║ (null) ║ 1      ║ (null) ║
║  3 ║ bill     ║   1 ║ 1      ║ 1      ║ 1      ║ (null) ║ 1      ║
║  6 ║ bob      ║   1 ║ (null) ║ (null) ║ 1      ║ (null) ║ 1      ║
╚════╩══════════╩═════╩════════╩════════╩════════╩════════╩════════╝

I know I could easily do this with a script, but to learn and understand MySQL better, I would like to learn how to do it in pure SQL.

3 answers:

Answer 0 (score: 3)

This is only a projection. It neither updates the table nor deletes any rows.

SELECT  MIN(ID) ID,
        Username,
        MAX(Red) max_Red,
        MAX(Green) max_Green,
        MAX(Yellow) max_Yellow,
        MAX(Blue) max_Blue,
        MAX(Orange) max_Orange,
        MAX(Purple) max_Purple
FROM    Colors
GROUP   BY Username
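The projection works because MAX() skips NULLs: a group containing at least one 1 yields 1, and a group that is NULL in every row stays NULL. A small sketch against throwaway derived tables (the alias t and the column x are only illustrative):

```sql
-- One NULL and one 1 in the group: MAX ignores the NULL and returns 1.
SELECT MAX(x) AS mixed_group
FROM   (SELECT NULL AS x UNION ALL SELECT 1) AS t;

-- Only NULLs in the group: there is nothing to compare, so MAX returns NULL.
SELECT MAX(x) AS all_null_group
FROM   (SELECT NULL AS x UNION ALL SELECT NULL) AS t;
```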

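UPDATE

If you really want to delete the records, you need to run the UPDATE statement first, before deleting them. Otherwise the values held in the duplicate rows are lost before they can be merged into the row you keep.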

UPDATE  Colors a
        INNER JOIN
        (
            SELECT  MIN(ID) min_ID,
                    Username,
                    MAX(Red) max_Red,
                    MAX(Green) max_Green ,
                    MAX(Yellow) max_Yellow,
                    MAX(Blue) max_Blue,
                    MAX(Orange) max_Orange,
                    MAX(Purple) max_Purple
            FROM    Colors
            GROUP   BY Username
        ) b ON a.ID = b.Min_ID 
SET     a.Red = b.max_Red,
        a.Green = b.max_Green,
        a.Yellow = b.max_Yellow,
        a.Blue = b.max_Blue,
        a.Orange = b.max_Orange,
        a.Purple = b.max_Purple

After that you can delete the remaining duplicate records:

DELETE  a
FROM    Colors a
        LEFT JOIN
        (
            SELECT  MIN(ID) min_ID,
                    Username
            FROM    Colors
            GROUP   BY Username
        ) b ON a.ID = b.Min_ID 
WHERE   b.Min_ID  IS NULL
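A possible follow-up, not part of the original answer: check that no username still has more than one row, and optionally add a unique index so that duplicates cannot reappear. The index name uq_colors_username is hypothetical, and the ALTER TABLE fails if duplicates are still present.

```sql
-- Should return no rows once every Username has been reduced to one record.
SELECT   Username,
         COUNT(*) AS row_count
FROM     Colors
GROUP BY Username
HAVING   COUNT(*) > 1;

-- Optional: reject future duplicates (hypothetical index name).
ALTER TABLE Colors
    ADD UNIQUE KEY uq_colors_username (Username);
```

On a transactional engine such as InnoDB, it may also be worth running the UPDATE and the DELETE inside a single transaction, so the table is never left half-merged.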

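Answer 1 (score: 1)

Do you really need to update the underlying table? If not (and you only need the result set shown in your example), you can simply group the table: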

SELECT   MIN(ID)     AS ID,
         Username    AS Username,
         MAX(Red)    AS Red,
         MAX(Green)  AS Green,
         MAX(Yellow) AS Yellow,
         MAX(Blue)   AS Blue,
         MAX(Orange) AS Orange,
         MAX(Purple) AS Purple
FROM     Colors
GROUP BY Username

See it on sqlfiddle.
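If the merged result is needed repeatedly, one option is to wrap the grouping query in a view, leaving the base table untouched. This is only a sketch; the view name ColorsMerged is hypothetical.

```sql
-- Hypothetical view name; the query is the grouping shown above.
CREATE VIEW ColorsMerged AS
SELECT   MIN(ID)     AS ID,
         Username    AS Username,
         MAX(Red)    AS Red,
         MAX(Green)  AS Green,
         MAX(Yellow) AS Yellow,
         MAX(Blue)   AS Blue,
         MAX(Orange) AS Orange,
         MAX(Purple) AS Purple
FROM     Colors
GROUP BY Username;

-- Read the merged rows like an ordinary table.
SELECT * FROM ColorsMerged ORDER BY ID;
```

Because the view aggregates, it is read-only: changes still have to be made in Colors.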

Answer 2 (score: 0)

DELETE FROM Colors c1
WHERE EXISTS (SELECT 1
                FROM Colors c2
               WHERE c1.Username = c2.Username
                 AND ((c1.Red    IS NULL AND c2.Red    IS NULL) OR c1.Red    = c2.Red   )
                 AND ((c1.Green  IS NULL AND c2.Green  IS NULL) OR c1.Green  = c2.Green )
                 AND ((c1.Yellow IS NULL AND c2.Yellow IS NULL) OR c1.Yellow = c2.Yellow)
                 AND ((c1.Blue   IS NULL AND c2.Blue   IS NULL) OR c1.Blue   = c2.Blue  )
                 AND ((c1.Orange IS NULL AND c2.Orange IS NULL) OR c1.Orange = c2.Orange)
                 AND ((c1.Purple IS NULL AND c2.Purple IS NULL) OR c1.Purple = c2.Purple)
                 AND c2.ID < c1.ID
             )

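The NULLs make this a bit complicated, because in SQL the comparison NULL = NULL is not true but unknown. If you had 0 and 1 instead, you could omit the part before the OR in each color condition.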
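A note for MySQL specifically: a single-table DELETE whose subquery reads from the same table is typically rejected with error 1093 ("You can't specify target table ... for update in FROM clause"). A sketch of the same idea as a multi-table DELETE with a self-join follows. It swaps the IS NULL / OR pairs for MySQL's NULL-safe equality operator <=>, which treats two NULLs as equal.

```sql
-- Delete every row that has an identical twin with a lower ID.
-- <=> is MySQL's NULL-safe equality: NULL <=> NULL evaluates to 1.
DELETE c1
FROM   Colors AS c1
       INNER JOIN Colors AS c2
          ON  c2.Username =   c1.Username
          AND c2.Red      <=> c1.Red
          AND c2.Green    <=> c1.Green
          AND c2.Yellow   <=> c1.Yellow
          AND c2.Blue     <=> c1.Blue
          AND c2.Orange   <=> c1.Orange
          AND c2.Purple   <=> c1.Purple
          AND c2.ID       <   c1.ID;
```

Like the statement above, this removes only rows that match in every color column. It does not merge values. In the question's sample data no two rows of the same user are identical in all six colors, so on its own it would delete nothing there.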