How to identify and delete duplicate rows, keeping only the most recent

Time: 2016-03-08 01:59:21

Tags: mysql

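I am working in HeidiSQL and trying to figure out how to delete all duplicate rows except the most recent one. There are some minor differences between the duplicates, but as long as four specific values are the same (namely UserID, ContactID, SMSID and EventID), the row is considered a duplicate. Within each such group I need to keep only the most recent row (identified by CreatedDate) and delete the rest.

The following query identifies these rows: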

SELECT a.UserID, a.ContactID, a.SMSID, a.EventID, CreatedDate
FROM WhenToText a 
JOIN (SELECT UserID, ContactID, SMSID, EventID
       FROM WhenToText 
       GROUP BY UserID, ContactID, SMSID, EventID
       HAVING COUNT(*) > 1 ) b
ON a.UserID = b.UserID
AND a.ContactID = b.ContactID
AND a.SMSID = b.SMSID
AND a.EventID = b.EventID
ORDER BY UserID, ContactID, SMSID, EventID, CreatedDate DESC

However, now that I have found these duplicates, I am not sure how to delete them.

Here is some sample data:

[image of the sample data]

3 Answers:

Answer 0 (score: 1)

Here is a solution using DELETE ... JOIN, with a complete demo on your data.

SQL:

-- Data preparation
create table WhenToText(UserID int, ContactID int, SMSID int, EventID int, CreatedDate datetime);
insert into WhenToText values
    (4,   25,  7934, 7407, '2016-02-10 00:00:11'),
    (4,   25,  7934, 7407, '2016-02-09 00:00:12'),
    (4,   29,  5132, 7407, '2016-02-10 00:00:11'),
    (4,   29,  5132, 7407, '2016-02-09 00:00:12'),
    (4,   31, 12944, 7405, '2016-02-10 07:03:02'),
    (4,   31, 12944, 7405, '2016-02-10 05:03:02'),
    (4,  146, 12908, 7405, '2016-02-10 06:52:02'),
    (4,  146, 12908, 7405, '2016-02-10 04:52:02'),
    (15,  63, 12964, 7401, '2016-02-10 03:42:04'),
    (15,  63, 12964, 7401, '2016-02-10 03:41:04'),
    (15,  64, 12326, 7401, '2016-02-07 03:01:03'),
    (15,  64, 12326, 7401, '2016-02-07 03:00:03');
SELECT * FROM WhenToText;

-- SQL needed
DELETE a FROM 
    WhenToText a INNER JOIN
    (
     SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate
     FROM WhenToText
     GROUP BY UserID, ContactID, SMSID, EventID
     ) b
    USING(UserID, ContactID, SMSID, EventID)
WHERE 
    a.CreatedDate != b.CreatedDate;

SELECT * FROM WhenToText;

Output:

mysql> SELECT * FROM WhenToText;
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate         |
+--------+-----------+-------+---------+---------------------+
|      4 |        25 |  7934 |    7407 | 2016-02-10 00:00:11 |
|      4 |        25 |  7934 |    7407 | 2016-02-09 00:00:12 |
|      4 |        29 |  5132 |    7407 | 2016-02-10 00:00:11 |
|      4 |        29 |  5132 |    7407 | 2016-02-09 00:00:12 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 07:03:02 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 05:03:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 06:52:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 04:52:02 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:42:04 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:41:04 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:01:03 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:00:03 |
+--------+-----------+-------+---------+---------------------+
12 rows in set (0.00 sec)

mysql>
mysql> -- SQL needed
mysql> DELETE a FROM
    ->     WhenToText a INNER JOIN
    ->     (
    ->      SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate
    ->      FROM WhenToText
    ->      GROUP BY UserID, ContactID, SMSID, EventID
    ->      ) b
    ->     USING(UserID, ContactID, SMSID, EventID)
    -> WHERE
    ->     a.CreatedDate != b.CreatedDate;

Query OK, 6 rows affected (0.00 sec)

mysql>
mysql> SELECT * FROM WhenToText;
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate         |
+--------+-----------+-------+---------+---------------------+
|      4 |        25 |  7934 |    7407 | 2016-02-10 00:00:11 |
|      4 |        29 |  5132 |    7407 | 2016-02-10 00:00:11 |
|      4 |        31 | 12944 |    7405 | 2016-02-10 07:03:02 |
|      4 |       146 | 12908 |    7405 | 2016-02-10 06:52:02 |
|     15 |        63 | 12964 |    7401 | 2016-02-10 03:42:04 |
|     15 |        64 | 12326 |    7401 | 2016-02-07 03:01:03 |
+--------+-----------+-------+---------+---------------------+
6 rows in set (0.00 sec)
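
Because the DELETE is destructive, it may be worth running it inside a transaction first. The following is only a sketch, assuming WhenToText is an InnoDB table (so the statement can be rolled back); it reuses the same join as above and adds nothing beyond standard MySQL transaction statements.

```sql
-- Sketch only: assumes WhenToText uses a transactional engine such as InnoDB.
START TRANSACTION;

-- Count the rows that are older than the newest row of their group.
SELECT COUNT(*) AS rows_to_delete
FROM WhenToText a
INNER JOIN (
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) b USING (UserID, ContactID, SMSID, EventID)
WHERE a.CreatedDate != b.CreatedDate;

-- Same DELETE as in the answer above.
DELETE a
FROM WhenToText a
INNER JOIN (
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) b USING (UserID, ContactID, SMSID, EventID)
WHERE a.CreatedDate != b.CreatedDate;

-- Inspect what is left before deciding.
SELECT * FROM WhenToText
ORDER BY UserID, ContactID, SMSID, EventID;

-- Undo the delete; replace with COMMIT once the result looks right.
ROLLBACK;
```

With the sample data above, rows_to_delete should match the 6 rows reported by the DELETE.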

Answer 1 (score: 1)

Here is one way to do it:

DELETE w1 FROM WhenToText w1
INNER JOIN
(
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) w2
    ON w1.UserID = w2.UserID AND w1.ContactID = w2.ContactID AND w1.SMSID = w2.SMSID
        AND w1.EventID = w2.EventID
        AND w1.CreatedDate != w2.MaxDate;

This deletes every record in a (UserID, ContactID, SMSID, EventID) group whose CreatedDate is not the latest for that group. Note that if several records share the latest CreatedDate, more than one record may remain per group.

If you want to test the query first to see which records would be deleted, replace DELETE w1 FROM WhenToText w1 with SELECT w1.* FROM WhenToText w1.

Here is a link to a SQL Fiddle that demonstrates how the query identifies the records to delete:

SQLFiddle
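
The caveat about ties can be handled if the table has a unique key to break them. The sketch below assumes a hypothetical auto-increment primary key column named id, which the question does not mention; adapt the name to whatever unique column actually exists. It deletes every row for which a "better" row exists in the same group, where better means a later CreatedDate or, on equal CreatedDate, a higher id.

```sql
-- Sketch only: `id` is an assumed unique column, not part of the original schema.
DELETE a
FROM WhenToText a
INNER JOIN WhenToText b
    ON  a.UserID    = b.UserID
    AND a.ContactID = b.ContactID
    AND a.SMSID     = b.SMSID
    AND a.EventID   = b.EventID
    -- b beats a: newer date, or same date with a higher id as tie-breaker
    AND (a.CreatedDate < b.CreatedDate
         OR (a.CreatedDate = b.CreatedDate AND a.id < b.id));
```

Exactly one row per group has no better row and therefore survives. Rows with NULL in any of the four key columns never match the join and are left untouched.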

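Answer 2 (score: 0)

If CreatedDate is a date data type, this should give you the solution you are looking for. It also assumes that the most recent row is, technically, the one with the latest CreatedDate.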

SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
FROM WhenToText 
GROUP BY 1, 2, 3, 4;

Using these values, you can overwrite the WhenToText table, which would look something like this:

CREATE TABLE tmp_table LIKE WhenToText;

INSERT INTO tmp_table (SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
                          FROM WhenToText 
                          GROUP BY 1, 2, 3, 4);

TRUNCATE WhenToText;

INSERT INTO WhenToText (SELECT * FROM tmp_table);

DROP TABLE tmp_table;
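
A variation on the same idea avoids the window in which WhenToText is empty between TRUNCATE and the second INSERT: build the de-duplicated copy, then swap the two tables with a single RENAME TABLE, which MySQL performs atomically. This is a sketch under assumptions: the names WhenToText_dedup and WhenToText_old are made up for illustration, and only the five columns shown in the question are copied, so any other columns would receive their default values.

```sql
-- Sketch only: WhenToText_dedup and WhenToText_old are hypothetical names.
-- Build the de-duplicated copy; the original table is untouched so far.
CREATE TABLE WhenToText_dedup LIKE WhenToText;

-- Explicit column list: any other columns in the table get their defaults.
INSERT INTO WhenToText_dedup (UserID, ContactID, SMSID, EventID, CreatedDate)
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate)
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID;

-- Swap both names in one atomic statement.
RENAME TABLE WhenToText       TO WhenToText_old,
             WhenToText_dedup TO WhenToText;

-- Once the new table has been verified, the old data can be removed:
-- DROP TABLE WhenToText_old;
```

Keeping WhenToText_old around until the result has been checked also gives a simple way back, since TRUNCATE cannot be rolled back.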