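I'm working in HeidiSQL and trying to figure out how to delete all duplicate rows except the most recent one. There are some minor differences between the duplicates, but as long as four specific values are the same (UserID, ContactID, SMSID, and EventID), the row is considered a duplicate. Within each set of duplicates I need to keep the most recent row (identified by CreatedDate) and delete the rest.
The following query identifies these rows: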
SELECT a.UserID, a.ContactID, a.SMSID, a.EventID, CreatedDate
FROM WhenToText a
JOIN (SELECT UserID, ContactID, SMSID, EventID
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID
HAVING COUNT(*) > 1 ) b
ON a.UserID = b.UserID
AND a.ContactID = b.ContactID
AND a.SMSID = b.SMSID
AND a.EventID = b.EventID
ORDER BY UserID, ContactID, SMSID, EventID, CreatedDate DESC
However, now that I've found the duplicates, I'm not sure how to delete them.
Here is some sample data:
Answer 0 (score: 1)
Here is a solution using DELETE ... FROM ... JOIN, with a complete demo on your data.
SQL:
-- Data preparation
create table WhenToText(UserID int, ContactID int, SMSID int, EventID int, CreatedDate datetime);
insert into WhenToText values
(4, 25, 7934, 7407, '2016-02-10 00:00:11'),
(4, 25, 7934, 7407, '2016-02-09 00:00:12'),
(4, 29, 5132, 7407, '2016-02-10 00:00:11'),
(4, 29, 5132, 7407, '2016-02-09 00:00:12'),
(4, 31, 12944, 7405, '2016-02-10 07:03:02'),
(4, 31, 12944, 7405, '2016-02-10 05:03:02'),
(4, 146, 12908, 7405, '2016-02-10 06:52:02'),
(4, 146, 12908, 7405, '2016-02-10 04:52:02'),
(15, 63, 12964, 7401, '2016-02-10 03:42:04'),
(15, 63, 12964, 7401, '2016-02-10 03:41:04'),
(15, 64, 12326, 7401, '2016-02-07 03:01:03'),
(15, 64, 12326, 7401, '2016-02-07 03:00:03');
SELECT * FROM WhenToText;
-- SQL needed
DELETE a FROM
WhenToText a INNER JOIN
(
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID
) b
USING(UserID, ContactID, SMSID, EventID)
WHERE
a.CreatedDate != b.CreatedDate;
SELECT * FROM WhenToText;
Output:
mysql> SELECT * FROM WhenToText;
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate |
+--------+-----------+-------+---------+---------------------+
| 4 | 25 | 7934 | 7407 | 2016-02-10 00:00:11 |
| 4 | 25 | 7934 | 7407 | 2016-02-09 00:00:12 |
| 4 | 29 | 5132 | 7407 | 2016-02-10 00:00:11 |
| 4 | 29 | 5132 | 7407 | 2016-02-09 00:00:12 |
| 4 | 31 | 12944 | 7405 | 2016-02-10 07:03:02 |
| 4 | 31 | 12944 | 7405 | 2016-02-10 05:03:02 |
| 4 | 146 | 12908 | 7405 | 2016-02-10 06:52:02 |
| 4 | 146 | 12908 | 7405 | 2016-02-10 04:52:02 |
| 15 | 63 | 12964 | 7401 | 2016-02-10 03:42:04 |
| 15 | 63 | 12964 | 7401 | 2016-02-10 03:41:04 |
| 15 | 64 | 12326 | 7401 | 2016-02-07 03:01:03 |
| 15 | 64 | 12326 | 7401 | 2016-02-07 03:00:03 |
+--------+-----------+-------+---------+---------------------+
12 rows in set (0.00 sec)
mysql>
mysql> -- SQL needed
mysql> DELETE a FROM
-> WhenToText a INNER JOIN
-> (
-> SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) CreatedDate
-> FROM WhenToText
-> GROUP BY UserID, ContactID, SMSID, EventID
-> ) b
-> USING(UserID, ContactID, SMSID, EventID)
-> WHERE
-> a.CreatedDate != b.CreatedDate;
Query OK, 6 rows affected (0.00 sec)
mysql>
mysql> SELECT * FROM WhenToText;
+--------+-----------+-------+---------+---------------------+
| UserID | ContactID | SMSID | EventID | CreatedDate |
+--------+-----------+-------+---------+---------------------+
| 4 | 25 | 7934 | 7407 | 2016-02-10 00:00:11 |
| 4 | 29 | 5132 | 7407 | 2016-02-10 00:00:11 |
| 4 | 31 | 12944 | 7405 | 2016-02-10 07:03:02 |
| 4 | 146 | 12908 | 7405 | 2016-02-10 06:52:02 |
| 15 | 63 | 12964 | 7401 | 2016-02-10 03:42:04 |
| 15 | 64 | 12326 | 7401 | 2016-02-07 03:01:03 |
+--------+-----------+-------+---------+---------------------+
6 rows in set (0.00 sec)
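As a sanity check after the DELETE, the grouping query from the question can be rerun to confirm that no duplicate groups remain. This is a minimal sketch against the demo table above; the alias `cnt` is only illustrative.

```sql
-- Lists any (UserID, ContactID, SMSID, EventID) group that still has more than one row.
-- On the demo data, after the DELETE above, this returns an empty set.
SELECT UserID, ContactID, SMSID, EventID, COUNT(*) AS cnt
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID
HAVING COUNT(*) > 1;
```

If rows do come back, the most likely cause is two rows in the same group sharing the latest CreatedDate.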
Answer 1 (score: 1)
Here is one approach:
-- MySQL's multi-table DELETE must name the alias to delete from (w1) before FROM
DELETE w1 FROM WhenToText w1
INNER JOIN
(
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID
) w2
ON w1.UserID = w2.UserID AND w1.ContactID = w2.ContactID AND w1.SMSID = w2.SMSID
AND w1.EventID = w2.EventID
AND w1.CreatedDate != w2.MaxDate;
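This deletes every record in a (UserID, ContactID, SMSID, EventID) group whose CreatedDate is not the latest one for that group. Note that if several rows in a group share the latest CreatedDate, this may leave more than one record per group.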
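If ties on the latest CreatedDate are a real possibility, one way to keep exactly one row per group is to rank the rows and delete everything ranked below the first. The sketch below is not part of the original answer. It assumes MySQL 8.0 or later (for window functions) and a unique column, here called `ID`, which is hypothetical and does not exist in the demo table.

```sql
-- Assumptions: MySQL 8.0+ and a hypothetical unique column ID on WhenToText.
DELETE w
FROM WhenToText w
INNER JOIN
(
    SELECT ID,
           ROW_NUMBER() OVER (
               PARTITION BY UserID, ContactID, SMSID, EventID
               ORDER BY CreatedDate DESC, ID DESC   -- ID breaks ties on CreatedDate
           ) AS rn
    FROM WhenToText
) ranked
    ON ranked.ID = w.ID
WHERE ranked.rn > 1;   -- rn = 1 is the single row kept in each group
```

Without some unique column there is nothing to tell tied rows apart, so a table with no key would need a different approach, such as rebuilding the table.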
If you want to test the query first to see which records would be deleted, replace the opening DELETE clause (the part before INNER JOIN) with SELECT w1.* FROM WhenToText w1.
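Written out in full, the preview form of the query might look like this. It uses the same join and the same aliases (`w1`, `w2`, `MaxDate`) as the DELETE, so whatever it returns is what the DELETE would remove.

```sql
-- Preview only: lists the rows the DELETE would remove, without changing any data.
SELECT w1.*
FROM WhenToText w1
INNER JOIN
(
    SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS MaxDate
    FROM WhenToText
    GROUP BY UserID, ContactID, SMSID, EventID
) w2
    ON w1.UserID = w2.UserID
   AND w1.ContactID = w2.ContactID
   AND w1.SMSID = w2.SMSID
   AND w1.EventID = w2.EventID
   AND w1.CreatedDate != w2.MaxDate;
```

Run against the twelve-row demo data from Answer 0, this lists the six older rows, one per group.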
Here is a link to an SQL Fiddle that demonstrates how the query identifies the records to delete:
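Answer 2 (score: 0)
If CreatedDate is a date data type, this should give you the solution you're looking for. It also assumes that the most recent row is, technically, the one with the latest CreatedDate.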
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
FROM WhenToText
GROUP BY 1, 2, 3, 4;
Using these values, you can overwrite the WhenToText table, which would look something like this:
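-- Build an empty copy with the same structure
CREATE TABLE tmp_table LIKE WhenToText;
-- Keep one row per group; only these five columns are carried over
INSERT INTO tmp_table (UserID, ContactID, SMSID, EventID, CreatedDate)
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate) AS CreatedDate
FROM WhenToText
GROUP BY 1, 2, 3, 4;
-- Replace the original contents with the deduplicated rows
TRUNCATE WhenToText;
INSERT INTO WhenToText SELECT * FROM tmp_table;
DROP TABLE tmp_table;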
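One caveat with the steps above is that TRUNCATE causes an implicit commit in MySQL and cannot be rolled back, so a failure between the TRUNCATE and the re-insert leaves the table empty. A possible variation, sketched here with the illustrative table names `WhenToText_dedup` and `WhenToText_old` (not from the original answer), is to build the deduplicated copy first and then swap the tables with a single RENAME TABLE.

```sql
-- Assumption: WhenToText_dedup and WhenToText_old are hypothetical names chosen for this sketch.
CREATE TABLE WhenToText_dedup LIKE WhenToText;

-- One row per group, carrying the latest CreatedDate
INSERT INTO WhenToText_dedup (UserID, ContactID, SMSID, EventID, CreatedDate)
SELECT UserID, ContactID, SMSID, EventID, MAX(CreatedDate)
FROM WhenToText
GROUP BY UserID, ContactID, SMSID, EventID;

-- Swap both names in one statement; the original data stays available as WhenToText_old
RENAME TABLE WhenToText TO WhenToText_old,
             WhenToText_dedup TO WhenToText;

-- Once the result has been checked:
-- DROP TABLE WhenToText_old;
```

As with the original version, any columns beyond the five listed are not carried over, so this only suits tables where those five columns are all that matters.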