I have some MySQL tables from which I need to remove redundant rows. For example:
id  email             date        data...
1   email1@gmail.com  2012-01-01  my_data
2   email2@gmail.com  2012-01-01  my_data
3   email1@gmail.com  2012-01-02  my_data
4   email1@gmail.com  2012-01-02  my_data (redundant)
5   email2@gmail.com  2012-01-02  my_data
I need to delete the redundant rows, but I would like to select them first. I found this on StackOverflow, but it requires an email address:
SELECT *
FROM `my_table`
WHERE `id` IN (SELECT `id`
FROM `my_table`
where `email` = 'email1@gmail.com'
group by `date`
HAVING count(*) > 1)
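What query can I use, without the WHERE qualifier in the nested query, so that it covers all email addresses?
The query can be a SELECT query. I don't mind deleting the rows manually in PHPMyAdmin.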
Answer 0 (score: 7)
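-- Keep the row with the lowest id for each (email, date) pair; delete the rest.
DELETE FROM tableName
WHERE ID NOT IN
(
    SELECT minID
    FROM
    (
        -- The extra derived table lets MySQL read from the table being deleted from.
        SELECT email, date, MIN(id) minID
        FROM tableName
        GROUP BY email, date
    ) x
)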
Or, using a JOIN:
-- Rows with no match among the per-group minimum ids are the duplicates.
DELETE a
FROM tableName a
LEFT JOIN (
    SELECT minID
    FROM (
        SELECT email, DATE, MIN(id) minID
        FROM tableName
        GROUP BY email, DATE
    ) y
) x
ON a.ID = x.minID
WHERE x.minID IS NULL;
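Since the goal is to check the result before anything is lost, the delete can be tried inside a transaction first. This is only a sketch: it assumes the table uses a transactional engine such as InnoDB (on MyISAM the ROLLBACK would have no effect), and `tableName` stands in for the real table name.

```sql
START TRANSACTION;

-- Same delete as in the first form of this answer.
DELETE FROM tableName
WHERE ID NOT IN
(
    SELECT minID
    FROM
    (
        SELECT email, date, MIN(id) minID
        FROM tableName
        GROUP BY email, date
    ) x
);

-- Number of rows removed by the DELETE above.
SELECT ROW_COUNT();

-- Should return no rows if every (email, date) pair is now unique.
SELECT email, date, COUNT(*)
FROM tableName
GROUP BY email, date
HAVING COUNT(*) > 1;

-- Undo the delete; replace with COMMIT once the result looks right.
ROLLBACK;
```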
The following query only SELECTs the duplicate rows for each email and date:
SELECT a.*
FROM tableName a
LEFT JOIN
(
    SELECT minID
    FROM
    (
        SELECT email, date, MIN(id) minID
        FROM tableName
        GROUP BY email, date
    ) y
) x ON a.ID = x.minID
WHERE x.minID IS NULL
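To try the SELECT against the sample rows from the question, something like the following could be used. The column types are assumptions (the question does not give the schema), and the inner subquery is flattened to a single derived table, which is equivalent for a plain SELECT.

```sql
-- Assumed schema: the column types are guesses based on the sample rows.
CREATE TABLE my_table (
    id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    `date` DATE NOT NULL,
    data VARCHAR(255)
);

INSERT INTO my_table (id, email, `date`, data) VALUES
    (1, 'email1@gmail.com', '2012-01-01', 'my_data'),
    (2, 'email2@gmail.com', '2012-01-01', 'my_data'),
    (3, 'email1@gmail.com', '2012-01-02', 'my_data'),
    (4, 'email1@gmail.com', '2012-01-02', 'my_data'),
    (5, 'email2@gmail.com', '2012-01-02', 'my_data');

-- Rows whose id is not the lowest id of their (email, date) group.
SELECT a.*
FROM my_table a
LEFT JOIN (
    SELECT MIN(id) AS minID
    FROM my_table
    GROUP BY email, `date`
) x ON a.id = x.minID
WHERE x.minID IS NULL;
```

With these five rows, only the row with id 4 should come back, since ids 1, 2, 3 and 5 are each the lowest id in their group.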
Answer 1 (score: 0)
Another approach is to count how many times each date occurs for each email address in the table:
SELECT `email`, `date`, COUNT(*) FROM `my_table` GROUP BY `date`, `email` HAVING COUNT(*) > 1
+------------------+---------------------+----------+
| email | date | COUNT(*) |
+------------------+---------------------+----------+
| email1@gmail.com | 2012-01-02 00:00:00 | 2 |
+------------------+---------------------+----------+
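The count shows which email and date pairs are duplicated, but not which rows to remove. As a possible extension (a sketch that assumes the `id` column from the question and relies on MySQL's GROUP_CONCAT), the same grouping can also list the ids involved:

```sql
SELECT `email`,
       `date`,
       COUNT(*) AS copies,
       MIN(`id`) AS keep_id,                      -- the row that would be kept
       GROUP_CONCAT(`id` ORDER BY `id`) AS all_ids -- every id in the group
FROM `my_table`
GROUP BY `email`, `date`
HAVING COUNT(*) > 1;
```

For the sample data in the question this should give one row for email1@gmail.com on 2012-01-02, with keep_id 3 and all_ids 3,4, so row 4 is the one to delete by hand.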