Select rows with duplicate data

Date: 2012-10-28 07:07:58

Tags: mysql sql select join

I have some MySQL tables from which I need to remove redundant data. For example:

 id email            date       data...
 1  email1@gmail.com 2012-01-01 my_data
 2  email2@gmail.com 2012-01-01 my_data
 3  email1@gmail.com 2012-01-02 my_data
 4  email1@gmail.com 2012-01-02 my_data   (redundant)
 5  email2@gmail.com 2012-01-02 my_data

I need to delete the redundant rows, but I'd like to select them first. I found this on StackOverflow, but it requires an email address:

SELECT * 
FROM `my_table`
WHERE `id` IN (SELECT `id` 
               FROM `my_table` 
               WHERE `email` = 'email1@gmail.com' 
               GROUP BY `date` 
               HAVING COUNT(*) > 1)

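What query can I use, without the WHERE qualifier in the embedded query, so that it covers all email addresses?

The query can be a plain SELECT. I don't mind deleting the rows manually in phpMyAdmin.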

2 Answers:

Answer 0: (score: 7)

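-- Keep the lowest id for each (email, date) pair and delete every other row.
DELETE FROM tableName
WHERE ID NOT IN
(
    SELECT minID
    FROM
    (
        -- The extra derived table lets MySQL read from the table it is deleting from.
        SELECT email, date, MIN(id) minID
        FROM tableName
        GROUP BY email, date
    ) x
)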

Or using a JOIN:

-- Rows whose id is not the minimum of their (email, date) group find no match and are deleted.
DELETE a
FROM tableName a
    LEFT JOIN (
        SELECT minID
        FROM (
            SELECT email, date, MIN(id) minID
            FROM tableName
            GROUP BY email, date
        ) y
    ) x
    ON a.ID = x.minID
WHERE x.minID IS NULL;

The following query only SELECTs the duplicate rows for each email and date combination:

SELECT a.*
FROM tableName a
    LEFT JOIN (
        SELECT minID
        FROM (
            SELECT email, date, MIN(id) minID
            FROM tableName
            GROUP BY email, date
        ) y
    ) x ON a.ID = x.minID
WHERE x.minID IS NULL
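
To try the SELECT against the sample rows from the question, a minimal sketch could look like the following. The column types in the CREATE TABLE statement are assumptions (the question does not show the schema), and the table is named `my_table` as in the question rather than `tableName`. The nested derived table is collapsed to a single level here, which should return the same rows.

```sql
-- Assumed schema: the question only shows column names, not types.
CREATE TABLE `my_table` (
    `id`    INT PRIMARY KEY,
    `email` VARCHAR(255) NOT NULL,
    `date`  DATE NOT NULL,
    `data`  VARCHAR(255)
);

-- Sample rows copied from the question.
INSERT INTO `my_table` (`id`, `email`, `date`, `data`) VALUES
    (1, 'email1@gmail.com', '2012-01-01', 'my_data'),
    (2, 'email2@gmail.com', '2012-01-01', 'my_data'),
    (3, 'email1@gmail.com', '2012-01-02', 'my_data'),
    (4, 'email1@gmail.com', '2012-01-02', 'my_data'),
    (5, 'email2@gmail.com', '2012-01-02', 'my_data');

-- Every row that is not the lowest id of its (email, date) group.
SELECT a.*
FROM `my_table` a
    LEFT JOIN (
        SELECT MIN(`id`) AS minID
        FROM `my_table`
        GROUP BY `email`, `date`
    ) x ON a.`id` = x.minID
WHERE x.minID IS NULL;
-- With the sample rows above, only the row with id 4 is returned.
```

Because the query keeps the row with the lowest id in each group, rows 1, 2, 3 and 5 stay and only row 4 is reported as redundant.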

Answer 1: (score: 0)

Another approach is to count how many times each date occurs for each email address in the table:

SELECT `email`, `date`, COUNT(*) FROM `my_table` GROUP BY `date`, `email` HAVING COUNT(*) > 1

+------------------+---------------------+----------+
| email            | date                | COUNT(*) |
+------------------+---------------------+----------+
| email1@gmail.com | 2012-01-02 00:00:00 |        2 |
+------------------+---------------------+----------+
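
The counting query reports which email and date pairs are duplicated, but not the rows themselves. One way to get the full rows, sketched here under the assumption that the table is `my_table` with the columns shown in the question, is to join the grouped result back to the table:

```sql
-- Join the duplicated (email, date) pairs back to the table to list the full rows.
SELECT t.*
FROM `my_table` t
    JOIN (
        SELECT `email`, `date`
        FROM `my_table`
        GROUP BY `email`, `date`
        HAVING COUNT(*) > 1
    ) d ON d.`email` = t.`email`
       AND d.`date` = t.`date`
ORDER BY t.`email`, t.`date`, t.`id`;
-- With the sample rows from the question, this returns the rows with id 3 and id 4.
```

Unlike the MIN(id) approach in the other answer, this lists every member of a duplicated group, including the row you would keep, which can be convenient when deciding by hand in phpMyAdmin which copy to delete.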