MySQL: delete duplicate records but keep the latest

Time: 2011-05-24 07:37:55

Tags: mysql duplicates

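I have unique id and email fields. The emails get duplicated. For each set of duplicates I want to keep only one email address, the one with the latest id (the last inserted record).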

How can I do this?

9 answers:

Answer 0 (score: 75)

Imagine that your table test contains the following data:

  select id, email
    from test;

ID                     EMAIL                
---------------------- -------------------- 
1                      aaa                  
2                      bbb                  
3                      ccc                  
4                      bbb                  
5                      ddd                  
6                      eee                  
7                      aaa                  
8                      aaa                  
9                      eee 

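So we need to find all the duplicate emails and delete all of them except the one with the latest id. In this case aaa, bbb and eee are duplicated, so we want to delete IDs 1, 7, 2 and 6.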
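The rule above (for each email keep the highest id, delete the rest) can be checked by hand before touching any database. This is a minimal plain-Python sketch over the sample rows shown above; `ids_to_delete` is a hypothetical helper, not part of any answer.

```python
from collections import defaultdict

# Sample rows from the `test` table above, as (id, email) pairs.
rows = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
        (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")]


def ids_to_delete(rows):
    """Return, sorted, the ids that are not the highest id for their email."""
    latest = defaultdict(int)  # email -> highest id seen so far (ids are positive)
    for row_id, email in rows:
        latest[email] = max(latest[email], row_id)
    # Any row below the latest id of its email is an older duplicate.
    return sorted(row_id for row_id, email in rows if row_id < latest[email])


print(ids_to_delete(rows))  # [1, 2, 6, 7]
```

The result is the same set the answer names (1, 7, 2 and 6), just sorted.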

To do that, we first need to find all the duplicate emails:

      select email 
        from test
       group by email
      having count(*) > 1;

EMAIL                
-------------------- 
aaa                  
bbb                  
eee  
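To try the duplicate-finding query without a MySQL server, the sample table can be loaded into an in-memory SQLite database through Python's built-in `sqlite3` module. SQLite is an assumed stand-in here; this particular `GROUP BY ... HAVING` query behaves the same in both. `make_test_db` and `duplicate_emails` are illustrative names, and the count column is added only for inspection.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.


def make_test_db():
    """Build an in-memory copy of the sample `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO test (id, email) VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
         (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
    )
    return conn


def duplicate_emails(conn):
    # Same GROUP BY / HAVING query as above, plus the group size.
    return conn.execute(
        "SELECT email, COUNT(*) FROM test "
        "GROUP BY email HAVING COUNT(*) > 1 ORDER BY email"
    ).fetchall()


print(duplicate_emails(make_test_db()))  # [('aaa', 3), ('bbb', 2), ('eee', 2)]
```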

Then, from this data set, we need to find the latest id for each duplicate email:

  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email;

LASTID                 EMAIL                
---------------------- -------------------- 
8                      aaa                  
4                      bbb                  
9                      eee                                 
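The `WHERE email IN (...)` filter is not strictly needed: grouping once and filtering with `HAVING COUNT(*) > 1` returns the same rows, which is what the optimized version in the update relies on. A small sketch comparing the two forms, again assuming SQLite through Python's `sqlite3` as a stand-in for MySQL:

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.


def make_test_db():
    """Build an in-memory copy of the sample `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO test (id, email) VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
         (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
    )
    return conn


# Two-step form from the answer: restrict to duplicated emails, then take MAX(id).
IN_FORM = """
    SELECT MAX(id) AS lastId, email FROM test
     WHERE email IN (SELECT email FROM test GROUP BY email HAVING COUNT(*) > 1)
     GROUP BY email ORDER BY email"""

# Single-pass form: group once and keep only groups with more than one row.
HAVING_FORM = """
    SELECT MAX(id) AS lastId, email FROM test
     GROUP BY email HAVING COUNT(*) > 1 ORDER BY email"""

conn = make_test_db()
in_rows = conn.execute(IN_FORM).fetchall()
having_rows = conn.execute(HAVING_FORM).fetchall()
print(in_rows)                 # [(8, 'aaa'), (4, 'bbb'), (9, 'eee')]
print(in_rows == having_rows)  # True
```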

Finally, we can delete every row of these emails whose id is smaller than LASTID. So the solution is:

delete test
  from test
 inner join (
  select max(id) as lastId, email
    from test
   where email in (
              select email 
                from test
               group by email
              having count(*) > 1
       )
   group by email
) duplic on duplic.email = test.email
 where test.id < duplic.lastId;

I don't have MySQL installed on this machine right now, but it should work.

Update

The delete above works, but I found a more optimized version:

 delete test
   from test
  inner join (
     select max(id) as lastId, email
       from test
      group by email
     having count(*) > 1) duplic on duplic.email = test.email
  where test.id < duplic.lastId;

You can see that it deleted the oldest duplicates, i.e. 1, 7, 2 and 6:

select * from test;
+----+-------+
| id | email |
+----+-------+
|  3 | ccc   |
|  4 | bbb   |
|  5 | ddd   |
|  8 | aaa   |
|  9 | eee   |
+----+-------+
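SQLite does not accept MySQL's multi-table `DELETE test FROM test INNER JOIN ...` syntax, so the sketch below keeps the same join and `HAVING` derived table but moves them into a `WHERE id IN (...)` subquery. This is a swapped-in equivalent for trying the logic in SQLite through Python's `sqlite3` (an assumed stand-in), not the answer's MySQL statement; in MySQL, keep the JOIN form above, because MySQL refuses a subquery that reads directly from the table being deleted from (error 1093).

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.


def make_test_db():
    """Build an in-memory copy of the sample `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO test (id, email) VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
         (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
    )
    return conn


# Same join as the optimized MySQL delete, rewritten as an IN-subquery for SQLite.
DELETE_OLDER_DUPLICATES = """
    DELETE FROM test
     WHERE id IN (
           SELECT t.id
             FROM test AS t
             JOIN (SELECT MAX(id) AS lastId, email
                     FROM test
                    GROUP BY email
                   HAVING COUNT(*) > 1) AS duplic
               ON duplic.email = t.email
            WHERE t.id < duplic.lastId)"""

conn = make_test_db()
deleted = conn.execute(DELETE_OLDER_DUPLICATES).rowcount
remaining = conn.execute("SELECT id, email FROM test ORDER BY id").fetchall()
print(deleted)    # 4
print(remaining)  # [(3, 'ccc'), (4, 'bbb'), (5, 'ddd'), (8, 'aaa'), (9, 'eee')]
```

The remaining rows match the table shown above.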

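Another version is the delete provided by Rene Limon. As written, MySQL rejects it with error 1093 ("You can't specify target table 'test' for update in FROM clause") because the subquery reads from the table being deleted from; wrapping the subquery in a derived table, as answers 1 and 3 below do, avoids the error: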
delete from test
 where id not in (
    select max(id)
      from test
     group by email)

Answer 1 (score: 9)

The correct way is:

DELETE FROM `tablename` 
  WHERE id NOT IN (
    SELECT * FROM (
      SELECT MAX(id) FROM tablename 
        GROUP BY name
    ) AS keep_ids  -- MySQL requires an alias on every derived table
  )
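Answer 1 is Rene Limon's `NOT IN` delete wrapped in an aliased derived table, which is what lets MySQL run it. SQLite accepts both spellings, so the sketch below (Python `sqlite3` as an assumed stand-in for MySQL, grouping by `email` from the question rather than the answer's placeholder `name`) only checks that the wrapper leaves the result unchanged. `remaining_ids` is a hypothetical helper.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.


def make_test_db():
    """Build an in-memory copy of the sample `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO test (id, email) VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
         (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
    )
    return conn


# Rene Limon's form: the subquery reads `test` directly (MySQL rejects this, error 1093).
DIRECT = "DELETE FROM test WHERE id NOT IN (SELECT MAX(id) FROM test GROUP BY email)"
# Answer 1's form: the same subquery wrapped in an aliased derived table.
WRAPPED = ("DELETE FROM test WHERE id NOT IN "
           "(SELECT * FROM (SELECT MAX(id) FROM test GROUP BY email) AS keep_ids)")


def remaining_ids(delete_sql):
    """Run one delete on a fresh copy of the table and return the surviving ids."""
    conn = make_test_db()
    conn.execute(delete_sql)
    return [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]


print(remaining_ids(DIRECT))   # [3, 4, 5, 8, 9]
print(remaining_ids(WRAPPED))  # [3, 4, 5, 8, 9]
```

Because `MAX(id)` is taken over every email, including the non-duplicated ones, no `HAVING` clause is needed here: a single-row group keeps its own id.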

Answer 2 (score: 4)

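Try this. It is a self-join: t1 is deleted whenever another row t2 has the same email and a larger id, so only the latest id of each email survives:

DELETE t1 FROM test t1, test t2 
WHERE t1.id < t2.id AND t1.email = t2.email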

Answer 3 (score: 3)

DELETE 
FROM
  `tbl_job_title` 
WHERE id NOT IN 
  (SELECT 
    * 
  FROM
    (SELECT 
      MAX(id) 
    FROM
      `tbl_job_title` 
    GROUP BY NAME) tbl)

Revised, working version! Thanks @Gaurav

Answer 4 (score: 1)

If you want to keep the row with the lowest id value:

DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id > n2.id AND n1.email = n2.email

If you want to keep the row with the highest id value:

DELETE n1 FROM `yourTableName` n1, `yourTableName` n2 WHERE n1.id < n2.id AND n1.email = n2.email
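Both self-join variants can be tried in SQLite through Python's `sqlite3` (an assumed stand-in for MySQL). SQLite has no multi-table `DELETE n1 FROM ... n1, ... n2`, so this sketch rewrites the join as a correlated `EXISTS`: a row is deleted when another row with the same email and a smaller id (keep lowest) or a larger id (keep highest) exists. `dedupe` is a hypothetical helper.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.


def make_test_db():
    """Build an in-memory copy of the sample `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO test (id, email) VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
         (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
    )
    return conn


def dedupe(conn, keep):
    """Delete duplicate emails, keeping the lowest or highest id per email."""
    # keep="lowest"  mirrors n1.id > n2.id: drop a row if a smaller id exists.
    # keep="highest" mirrors n1.id < n2.id: drop a row if a larger id exists.
    op = {"lowest": "<", "highest": ">"}[keep]
    conn.execute(
        "DELETE FROM test WHERE EXISTS ("
        " SELECT 1 FROM test AS n2"
        f" WHERE n2.email = test.email AND n2.id {op} test.id)"
    )
    return [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]


print(dedupe(make_test_db(), "lowest"))   # [1, 2, 3, 5, 6]
print(dedupe(make_test_db(), "highest"))  # [3, 4, 5, 8, 9]
```

Only "highest" matches what the question asks for (keep the latest id).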

Answer 5 (score: 0)

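I have to say the optimized version is sweet, elegant code, and it works like a charm even when the comparison is done on a DATETIME column. I used it in my script, where I look up the latest contract end date for each EmployeeID.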

Thanks a lot!

Answer 6 (score: 0)

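Personally, I had problems with the top two voted answers. It's not the cleanest solution, but you can use a temporary table to avoid all the problems MySQL has with deleting from a table that is joined to itself.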
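The answer's own code is not preserved here, so the following is only a minimal sketch of the temporary-table idea, run in SQLite through Python's `sqlite3` (an assumed stand-in for MySQL): first store the id to keep for each email in a temporary table, then delete everything else. The table name `keep_ids` is hypothetical. `CREATE TEMPORARY TABLE ... AS SELECT` is also valid MySQL syntax, and since the `DELETE` then reads only the temporary table, MySQL's restriction on selecting from the table being deleted from does not apply.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.


def make_test_db():
    """Build an in-memory copy of the sample `test` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany(
        "INSERT INTO test (id, email) VALUES (?, ?)",
        [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "bbb"), (5, "ddd"),
         (6, "eee"), (7, "aaa"), (8, "aaa"), (9, "eee")],
    )
    return conn


conn = make_test_db()
# Step 1: remember the newest id of every email (duplicated or not).
conn.execute(
    "CREATE TEMPORARY TABLE keep_ids AS "
    "SELECT MAX(id) AS id FROM test GROUP BY email"
)
# Step 2: delete every row whose id was not remembered.
conn.execute("DELETE FROM test WHERE id NOT IN (SELECT id FROM keep_ids)")
# Step 3: drop the helper table.
conn.execute("DROP TABLE keep_ids")

remaining = [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]
print(remaining)  # [3, 4, 5, 8, 9]
```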

Answer 7 (score: 0)

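Here is a nice stored procedure I created that deletes all duplicate records from a table, without needing an existing unique id on that table.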

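DELIMITER //
-- Returns the table's column names as a comma-separated list.
-- TABLE_SCHEMA = DATABASE() avoids mixing in a same-named table from
-- another schema; READS SQL DATA avoids error 1418 when binary logging is on.
CREATE FUNCTION findColumnNames(tableName VARCHAR(255))
RETURNS TEXT
READS SQL DATA
BEGIN
    DECLARE colNames TEXT DEFAULT '';
    SELECT GROUP_CONCAT(COLUMN_NAME ORDER BY ORDINAL_POSITION)
      INTO colNames
      FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_SCHEMA = DATABASE()
       AND TABLE_NAME = tableName;
    RETURN colNames;
END //
DELIMITER ;

DELIMITER //
-- Assumes the table has no column named `id` and no primary key yet:
-- it temporarily adds `id` as an AUTO_INCREMENT key, keeps the lowest id
-- of every group of fully identical rows, then drops `id` again.
CREATE PROCEDURE deleteDuplicateRecords (IN tableName VARCHAR(255))
BEGIN
    -- Read the column list before the helper `id` column exists.
    SET @colNames = findColumnNames(tableName);
    SET @addIDStmt = CONCAT('ALTER TABLE ', tableName, ' ADD COLUMN id INT AUTO_INCREMENT KEY');
    -- The derived table (tmpTable) avoids MySQL error 1093.
    SET @deleteDupsStmt = CONCAT('DELETE FROM ', tableName, ' WHERE id NOT IN ',
        '(SELECT * FROM (SELECT MIN(id) FROM ', tableName,
        ' GROUP BY ', @colNames, ') AS tmpTable)');
    SET @dropIDStmt = CONCAT('ALTER TABLE ', tableName, ' DROP COLUMN id');

    PREPARE addIDStmt FROM @addIDStmt;
    EXECUTE addIDStmt;
    DEALLOCATE PREPARE addIDStmt;

    PREPARE deleteDupsStmt FROM @deleteDupsStmt;
    EXECUTE deleteDupsStmt;
    DEALLOCATE PREPARE deleteDupsStmt;

    PREPARE dropIDStmt FROM @dropIDStmt;
    EXECUTE dropIDStmt;
    DEALLOCATE PREPARE dropIDStmt;
END //
DELIMITER ;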
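SQLite (used here through Python's `sqlite3` as an assumed stand-in for MySQL) gives every ordinary table a hidden `rowid`, which can play the role of the temporary `id` column that the procedure adds and drops. The sketch below follows the same steps: read the column list (from `PRAGMA table_info` instead of `INFORMATION_SCHEMA.COLUMNS`), group by all columns, and keep the smallest row id per group. `delete_duplicate_records` and the `people` table are hypothetical, and the table name is assumed to be trusted because it is pasted into the SQL text.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL, and the
# table is an ordinary rowid table (not WITHOUT ROWID).


def delete_duplicate_records(conn, table):
    """Keep one copy of each set of fully identical rows in `table`."""
    # Like findColumnNames(): PRAGMA table_info rows carry the column name at index 1.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_list = ", ".join(f'"{c}"' for c in cols)
    # Like the procedure's DELETE, with rowid standing in for the temporary id.
    conn.execute(
        f'DELETE FROM "{table}" WHERE rowid NOT IN '
        f'(SELECT MIN(rowid) FROM "{table}" GROUP BY {col_list})'
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("ann", "oslo"), ("bob", "rome"), ("ann", "oslo"), ("ann", "rome")],
)
delete_duplicate_records(conn, "people")
remaining = conn.execute("SELECT name, city FROM people ORDER BY rowid").fetchall()
print(remaining)  # [('ann', 'oslo'), ('bob', 'rome'), ('ann', 'rome')]
```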

Answer 8 (score: 0)

I wanted to delete duplicate records based on several columns of the table, so this approach worked for me:

Step 1 - get the max id (the id to keep) of each group of duplicate records:

select *  FROM ( SELECT MAX(id) FROM table_name 
group by travel_intimation_id,approved_by,approval_type,approval_status having 
count(*) > 1) a

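Step 2 - get the id of each record that has no duplicates. Such a group has a single row, so MAX(id) is simply that row's id; selecting a bare id next to GROUP BY is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5):

select *  FROM ( SELECT MAX(id) FROM table_name 
group by travel_intimation_id,approved_by,approval_type,approval_status having 
count(*) = 1) b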

Step 3 - delete everything except the ids returned by the two queries above:

DELETE FROM `table_name` 
WHERE 
id NOT IN (paste step 1 query) -- to exclude the latest row of each duplicate group
and 
id NOT IN (paste step 2 query) -- to exclude records that have no duplicates

Final query:

DELETE FROM `table_name`
WHERE id NOT IN (
    -- latest id of every group that has duplicates
    select * FROM ( SELECT MAX(id) FROM table_name
                     group by travel_intimation_id, approved_by, approval_type, approval_status
                    having count(*) > 1) a
)
and id NOT IN (
    -- id of every group that has a single row
    select * FROM ( SELECT MAX(id) FROM table_name
                     group by travel_intimation_id, approved_by, approval_type, approval_status
                    having count(*) = 1) b
);

This query deletes only the duplicate records, keeping the row with the highest id in each group.
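For the multi-column case, steps 1 and 2 together return `MAX(id)` for every group, so the two `NOT IN` lists can be checked against a plain `GROUP BY` without `HAVING`. The sketch below does that in SQLite through Python's `sqlite3` (an assumed stand-in for MySQL) with a few hypothetical rows; only the column names come from the answer.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL; the rows are made up.
GROUP_COLS = "travel_intimation_id, approved_by, approval_type, approval_status"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name (id INTEGER PRIMARY KEY, travel_intimation_id INTEGER, "
    "approved_by TEXT, approval_type TEXT, approval_status TEXT)"
)
# Ids 1, 2 and 4 are identical in every grouping column.
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?, ?)",
    [(1, 10, "u1", "travel", "ok"),
     (2, 10, "u1", "travel", "ok"),
     (3, 11, "u2", "travel", "ok"),
     (4, 10, "u1", "travel", "ok"),
     (5, 11, "u2", "hotel", "rejected")],
)


def ids(sql):
    """Return the first column of a query, sorted."""
    return sorted(row[0] for row in conn.execute(sql))


step1 = ids(f"SELECT MAX(id) FROM table_name GROUP BY {GROUP_COLS} HAVING COUNT(*) > 1")
step2 = ids(f"SELECT MAX(id) FROM table_name GROUP BY {GROUP_COLS} HAVING COUNT(*) = 1")
every_group = ids(f"SELECT MAX(id) FROM table_name GROUP BY {GROUP_COLS}")
print(step1, step2, every_group)  # [4] [3, 5] [3, 4, 5]

# The answer's final query, written with MAX(id) in both subqueries.
conn.execute(f"""
    DELETE FROM table_name
     WHERE id NOT IN (SELECT * FROM (SELECT MAX(id) FROM table_name
                       GROUP BY {GROUP_COLS} HAVING COUNT(*) > 1) a)
       AND id NOT IN (SELECT * FROM (SELECT MAX(id) FROM table_name
                       GROUP BY {GROUP_COLS} HAVING COUNT(*) = 1) b)""")
remaining = ids("SELECT id FROM table_name")
print(remaining)  # [3, 4, 5]
```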