MySQL: delete duplicates without using the primary key

Asked: 2015-04-09 00:41:00

Tags: mysql database csv duplicates key

I have this table in a MySQL database.

id  filename    firstname  lastname   dept        salary
1   test1.csv   Jan        Thomas     Sales       5000
2   test1.csv   Jan        Michael    Sales       200
3   test1.csv   Thomas     John       Technology  12900
4   test2.csv   Robert     James      Technology  5500
5   test2.csv   Robert     Albertson  Technology  6000
6   test2.csv   Mark       Jeffries   Technology  900
7   test2.csv   Ted        James      Technology  10000
8   test2.csv   Mayla      Arthurs    Technology  7000
9   test2.csv   Mayla      Smith      Technology  9500
10  test3.csv   Mayla      Anthony    Technology  3000
11  test3.csv   Mayla      Mark       Technology  3000
12  test4.csv   Mayla      Roberts    Technology  8500
13  test4.csv   Anthony    Anderson   Marketing   9500
14  test5.csv   Anthony    Smith      Technology  6000
15  test5.csv   Jan        Thomas     Sales       5000
16  test5.csv   Jan        Michael    Sales       200
17  test5.csv   Thomas     John       Technology  12900
18  test1.csv   Jan        Michael    Sales       8000
19  test1.csv   Thomas     John       Technology  1540
20  test2.csv   Mayla      Smith      Technology  10500
21  test3.csv   Mayla      Anthony    Technology  5600
22  test4.csv   Anthony    Anderson   Marketing   2500
23  test5.csv   Brian      Earl       HR          1200
24  test5.csv   John       Smith      HR_Sales    2000
25  test6.csv   Jan        Thomas     HR_Sales    12000
26  test6.csv   Jan        Michael    Education   1500
27  test7.csv   Thomas     John       HR_Sales    1000

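The SQL that creates the table is at the end of this post. Each record consists of a filename, first name, last name, department, and salary. Sometimes the same record appears in more than one file, and I can't have those duplicate records.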

As you can see, id = 15, 16, 17 are duplicates of id = 1, 2, 3 respectively.

I need to delete the duplicates where the filename differs but the rest of the record is identical.
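To make the requirement concrete, a query along the following lines lists every record that shows up in more than one file. It is only a sketch: it assumes two records count as identical when firstname, lastname, dept, and salary all match, and the aliases copies, ids, and files are illustrative names, not part of the table.

```sql
-- Sketch: list records that occur in more than one file.
-- Assumption: "same record" means equal firstname, lastname, dept, salary.
SELECT firstname,
       lastname,
       dept,
       salary,
       COUNT(*)                                          AS copies,
       GROUP_CONCAT(id ORDER BY id)                      AS ids,
       GROUP_CONCAT(DISTINCT filename ORDER BY filename) AS files
FROM employee
GROUP BY firstname, lastname, dept, salary
HAVING COUNT(DISTINCT filename) > 1;
```

Because the HAVING clause counts distinct filenames, a record repeated only within a single file is not reported.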

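Additional information

  1. I can't use DELETE FROM employee WHERE id IN (15, 16, 17) because I don't know in advance which rows will be duplicated.
  2. The table is continually updated by appending more *.csv files to it. This means that if I create a new index column, I won't be able to append *.csv files that contain duplicates of records already in the database. Therefore I can't use an index column or GROUP BY().
  3. Is there a way to delete the duplicate rows without using the PK column?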

SQL code used to create the table above:

    CREATE SCHEMA dupl_test;
    
    USE dupl_test;
    
    create table employee (
    id INT AUTO_INCREMENT PRIMARY KEY,
    filename varchar(20),
    firstname varchar(20),
    lastname varchar(20),
    dept varchar(10),
    salary int(10)
    );
    
    insert into employee values(1,'test1.csv','Jan','Thomas','Sales',5000);
    insert into employee values(2,'test1.csv','Jan','Michael','Sales',200);
    insert into employee values(3,'test1.csv','Thomas','John','Technology',12900);
    insert into employee values(4,'test2.csv','Robert','James','Technology',5500);
    insert into employee values(5,'test2.csv','Robert','Albertson','Technology',6000);
    insert into employee values(6,'test2.csv','Mark','Jeffries','Technology',900);
    insert into employee values(7,'test2.csv','Ted','James','Technology',10000);
    insert into employee values(8,'test2.csv','Mayla','Arthurs','Technology',7000);
    insert into employee values(9,'test2.csv','Mayla','Smith','Technology',9500);
    insert into employee values(10,'test3.csv','Mayla','Anthony','Technology',3000);
    insert into employee values(11,'test3.csv','Mayla','Mark','Technology',3000);
    insert into employee values(12,'test4.csv','Mayla','Roberts','Technology',8500);
    insert into employee values(13,'test4.csv','Anthony', 'Anderson','Marketing',9500);
    insert into employee values(14,'test5.csv','Anthony','Smith','Technology',6000);
    insert into employee values(15,'test5.csv','Jan','Thomas','Sales',5000);
    insert into employee values(16,'test5.csv','Jan','Michael','Sales',200);
    insert into employee values(17,'test5.csv','Thomas','John','Technology',12900);
    insert into employee values(18,'test1.csv','Jan','Michael','Sales',8000);
    insert into employee values(19,'test1.csv','Thomas','John','Technology',1540);
    insert into employee values(20,'test2.csv','Mayla','Smith','Technology',10500);
    insert into employee values(21,'test3.csv','Mayla','Anthony','Technology',5600);
    insert into employee values(22,'test4.csv','Anthony', 'Anderson','Marketing',2500);
    insert into employee values(23,'test5.csv','Brian','Earl','HR',1200);
    insert into employee values(24,'test5.csv','John','Smith','HR_Sales',2000);
    insert into employee values(25,'test6.csv','Jan','Thomas','HR_Sales',12000);
    insert into employee values(26,'test6.csv','Jan','Michael','Education',1500);
    insert into employee values(27,'test7.csv','Thomas','John','HR_Sales',1000);
    

1 Answer:

Answer 0 (score: 0)

You can remove the duplicates in MySQL with a delete join:

    -- For each distinct record, keep only the rows from the alphabetically
    -- first filename; delete every row that has no match in tokeep.
    delete e
        from employee e left join
             (select firstname, lastname, dept, salary, min(filename) as filename
              from employee
              group by firstname, lastname, dept, salary
             ) tokeep
             on e.firstname = tokeep.firstname and e.lastname = tokeep.lastname and
                e.dept = tokeep.dept and e.salary = tokeep.salary and
                tokeep.filename = e.filename
        where tokeep.filename is null;
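
Before running the delete, it may be worth previewing which rows it would remove. The sketch below reuses the same join as a SELECT. It differs from the statement above in one assumed detail: the four record columns are compared with MySQL's NULL-safe operator <=>, because GROUP BY puts NULLs in the same group while = never matches them, so a record with a NULL column would otherwise lose every copy. It also assumes filename itself is never NULL.

```sql
-- Sketch: preview the rows the delete join would remove.
-- Assumptions: filename is never NULL; <=> is used so that records with
-- NULL in firstname, lastname, dept or salary still match their own group.
SELECT e.*
FROM employee e
LEFT JOIN (
    SELECT firstname, lastname, dept, salary, MIN(filename) AS filename
    FROM employee
    GROUP BY firstname, lastname, dept, salary
) tokeep
  ON  e.firstname <=> tokeep.firstname
  AND e.lastname  <=> tokeep.lastname
  AND e.dept      <=> tokeep.dept
  AND e.salary    <=> tokeep.salary
  AND e.filename  =  tokeep.filename
WHERE tokeep.filename IS NULL
ORDER BY e.id;
```

On the sample data in the question this returns the rows with id 15, 16, and 17. Note that MIN(filename) compares strings, so test10.csv would sort before test2.csv, and that copies of a record within the kept file are all retained.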