I have this table in a MySQL database.
id filename firstname lastname dept salary
1 test1.csv Jan Thomas Sales 5000
2 test1.csv Jan Michael Sales 200
3 test1.csv Thomas John Technology 12900
4 test2.csv Robert James Technology 5500
5 test2.csv Robert Albertson Technology 6000
6 test2.csv Mark Jeffries Technology 900
7 test2.csv Ted James Technology 10000
8 test2.csv Mayla Arthurs Technology 7000
9 test2.csv Mayla Smith Technology 9500
10 test3.csv Mayla Anthony Technology 3000
11 test3.csv Mayla Mark Technology 3000
12 test4.csv Mayla Roberts Technology 8500
13 test4.csv Anthony Anderson Marketing 9500
14 test5.csv Anthony Smith Technology 6000
15 test5.csv Jan Thomas Sales 5000
16 test5.csv Jan Michael Sales 200
17 test5.csv Thomas John Technology 12900
18 test1.csv Jan Michael Sales 8000
19 test1.csv Thomas John Technology 1540
20 test2.csv Mayla Smith Technology 10500
21 test3.csv Mayla Anthony Technology 5600
22 test4.csv Anthony Anderson Marketing 2500
23 test5.csv Brian Earl HR 1200
24 test5.csv John Smith HR_Sales 2000
25 test6.csv Jan Thomas HR_Sales 12000
26 test6.csv Jan Michael Education 1500
27 test7.csv Thomas John HR_Sales 1000
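The SQL code that creates the table is at the end of this post. Each record consists of a filename, first name, last name, department, and salary. Sometimes the same record appears in more than one file, and I can't have these duplicate records.
As you can see, id = 15, 16, 17 are duplicates of id = 1, 2, 3, respectively.
I need to delete the duplicates where the filename differs but the rest of the record is identical.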
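Additional information:
I can't simply run
DELETE FROM employee WHERE id IN (15, 16, 17)
because I don't know in advance which rows will be duplicated.
The table is constantly updated from *.csv files. This means that if I create a new index column, I can no longer append *.csv files that contain duplicates of records already in the database. Therefore, I can't use an index column or GROUP BY().
Is there a way to delete the duplicate rows without using the PK column?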
SQL code used to create the table above:
CREATE SCHEMA dupl_test;
USE dupl_test;
create table employee (
id INT AUTO_INCREMENT PRIMARY KEY,
filename varchar(20),
firstname varchar(20),
lastname varchar(20),
dept varchar(10),
salary int(10)
);
insert into employee values(1,'test1.csv','Jan','Thomas','Sales',5000);
insert into employee values(2,'test1.csv','Jan','Michael','Sales',200);
insert into employee values(3,'test1.csv','Thomas','John','Technology',12900);
insert into employee values(4,'test2.csv','Robert','James','Technology',5500);
insert into employee values(5,'test2.csv','Robert','Albertson','Technology',6000);
insert into employee values(6,'test2.csv','Mark','Jeffries','Technology',900);
insert into employee values(7,'test2.csv','Ted','James','Technology',10000);
insert into employee values(8,'test2.csv','Mayla','Arthurs','Technology',7000);
insert into employee values(9,'test2.csv','Mayla','Smith','Technology',9500);
insert into employee values(10,'test3.csv','Mayla','Anthony','Technology',3000);
insert into employee values(11,'test3.csv','Mayla','Mark','Technology',3000);
insert into employee values(12,'test4.csv','Mayla','Roberts','Technology',8500);
insert into employee values(13,'test4.csv','Anthony', 'Anderson','Marketing',9500);
insert into employee values(14,'test5.csv','Anthony','Smith','Technology',6000);
insert into employee values(15,'test5.csv','Jan','Thomas','Sales',5000);
insert into employee values(16,'test5.csv','Jan','Michael','Sales',200);
insert into employee values(17,'test5.csv','Thomas','John','Technology',12900);
insert into employee values(18,'test1.csv','Jan','Michael','Sales',8000);
insert into employee values(19,'test1.csv','Thomas','John','Technology',1540);
insert into employee values(20,'test2.csv','Mayla','Smith','Technology',10500);
insert into employee values(21,'test3.csv','Mayla','Anthony','Technology',5600);
insert into employee values(22,'test4.csv','Anthony', 'Anderson','Marketing',2500);
insert into employee values(23,'test5.csv','Brian','Earl','HR',1200);
insert into employee values(24,'test5.csv','John','Smith','HR_Sales',2000);
insert into employee values(25,'test6.csv','Jan','Thomas','HR_Sales',12000);
insert into employee values(26,'test6.csv','Jan','Michael','Education',1500);
insert into employee values(27,'test7.csv','Thomas','John','HR_Sales',1000);
Answer 0 (score: 0)
In MySQL you can use a delete with a join to remove the duplicates:
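-- For each (firstname, lastname, dept, salary) group, keep only the rows
-- from the alphabetically first filename and delete the rest.
delete e
from employee e left join
     (select firstname, lastname, dept, salary, min(filename) as filename
      from employee
      group by firstname, lastname, dept, salary
     ) tokeep
     on e.firstname = tokeep.firstname and e.lastname = tokeep.lastname and
        e.dept = tokeep.dept and e.salary = tokeep.salary and
        tokeep.filename = e.filename
where tokeep.filename is null;  -- no match means this row is not in the kept file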
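Before running the delete, it may be worth previewing which rows it would remove. The following is only a sketch, assuming the employee table and sample data from the question: it reuses the same join and swaps the delete for a select.

```sql
-- Dry run: list the rows that the delete above would remove.
-- Same join and filter as the delete, with SELECT in place of DELETE.
select e.id, e.filename, e.firstname, e.lastname, e.dept, e.salary
from employee e left join
     (select firstname, lastname, dept, salary, min(filename) as filename
      from employee
      group by firstname, lastname, dept, salary
     ) tokeep
     on e.firstname = tokeep.firstname and e.lastname = tokeep.lastname and
        e.dept = tokeep.dept and e.salary = tokeep.salary and
        tokeep.filename = e.filename
where tokeep.filename is null
order by e.id;
```

With the sample data from the question this should list ids 15, 16 and 17. Two caveats: the plain = comparisons never match NULL, so a row with NULL in any of the compared columns would also be listed (and deleted); if that can happen in your data, MySQL's NULL-safe operator <=> may be a better fit. Also, identical records within the same file are left alone, since only the filename distinguishes what is kept.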