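I have a table with some IDs and titles. I want to make the title column unique, but the table already has more than 600k records, and some titles are duplicated (sometimes dozens of times).
How can I delete all the duplicates except one, so that I can then add a UNIQUE key on the title column?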
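Answer 0 (score: 79)
This command adds the unique key and deletes every row that would violate it: of each group of rows with the same title, only the first is kept. This removes the duplicates.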
ALTER IGNORE TABLE tablename ADD UNIQUE KEY idx1(title);
Edit: note that on some versions of MySQL this command may not work for InnoDB tables. For a workaround, see this post. (Thanks to "anonymous user" for this information.)
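Answer 1 (score: 9)
Create a new table that contains only the distinct rows of the original table. There may be other ways, but I find this the cleanest: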
CREATE TABLE tmp_table AS SELECT DISTINCT [....] FROM main_table
More specifically:
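A faster approach is to insert the distinct rows into a temporary table. Removing duplicates from an 8-million-row table with DELETE took me hours; with INSERT and DISTINCT it took only 13 minutes.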
CREATE TABLE tempTableName LIKE tableName;
CREATE INDEX ix_all_id ON tableName(cellId,attributeId,entityRowId,value);
INSERT INTO tempTableName(cellId,attributeId,entityRowId,value) SELECT DISTINCT cellId,attributeId,entityRowId,value FROM tableName;
-- Empty the original table (TRUNCATE, not DROP, so the INSERT below still has a target)
TRUNCATE TABLE tableName;
INSERT INTO tableName SELECT * FROM tempTableName;
DROP TABLE tempTableName;
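The copy-distinct-rows approach above can be tried out with Python's built-in sqlite3 module before running it on MySQL. This is a minimal sketch, not MySQL itself: SQLite has no CREATE TABLE ... LIKE and no TRUNCATE, so the schema is repeated by hand and an unqualified DELETE empties the table. The items table, its id/title columns and the sample titles are hypothetical.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with duplicate titles (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    titles = ["bob", "bob", "bob1", "bob2", "bob2", "bob2", "bob3"]
    conn.executemany("INSERT INTO items (title) VALUES (?)", [(t,) for t in titles])
    conn.commit()
    return conn


def dedupe_via_distinct_copy(conn: sqlite3.Connection) -> None:
    """Copy distinct titles to a temp table, empty the original, copy them back."""
    with conn:  # commits on success; rolls back the open transaction on error
        # SQLite has no CREATE TABLE ... LIKE, so the schema is spelled out again.
        conn.execute("CREATE TABLE temp_items (id INTEGER PRIMARY KEY, title TEXT)")
        # Only distinct titles are copied; temp_items assigns fresh ids 1, 2, ...
        conn.execute(
            "INSERT INTO temp_items (title) "
            "SELECT DISTINCT title FROM items ORDER BY title"
        )
        # SQLite has no TRUNCATE; DELETE without WHERE empties the table instead.
        conn.execute("DELETE FROM items")
        conn.execute("INSERT INTO items SELECT * FROM temp_items")
        conn.execute("DROP TABLE temp_items")


conn = make_demo_db()
dedupe_via_distinct_copy(conn)
print(conn.execute("SELECT id, title FROM items ORDER BY id").fetchall())
```

Emptying and refilling the original table, rather than dropping it, keeps the table definition and its indexes in place.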
Answer 2 (score: 0)
This shows how to do it in SQL Server 2000. I'm not very familiar with MySQL syntax, but I'm sure it has something similar.
create table #titles (iid int identity (1, 1), title varchar(200))
-- Repeat this step many times to create duplicates
insert into #titles(title) values ('bob')
insert into #titles(title) values ('bob1')
insert into #titles(title) values ('bob2')
insert into #titles(title) values ('bob3')
insert into #titles(title) values ('bob4')
DELETE T FROM
#titles T left join
(
select title, min(iid) as minid from #titles group by title
) D on T.title = D.title and T.iid = D.minid
WHERE D.minid is null
Select * FROM #titles
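The same keep-the-lowest-id idea can be checked with Python's sqlite3 module. SQLite's DELETE does not accept a JOIN, so this sketch swaps the LEFT JOIN anti-join for an equivalent NOT IN over MIN(iid) per title; the sample rows are hypothetical.

```python
import sqlite3


def delete_all_but_min_id(conn: sqlite3.Connection) -> int:
    """Delete every row whose iid is not the smallest iid for its title."""
    with conn:
        cur = conn.execute(
            """
            DELETE FROM titles
            WHERE iid NOT IN (SELECT MIN(iid) FROM titles GROUP BY title)
            """
        )
    return cur.rowcount  # number of rows deleted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (iid INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO titles (title) VALUES (?)",
    [("bob",), ("bob1",), ("bob",), ("bob2",), ("bob1",), ("bob",)],
)
removed = delete_all_but_min_id(conn)
print(removed, conn.execute("SELECT iid, title FROM titles ORDER BY iid").fetchall())
```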
Answer 3 (score: 0)
delete from student where id in (
SELECT distinct(s1.`student_id`) from student as s1 inner join student as s2
where s1.`sex` = s2.`sex` and
s1.`student_id` > s2.`student_id` and
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
Answer 4 (score: 0)
The solution posted by Nitin seems to be the most elegant and logical one.
However, it has one problem:
ERROR 1093 (HY000): You can't specify target table 'student' for update in FROM clause
This can be fixed by using (SELECT * FROM student) instead of student:
DELETE FROM student WHERE id IN (
SELECT distinct(s1.`student_id`) FROM (SELECT * FROM student) AS s1 INNER JOIN (SELECT * FROM student) AS s2
WHERE s1.`sex` = s2.`sex` AND
s1.`student_id` > s2.`student_id` AND
s1.`sex` = 'M'
ORDER BY `s1`.`student_id` ASC
)
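The self-join logic of the two answers above (delete a row when another row with the same sex has a smaller student_id) can be tried in Python's sqlite3 module. SQLite does not raise MySQL's ERROR 1093, so the (SELECT * FROM student) wrappers are not needed there, but they are accepted and kept for comparison. This sketch assumes the key column is student_id (the answers mix id and student_id) and uses made-up rows.

```python
import sqlite3

# Same shape as the MySQL query: the target table is only read through derived tables.
DELETE_LATER_MALE_ROWS = """
    DELETE FROM student WHERE student_id IN (
        SELECT DISTINCT s1.student_id
        FROM (SELECT * FROM student) AS s1
        INNER JOIN (SELECT * FROM student) AS s2
            ON s1.sex = s2.sex AND s1.student_id > s2.student_id
        WHERE s1.sex = 'M'
    )
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, sex TEXT)")
conn.executemany(
    "INSERT INTO student (student_id, sex) VALUES (?, ?)",
    [(1, "M"), (2, "F"), (3, "M"), (4, "F"), (5, "M")],
)
with conn:
    conn.execute(DELETE_LATER_MALE_ROWS)

# Only the male row with the smallest student_id survives; female rows are untouched.
print(conn.execute("SELECT student_id, sex FROM student ORDER BY student_id").fetchall())
```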
Give your +1 to Nitin for coming up with the original solution.
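Answer 5 (score: 0)
Since MySQL's ALTER IGNORE TABLE has been deprecated (it was removed in MySQL 5.7.4), you need to delete the duplicate data before adding the index.
First, write a query that finds all the duplicates. Here I assume that email is the field that contains the duplicates.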
SELECT
s1.email,
s1.id,
s1.created,
s2.id,
s2.created
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
/* Emails are the same */
s1.email = s2.email AND
/* DON'T select both accounts,
only select the one created later.
The serial id could also be used here */
s2.created > s1.created
;
Next, select only the distinct IDs of the duplicate rows:
SELECT
DISTINCT s2.id
FROM
student AS s1
INNER JOIN
student AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
;
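Once you are sure the result contains only the duplicate IDs you want to delete, run the DELETE. You have to use (SELECT * FROM tblname) instead of the bare table name so that MySQL doesn't complain (with ERROR 1093).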
DELETE FROM
student
WHERE
id
IN (
SELECT
DISTINCT s2.id
FROM
(SELECT * FROM student) AS s1
INNER JOIN
(SELECT * FROM student) AS s2
WHERE
s1.email = s2.email AND
s2.created > s1.created
);
Then create the unique index:
ALTER TABLE
student
ADD UNIQUE INDEX
idx_student_unique_email(email)
;
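The whole sequence above (find the later duplicates, delete them, then add the unique index) can be rehearsed with Python's sqlite3 module before touching MySQL. This sketch uses a hypothetical student table with made-up emails and ISO date strings for created. SQLite uses CREATE UNIQUE INDEX instead of ALTER TABLE ... ADD UNIQUE INDEX, and raises sqlite3.IntegrityError where MySQL reports a duplicate-entry error.

```python
import sqlite3

CREATE_INDEX = "CREATE UNIQUE INDEX idx_student_unique_email ON student (email)"


def find_later_duplicate_ids(conn):
    """Ids of rows whose email already appears on an earlier-created row."""
    rows = conn.execute(
        """
        SELECT DISTINCT s2.id
        FROM student AS s1
        INNER JOIN student AS s2
            ON s1.email = s2.email AND s2.created > s1.created
        ORDER BY s2.id
        """
    ).fetchall()
    return [r[0] for r in rows]


def dedupe_and_add_unique_index(conn):
    # In SQLite both statements share one transaction, so a failed index
    # build also rolls back the DELETE.
    with conn:
        conn.execute(
            """
            DELETE FROM student WHERE id IN (
                SELECT DISTINCT s2.id
                FROM (SELECT * FROM student) AS s1
                INNER JOIN (SELECT * FROM student) AS s2
                    ON s1.email = s2.email AND s2.created > s1.created
            )
            """
        )
        conn.execute(CREATE_INDEX)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, email TEXT, created TEXT)")
conn.executemany(
    "INSERT INTO student (email, created) VALUES (?, ?)",
    [
        ("a@example.com", "2020-01-01"),
        ("b@example.com", "2020-01-02"),
        ("a@example.com", "2020-02-01"),
        ("a@example.com", "2020-03-01"),
        ("b@example.com", "2020-01-05"),
    ],
)
conn.commit()

# Adding the unique index while duplicates exist fails.
failed_before_cleanup = False
try:
    conn.execute(CREATE_INDEX)
except sqlite3.IntegrityError as exc:
    failed_before_cleanup = True
    print("before cleanup:", exc)

dup_ids = find_later_duplicate_ids(conn)
print(dup_ids)
dedupe_and_add_unique_index(conn)
print(conn.execute("SELECT id, email FROM student ORDER BY id").fetchall())
```

Because the condition is s2.created > s1.created, two rows with the same email and the same created value are both kept, and the index creation then fails; as the comment in the first query suggests, comparing the serial id instead avoids that.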
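Answer 6 (score: 0)
The following query deletes all duplicates except the row with the lowest value in the 'id' field:
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id > t2.id AND t1.name = t2.name
In the same way, we can keep the row with the highest value in 'id' instead: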
DELETE t1 FROM table_name t1, table_name t2 WHERE t1.id < t2.id AND t1.name = t2.name
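SQLite's DELETE has no multi-table form like MySQL's DELETE t1 FROM table_name t1, table_name t2, so this Python sqlite3 sketch puts the same self-join condition into an IN subquery. The keep parameter and the sample rows are hypothetical; the comparison operator is picked from a fixed mapping, so no caller-supplied text reaches the SQL string.

```python
import sqlite3

# t1.id > t2.id deletes the later rows (keeps the lowest id per name);
# t1.id < t2.id deletes the earlier rows (keeps the highest id per name).
_OPERATORS = {"lowest": ">", "highest": "<"}


def delete_duplicates(conn: sqlite3.Connection, keep: str = "lowest") -> None:
    op = _OPERATORS[keep]  # KeyError for anything other than "lowest"/"highest"
    with conn:
        conn.execute(
            f"""
            DELETE FROM table_name WHERE id IN (
                SELECT t1.id
                FROM table_name AS t1
                INNER JOIN table_name AS t2
                    ON t1.name = t2.name AND t1.id {op} t2.id
            )
            """
        )


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO table_name (id, name) VALUES (?, ?)",
        [(1, "x"), (2, "y"), (3, "x"), (4, "x"), (5, "y")],
    )
    conn.commit()
    return conn


for keep in ("lowest", "highest"):
    demo = make_demo_db()
    delete_duplicates(demo, keep)
    print(keep, demo.execute("SELECT id, name FROM table_name ORDER BY id").fetchall())
```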
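Answer 7 (score: 0)
Removing duplicates from a MySQL table is a common problem that usually comes with its own specific requirements. If anyone is interested, I explain here (Remove duplicate rows in MySQL) how to remove MySQL duplicates reliably and quickly using a temporary table, with examples for different use cases.
In this case, something like the following should work: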
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint
ALTER TABLE tmp_table1 ADD UNIQUE(id, title);
-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
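SQLite spells MySQL's INSERT IGNORE as INSERT OR IGNORE and has no multi-table RENAME TABLE, so this Python sqlite3 sketch of the temporary-table approach runs the copy and two ALTER TABLE ... RENAME TO statements in one transaction. The sid/id/title columns follow the answer, but their types and the sample rows are assumptions; because of ORDER BY sid, the row with the lowest sid is the one kept for each (id, title) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical original table: sid is the row key, (id, title) should be unique.
conn.execute("CREATE TABLE table1 (sid INTEGER PRIMARY KEY, id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO table1 (sid, id, title) VALUES (?, ?, ?)",
    [(1, 10, "a"), (2, 10, "a"), (3, 11, "a"), (4, 10, "b"), (5, 11, "a")],
)
conn.commit()

with conn:
    # Same columns as table1; the unique constraint is declared in CREATE TABLE
    # because SQLite's ALTER TABLE cannot add a constraint.
    conn.execute(
        "CREATE TABLE tmp_table1 "
        "(sid INTEGER PRIMARY KEY, id INTEGER, title TEXT, UNIQUE (id, title))"
    )
    # Rows that would violate UNIQUE (id, title) are silently skipped.
    conn.execute("INSERT OR IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid")
    # Keep the original as a backup and put the de-duplicated copy in its place.
    conn.execute("ALTER TABLE table1 RENAME TO backup_table1")
    conn.execute("ALTER TABLE tmp_table1 RENAME TO table1")

print(conn.execute("SELECT sid, id, title FROM table1 ORDER BY sid").fetchall())
print(conn.execute("SELECT COUNT(*) FROM backup_table1").fetchone())
```

Keeping backup_table1 until the result has been checked makes the operation easy to undo.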