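MySQL: delete duplicate comments?

Time: 2015-02-01 18:45:24

Tags: mysql duplicate-removal

I want to clean up duplicates in my comment table (1M rows), where a user has posted the same comment twice (or more). However, I want to keep one instance of each duplicated comment.

Here is the query I came up with to find and group these comments: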

SELECT author, body, COUNT(*) as count
FROM  db.comment
GROUP BY body
HAVING COUNT(*) > 1;

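But I don't know how to delete the duplicate rows while leaving exactly one of them untouched. I have seen similar questions, but none of the answers worked for me. Any hints would be much appreciated.

Update: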

mysql> describe comment;
+---------+-------------+------+-----+---------+----------------+
| Field   | Type        | Null | Key | Default | Extra          |
+---------+-------------+------+-----+---------+----------------+
| id      | int(11)     | NO   | PRI | NULL    | auto_increment |
| created | datetime    | NO   |     | NULL    |                |
| author  | varchar(60) | NO   |     | NULL    |                |
| body    | longtext    | NO   |     | NULL    |                |
| post_id | int(11)     | NO   | MUL | NULL    |                |
+---------+-------------+------+-----+---------+----------------+

2 answers:

Answer 0 (score: 1)

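Unlike most other DBMSs, MySQL lets you select columns that are neither listed in the GROUP BY clause nor aggregated, provided the ONLY_FULL_GROUP_BY SQL mode is disabled (it is enabled by default since MySQL 5.7.5). In that case each group returns the values of a single row, but which row is not guaranteed; it is often the first one, though you should not rely on that.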
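To see whether that behavior applies on a given server, and to avoid depending on it, something like the following could be used. This is only a sketch: choosing MIN(id) as the row to keep (the earliest copy) and the aliases keep_id and copies are my assumptions, not something the question specifies.

```sql
-- Shows the active SQL modes; look for ONLY_FULL_GROUP_BY in the result.
SELECT @@sql_mode;

-- Deterministic alternative: name the row to keep explicitly with an aggregate.
-- keep_id and copies are illustrative aliases.
SELECT MIN(id) AS keep_id, author, COUNT(*) AS copies
FROM db.comment
GROUP BY author, body
HAVING COUNT(*) > 1;
```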

Do the job in two steps:

Save the IDs of the rows to keep in a temporary table:

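-- MIN(id) keeps the earliest row of each (author, body) group
-- and also works when ONLY_FULL_GROUP_BY is enabled.
INSERT INTO temp_comment (id)
SELECT MIN(id)
FROM db.comment
GROUP BY author, body;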

Delete every row except the saved ones:

DELETE FROM db.comment WHERE id NOT IN (SELECT id FROM temp_comment);

Of course, the temp_comment table has to exist before you run this.
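A minimal sketch of creating that table, assuming it only needs a single id column matching comment.id; the column definition and the sanity check are my additions, not part of the original answer.

```sql
-- Assumption: a session-scoped table with one column holding the ids to keep.
CREATE TEMPORARY TABLE temp_comment (
  id INT NOT NULL PRIMARY KEY
);

-- After filling temp_comment and before running the DELETE,
-- compare the number of rows to keep with the table total.
SELECT
  (SELECT COUNT(*) FROM db.comment)   AS total_rows,
  (SELECT COUNT(*) FROM temp_comment) AS rows_to_keep;
```

If rows_to_keep looks implausibly small, stop before running the DELETE. A temporary table disappears when the session ends, so all three steps need to run on the same connection.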

Answer 1 (score: 1)

Is this what you want?

SELECT * FROM comments
WHERE id NOT IN (
  -- the row to keep from each duplicated (author, body) group
  SELECT MIN(id)
  FROM comments
  GROUP BY author, body
  HAVING COUNT(*) > 1
)
AND (author, body) IN (
  -- the (author, body) pairs that occur more than once
  SELECT author, body
  FROM comments
  GROUP BY author, body
  HAVING COUNT(*) > 1
);

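To delete the duplicate rows, change SELECT * to DELETE. Note that MySQL does not allow a DELETE to read from its own target table in a subquery (error 1093), so the subqueries have to be wrapped in a derived table or replaced with a temporary table.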
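One way the DELETE form might look in MySQL is sketched below. It assumes the lowest id in each (author, body) group is the copy to keep; the alias keep_rows is hypothetical. Try it on a copy of the table first.

```sql
-- Sketch: delete every row except the lowest id in each (author, body) group.
-- The inner GROUP BY query is wrapped in a derived table (keep_rows) so that
-- MySQL materializes it instead of reading the DELETE target directly,
-- which is what triggers error 1093.
DELETE FROM comments
WHERE id NOT IN (
  SELECT id FROM (
    SELECT MIN(id) AS id
    FROM comments
    GROUP BY author, body
  ) AS keep_rows
);
```

Rows that have no duplicate are their own group minimum, so they are kept as well.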

SQL Fiddle Demo

Update

To improve query performance, you could try the following:

SELECT c.*
FROM comments c
INNER JOIN
(
  -- one row per duplicated (author, body) group, carrying the id to keep
  SELECT MIN(id) AS id, author, body
  FROM comments
  GROUP BY author, body
  HAVING COUNT(*) > 1
) AS t
ON c.author = t.author AND c.body = t.body AND c.id <> t.id;
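The join can also drive the deletion itself through MySQL's multi-table DELETE syntax. The statement below is a sketch under the same assumption as before (keep the lowest id per group), not a tested solution for the 1M-row table.

```sql
-- Sketch: remove every duplicate that is not the lowest id of its group.
-- "DELETE c" names the alias whose rows are removed; t is only read.
DELETE c
FROM comments AS c
INNER JOIN (
  SELECT MIN(id) AS id, author, body
  FROM comments
  GROUP BY author, body
  HAVING COUNT(*) > 1
) AS t
  ON c.author = t.author AND c.body = t.body AND c.id <> t.id;
```

Running the SELECT version first shows exactly which rows this statement would remove.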