I have a very large table:
CREATE TABLE `messageline` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`hash` bigint(20) DEFAULT NULL,
`quoteLevel` int(11) DEFAULT NULL,
`messageDetails_id` bigint(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `FK2F5B707BF7C835B8` (`messageDetails_id`),
KEY `hash_idx` (`hash`),
KEY `quote_level_idx` (`quoteLevel`),
CONSTRAINT `FK2F5B707BF7C835B8` FOREIGN KEY (`messageDetails_id`) REFERENCES `messagedetails` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=401798068 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
I need to find duplicate rows like this:
create table foundline AS
select ml.messagedetails_id, ml.hash, ml.quotelevel
from messageline ml,
messageline ml1
where ml1.hash = ml.hash
and ml1.messagedetails_id!=ml.messagedetails_id
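But this query has already been running for more than a day, which is too long. A few hours would be acceptable. How can I speed it up? Thanks.
EXPLAIN output: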
+----+-------------+-------+------+---------------+----------+---------+---------------+-----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+---------------+-----------+-------------+
| 1 | SIMPLE | ml | ALL | hash_idx | NULL | NULL | NULL | 401798409 | |
| 1 | SIMPLE | ml1 | ref | hash_idx | hash_idx | 9 | skryb.ml.hash | 1 | Using where |
+----+-------------+-------+------+---------------+----------+---------+---------------+-----------+-------------+
Answer 0 (score: 0)
You can find duplicates like this:
SELECT messagedetails_id, COUNT(*) c
FROM messageline ml
GROUP BY messagedetails_id HAVING c > 1;
If that still takes too long, add a condition on an indexed column to split the query into chunks:
WHERE messagedetails_id < 100000
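As a rough sketch of how that condition fits into the full query: because the range condition and the GROUP BY use the same column, each messagedetails_id falls into exactly one chunk, so the per-chunk results can simply be collected one after another. The chunk width of 100000 is taken from the condition above; the lower bounds are illustrative assumptions.

```sql
-- First chunk: messagedetails_id in [0, 100000)
SELECT messagedetails_id, COUNT(*) AS c
FROM messageline
WHERE messagedetails_id >= 0 AND messagedetails_id < 100000
GROUP BY messagedetails_id
HAVING c > 1;

-- Next chunk: messagedetails_id in [100000, 200000).
-- Repeat with shifted bounds until MAX(messagedetails_id) is covered.
SELECT messagedetails_id, COUNT(*) AS c
FROM messageline
WHERE messagedetails_id >= 100000 AND messagedetails_id < 200000
GROUP BY messagedetails_id
HAVING c > 1;
```

Each chunk can use the index on messageDetails_id for its range, so a single slow chunk can be retried without redoing the rest.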
Answer 1 (score: 0)
Does this need to be done in SQL alone? With this many records, you would be better off breaking it into two steps:
CREATE TABLE duplicate_hashes
SELECT * FROM (
    SELECT hash,
           GROUP_CONCAT(id) AS ids,
           COUNT(*) AS cnt,
           COUNT(DISTINCT messagedetails_id) AS cnt_message_details,
           GROUP_CONCAT(DISTINCT messagedetails_id) AS messagedetails_ids
    FROM messageline
    GROUP BY hash
    HAVING cnt > 1   -- HAVING must come before ORDER BY
    ORDER BY NULL    -- suppress the implicit GROUP BY sort (MySQL before 8.0)
) tmp
WHERE cnt > cnt_message_details;
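This gives you the duplicate ids for each hash, and since you have an index on the hash column, the grouping will be relatively fast. By counting the distinct messagedetails_id values and comparing that count with cnt, you implicitly satisfy the requirement of differing messagedetails_id values for the same hash.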
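The answer does not spell out the second step. One possible sketch, assuming the goal is still the foundline table from the question, is to join the much smaller duplicate_hashes table back to messageline so that hash_idx is used for the lookups. Note also that GROUP_CONCAT output is truncated at group_concat_max_len (1024 bytes by default), so the limit may need raising before the first step is run; the value below is an arbitrary assumption.

```sql
-- Run before step one: raise the GROUP_CONCAT limit for this session
-- so long id lists are not silently truncated (1000000 is arbitrary).
SET SESSION group_concat_max_len = 1000000;

-- Hypothetical step two: fetch the full rows for every flagged hash.
-- Rows with a NULL hash never match, since NULL = NULL is not true.
CREATE TABLE foundline AS
SELECT ml.messageDetails_id, ml.hash, ml.quoteLevel
FROM duplicate_hashes dh
JOIN messageline ml ON ml.hash = dh.hash;
```

This keeps the expensive grouping pass separate from the row lookup, so the second step only touches hashes that were already flagged.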