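I'm trying to delete records from a table that has duplicate column values, but it takes forever. Basically it gets stuck and doesn't respond for hours. The table is very large, with more than 1.3 million records. Is the query inefficient? Is there any way to optimize it?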
delete n1 from ids n1, ids n2 where n1.id > n2.id and n1.user_id = n2.user_id
The database is remote, and I'm using PuTTY to run the query.
Answer 0 (score: 1)
Add an index:
ALTER TABLE ids ADD INDEX (user_id, id);
This makes it efficient to find all rows with the same user_id and a higher id.
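One way to check whether the new index is actually picked up is to ask MySQL for the execution plan of the equivalent self-join. This is only a sketch: the SELECT below is a read-only stand-in for the original DELETE, and the plan you get depends on the MySQL version and on the data.

```sql
-- List the indexes on the table; the new (user_id, id) index should appear here.
SHOW INDEX FROM ids;

-- Read-only stand-in for the DELETE's self-join: same join conditions, nothing is deleted.
EXPLAIN
SELECT n1.id
FROM ids AS n1
JOIN ids AS n2
  ON n1.user_id = n2.user_id
 AND n1.id > n2.id;
```

If the `key` column of the plan names the new index for at least one of the two aliases instead of showing NULL, the lookups by user_id are going through the index.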
It may also help to join with a subquery:
DELETE n1
FROM ids AS n1
JOIN (SELECT user_id, MIN(id) AS minid
      FROM ids
      GROUP BY user_id) AS n2
  ON n1.user_id = n2.user_id AND n1.id > n2.minid;
With the index above, this should be faster still.
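To see which rows the join-against-a-subquery DELETE keeps, here is a small worked example on a scratch table. The table name `ids_demo` and the sample rows are made up for illustration; only the shape of the DELETE comes from the answer.

```sql
-- Hypothetical scratch table so the real ids table is not touched.
CREATE TABLE ids_demo (
  id INT PRIMARY KEY,
  user_id INT,
  INDEX (user_id, id)
);

INSERT INTO ids_demo (id, user_id) VALUES
  (1, 10), (2, 10), (3, 20), (4, 10), (5, 20), (6, 30);

-- Same shape as the DELETE above: keep the lowest id per user_id.
DELETE n1
FROM ids_demo AS n1
JOIN (SELECT user_id, MIN(id) AS minid
      FROM ids_demo
      GROUP BY user_id) AS n2
  ON n1.user_id = n2.user_id AND n1.id > n2.minid;

-- Remaining rows: (1, 10), (3, 20), (6, 30)
SELECT id, user_id FROM ids_demo ORDER BY id;

DROP TABLE ids_demo;
```

Each row is compared against a single precomputed minimum for its user_id, rather than against every earlier row with that user_id.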
Answer 1 (score: 0)
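Yes, that query is quite inefficient. Even if you used explicit joins, keep in mind that essentially every row "N" is being matched with every row before "N", and every row "N-1" is matched with the rows before it, and so on.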
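To put a rough number on that, the query below estimates how many row pairs the original self-join has to match. It is a sketch that assumes id is unique within the table; the alias `per_user` and the column name `matched_pairs` are just illustrative.

```sql
-- For a user_id with cnt rows, the condition n1.id > n2.id matches
-- cnt * (cnt - 1) / 2 pairs, so the work grows quadratically per user.
SELECT SUM((cnt * (cnt - 1)) DIV 2) AS matched_pairs
FROM (SELECT COUNT(*) AS cnt
      FROM ids
      GROUP BY user_id) AS per_user;
```

For example, a single user_id with 1,000 rows contributes 499,500 pairs on its own.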
Try something like this:
DROP TEMPORARY TABLE IF EXISTS keeps;
-- Holds the one id to keep for each user_id.
CREATE TEMPORARY TABLE keeps (
    user_id INT,
    keepID INT,
    INDEX (user_id, keepID)
);
-- Keep the lowest id per user_id.
INSERT INTO keeps (user_id, keepID)
SELECT user_id, MIN(id) AS keepID
FROM ids
GROUP BY user_id;
-- Delete every row that is not the kept one.
DELETE FROM ids WHERE (user_id, id) NOT IN (SELECT user_id, keepID FROM keeps);
DROP TEMPORARY TABLE IF EXISTS keeps;
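After the cleanup, a quick check along these lines can confirm that no duplicates are left. This is just a sanity-check sketch, not part of the deletion itself.

```sql
-- Any user_id listed here still has more than one row.
SELECT user_id, COUNT(*) AS cnt
FROM ids
GROUP BY user_id
HAVING COUNT(*) > 1;
```

An empty result would mean that no user_id appears more than once.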
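I was also tempted to suggest something like the following, but I can't remember whether MySQL allows a DELETE query to reference the table being deleted from in a subquery... which is why I suggested the temp table approach first.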
DELETE a
FROM ids AS a
WHERE EXISTS (
    SELECT *
    FROM ids AS b
    WHERE b.id < a.id
      AND b.user_id = a.user_id
);