I have a very large MySQL table (68 million rows),
and I am trying to keep only one row per minute using the following query:
delete bt
from table1 bt
join (select date, min(time) as time
from table1
group by date, hour(time), minute(time)
)
btt
on btt.date = bt.date
and hour(bt.time) = hour(btt.time)
and minute(bt.time) = minute(btt.time)
and btt.time <> bt.time
My table has the following columns:
+----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| date | varchar(11) | NO | | NULL | |
| time | varchar(12) | NO | | NULL | |
| gmt_offset | varchar(2) | YES | | NULL | |
| type | varchar(10) | YES | | NULL | |
| yield_b | varchar(10) | YES | | NULL | |
| yield_d | varchar(10) | YES | | NULL | |
+----------------+-------------+------+-----+---------+----------------+
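The query has been running for more than 24 hours. When I run
SHOW FULL PROCESSLIST;
the State column says
Creating sort index
Is it normal for a query like this to take so long? Is there any way to speed it up? Thanks!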
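Edit:
Gordon's answer is correct, apart from a small mistake in one line. Here is the corrected query, which is indeed faster than the previous one:
create table temp_table1 as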
select t.*
from (select t1.*,
(@rn := if(@prevd = date and minute(time) = @prevm, @rn + 1,
if(@prevd := date, if(@prevm := minute(time), 1, 1), 1)
)
) as seqnum
from table1 t1 cross join
(select @rn := 0, @prevd := 0, @prevm := 0) vars
order by date, time
) t
where seqnum = 1;
Answer 0 (score: 2)
First, grab the id value of the first observation in each distinct minute.
SELECT MIN(id) As first_id_in_minute
FROM table1
GROUP BY date, HOUR(time), MINUTE(time)
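These are the id values of the rows you want to keep.
Then delete the remaining rows. Use LEFT JOIN ... IS NULL to find the rows that have no match. This is probably faster than NOT IN (...).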
DELETE a
FROM table1 AS a
LEFT JOIN (
SELECT MIN(id) As first_id_in_minute
FROM table1
GROUP BY date, HOUR(time), MINUTE(time)
) AS b ON a.id = b.first_id_in_minute
WHERE b.first_id_in_minute IS NULL
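Note that MySQL does not accept LIMIT in a multi-table DELETE like this one, so as written it removes every unwanted row in one operation. To reduce the size of each DELETE operation, store the ids to keep in a separate table, use a single-table DELETE ... LIMIT 1000 against it, and repeat that statement until it reports that no rows were affected.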
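One way to batch the deletes could look like the following sketch. It materializes the ids to keep in a helper table, here called keep_ids (a hypothetical name, not part of the original schema), and then uses the single-table form of DELETE, which is the form where MySQL accepts LIMIT.

```sql
-- Hypothetical helper table holding the id of the first row in each minute.
CREATE TABLE keep_ids (
  first_id_in_minute INT NOT NULL PRIMARY KEY
);

INSERT INTO keep_ids (first_id_in_minute)
SELECT MIN(id)
FROM table1
GROUP BY date, HOUR(time), MINUTE(time);

-- Single-table DELETE, so LIMIT is allowed.
-- Run this statement repeatedly until it reports 0 rows affected.
DELETE FROM table1
WHERE NOT EXISTS (
  SELECT 1
  FROM keep_ids
  WHERE keep_ids.first_id_in_minute = table1.id
)
LIMIT 1000;
```

The batch size of 1000 only mirrors the number used above. On a table of this size a larger batch, or an extra condition restricting each run to a range of id values, may turn out to be more practical.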
Try adding a composite index on (date, time, id) to speed up the MIN() ... GROUP BY part.
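For reference, such an index could be created with a statement along these lines. The index name idx_date_time_id is made up for illustration.

```sql
-- Composite index containing the columns read by the MIN(id) ... GROUP BY query.
ALTER TABLE table1
  ADD INDEX idx_date_time_id (date, time, id);
```

Expect the index build itself to take a while on a table with this many rows.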
As Gordon suggested, try this on a copy of the table first, eh?
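Answer 1 (score: 1)
Instead of deleting a bunch of rows, create a temporary table with the data you want, then truncate the original table and re-insert the data: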
create table temp_table1 as
select t.*
from (select t1.*,
(@rn := if(@prevd <> date or minute(time) <> @prevm, 1,
if(@prevd := date, if(@prevm := minute(time), 1, 1), 1)
)
) as seqnum
from table1 t1 cross join
(select @rn := 0, @prevd := 0, @prevm := 0) vars
order by date, time
) t
where seqnum = 1;
truncate table table1;
insert into table1(col1, . . ., coln)
select col1, . . . , coln
from temp_table1;
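The first query has a subquery that enumerates the rows within each minute, and then only the first row of each minute is kept. Those rows are then inserted back into the emptied table. Of course, test the results of the first query before truncating the original table (I would copy the data somewhere else first, just in case).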
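On MySQL 8.0 or later, which is an assumption since the question does not state the server version, the enumeration step could also be written with ROW_NUMBER() instead of user variables. A sketch that numbers the rows within each date, hour and minute, breaking ties on id:

```sql
-- Assumes MySQL 8.0+ (window functions are not available in earlier versions).
CREATE TABLE temp_table1 AS
SELECT id, date, time, gmt_offset, type, yield_b, yield_d
FROM (
  SELECT t1.*,
         ROW_NUMBER() OVER (
           PARTITION BY date, HOUR(time), MINUTE(time)
           ORDER BY time, id
         ) AS seqnum
  FROM table1 AS t1
) AS t
WHERE seqnum = 1;
```

Listing the columns explicitly in the outer SELECT keeps the helper column seqnum out of temp_table1.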