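I'm running a fairly complex SQL statement to create a summary table from the raw data in a large table (38 million rows). (I'm trying to get the current price, the season low, the season high, and the percentage of time the price was 1 cent over the week/month/season into the cache table for later querying.)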
INSERT INTO cache (`time`, name, price, low, high, week, month, season)
SELECT
MAX(`time`) AS `time`,
name,
MIN(CASE WHEN `time` = 1498511444 THEN price ELSE 999999 END) AS price,
MIN(price) AS low,
MAX(price) AS high,
SUM(CASE WHEN `time` > 1497906644 AND price = 1 THEN 1 ELSE 0 END) / SUM(CASE WHEN `time` > 1497906644 THEN 1 ELSE 0 END) AS week,
SUM(CASE WHEN `time` > 1480367444 AND price = 1 THEN 1 ELSE 0 END) / SUM(CASE WHEN `time` > 1480367444 THEN 1 ELSE 0 END) AS month,
SUM(CASE WHEN `time` > 1493362800 AND price = 1 THEN 1 ELSE 0 END) / SUM(CASE WHEN `time` > 1493362800 THEN 1 ELSE 0 END) AS season
FROM
(SELECT
`time`,
name,
MIN(price) AS price
FROM price
WHERE `time` > 1493362800
GROUP BY `time`, name) AS tmp
GROUP BY name
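After adding an index on the price.time column, I managed to get this down to 0.6 seconds locally (it took 30 seconds before). On prod (which has the same index) it runs for a long time (30s+) and then fails with Errcode: 28 - No space left on device. If I watch df while it runs, I see free space slowly drop from 9.9G to 9.6G at about 3MB/s. Then, after a few minutes, free space suddenly starts dropping at 500MB/s until there is none left and the query fails. Locally, free space doesn't seem to move at all, but I suppose it could happen so fast that my df in a while loop doesn't catch it.
If I first try to create a table containing the subquery results, I get the same disk-eating behaviour: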
INSERT INTO initial_cache (`time`, name, price)
SELECT
`time`,
name,
MIN(price) AS price
FROM price
WHERE `time` > 1493337600
GROUP BY `time`, name
Do you have any idea why my query needs so much space to run? And why does it behave so differently on prod?
Thanks!
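Answer 0: (score: 1)
When memory runs out, subqueries tend to spill into a lot of temporary disk space. One part is also somewhat redundant: the outer query checks the time again after the initial subquery has already filtered on it. Rewriting it gives the following (where the SUM(1) looks odd):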
INSERT INTO cache (`time`, name, price, low, high, week, month, season)
SELECT
MAX(`time`) AS `time`,
name,
MIN(price) AS price,
MIN(price) AS low,
MAX(price) AS high,
SUM(CASE WHEN price = 1 THEN 1 ELSE 0 END) / SUM(1) AS week,
SUM(CASE WHEN price = 1 THEN 1 ELSE 0 END) / SUM(1) AS month,
SUM(CASE WHEN price = 1 THEN 1 ELSE 0 END) / SUM(1) AS season
FROM
(SELECT
`time`,
name,
MIN(price) AS price
FROM price
WHERE `time` > 1498442022
GROUP BY `time`, name) AS tmp
GROUP BY name;
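which may be equivalent to the following, provided price holds at most one row per (`time`, name) pair (otherwise MAX(price) and the ratios are computed over raw rows rather than over per-timestamp minimums):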
INSERT INTO cache (`time`, name, price, low, high, week, month, season)
SELECT
MAX(`time`) AS `time`,
name,
MIN(price) AS price,
MIN(price) AS low,
MAX(price) AS high,
SUM(CASE WHEN price = 1 THEN 1 ELSE 0 END) / SUM(1) AS week,
SUM(CASE WHEN price = 1 THEN 1 ELSE 0 END) / SUM(1) AS month,
SUM(CASE WHEN price = 1 THEN 1 ELSE 0 END) / SUM(1) AS season
FROM price
WHERE `time` > 1498442022
GROUP BY name;
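However, since the rewritten outer query looks odd, I doubt this is the result you are looking for. Provide sample data and the expected results to get a better answer.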
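To check whether temporary-table spill is really what eats the disk, something like the sketch below could be run on both machines and the results compared. It assumes the server is MySQL (the backtick quoting and the Errcode: 28 message suggest so); the variable names are standard MySQL ones, but which limit actually applies depends on the server version and its internal temporary table engine.

```sql
-- Assumption: MySQL server. Compare the output on local and on prod.

-- Directory where on-disk temporary tables and sort files are written.
SHOW VARIABLES LIKE 'tmpdir';

-- In versions that use the MEMORY engine for internal temporary tables,
-- a table spills to disk once it exceeds the smaller of these two limits.
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Counters for temporary tables and files created since server start.
SHOW GLOBAL STATUS LIKE 'Created_tmp%';

-- "Using temporary" or "Using filesort" in the Extra column indicates
-- that the GROUP BY is materialised rather than read straight off an index.
EXPLAIN
SELECT `time`, name, MIN(price) AS price
FROM price
WHERE `time` > 1493362800
GROUP BY `time`, name;
```

A different plan or a much smaller temporary-table limit on prod would be one possible explanation for the difference in behaviour, though this is only a starting point for diagnosis.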
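Answer 1: (score: 0)
I didn't solve this problem, but I did work around it. What I did was have the program that inserts the raw data also insert into the table that the subquery would have produced. I then run my outer query separately against that table. So I now have a kind of two-stage cache. For some reason, this all works without cratering the disk space.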
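A minimal sketch of what such a two-stage cache might look like is given below. The table definition, the column types, and the sample values are assumptions made for illustration, not the schema actually used; the outer query is the one from the question, pointed at initial_cache instead of the subquery.

```sql
-- Stage 1 (hypothetical schema): one row per (time, name) holding the minimum price.
CREATE TABLE IF NOT EXISTS initial_cache (
  `time` INT UNSIGNED NOT NULL,
  name   VARCHAR(64)  NOT NULL,
  price  INT          NOT NULL,
  PRIMARY KEY (`time`, name)
);

-- Run by the loader for every raw row it inserts (values are placeholders).
-- On a duplicate (time, name) the lower of the two prices is kept,
-- which mirrors MIN(price) ... GROUP BY `time`, name in the subquery.
INSERT INTO initial_cache (`time`, name, price)
VALUES (1498511444, 'example item', 3)
ON DUPLICATE KEY UPDATE price = LEAST(price, VALUES(price));

-- Stage 2: the outer query from the question, reading the pre-aggregated table.
INSERT INTO cache (`time`, name, price, low, high, week, month, season)
SELECT
  MAX(`time`) AS `time`,
  name,
  MIN(CASE WHEN `time` = 1498511444 THEN price ELSE 999999 END) AS price,
  MIN(price) AS low,
  MAX(price) AS high,
  SUM(CASE WHEN `time` > 1497906644 AND price = 1 THEN 1 ELSE 0 END) / SUM(CASE WHEN `time` > 1497906644 THEN 1 ELSE 0 END) AS week,
  SUM(CASE WHEN `time` > 1480367444 AND price = 1 THEN 1 ELSE 0 END) / SUM(CASE WHEN `time` > 1480367444 THEN 1 ELSE 0 END) AS month,
  SUM(CASE WHEN `time` > 1493362800 AND price = 1 THEN 1 ELSE 0 END) / SUM(CASE WHEN `time` > 1493362800 THEN 1 ELSE 0 END) AS season
FROM initial_cache
WHERE `time` > 1493362800
GROUP BY name;
```

Because the per-(time, name) minimum is maintained incrementally at insert time, the large GROUP BY over the 38-million-row table never has to be materialised in one go, which would plausibly explain why the disk usage no longer spikes.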