I am trying to optimize the following query:
SELECT a2 AS 'b_actual_pair',
a1 AS 'c_actual_date',
a3 AS 'd_actual_value',
b1 AS 'e_1m_date',
b3 AS 'f_1m_value',
c1 AS 'g_2m_date',
c3 AS 'h_2m_value',
d1 AS 'i_3m_date',
d3 AS 'j_3m_value',
e1 AS 'k_4m_date',
e3 AS 'l_4m_value',
f1 AS 'm_5m_date',
f3 AS 'n_5m_value'
FROM (SELECT crd.b_date AS 'a1',
crd.c_pair AS 'a2',
crd.d_value AS 'a3'
FROM item_raw_data crd
WHERE crd.a_unique_id > ( (SELECT crd.a_unique_id
FROM item_raw_data crd
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) - ((SELECT
Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) a,
(SELECT crd.b_date AS 'b1',
crd.c_pair AS 'b2',
crd.d_value AS 'b3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 1 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 1 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) b,
(SELECT crd.b_date AS 'c1',
crd.c_pair AS 'c2',
crd.d_value AS 'c3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 2 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 2 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) c,
(SELECT crd.b_date AS 'd1',
crd.c_pair AS 'd2',
crd.d_value AS 'd3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 3 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 3 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) d,
(SELECT crd.b_date AS 'e1',
crd.c_pair AS 'e2',
crd.d_value AS 'e3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 4 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 4 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) e,
(SELECT crd.b_date AS 'f1',
crd.c_pair AS 'f2',
crd.d_value AS 'f3'
FROM item_raw_data crd
WHERE crd.b_date < ( (SELECT crd.b_date
FROM item_raw_data crd
ORDER BY crd.b_date DESC
LIMIT 0, 1) - INTERVAL 5 minute )
AND crd.a_unique_id > ( (SELECT Max(x.a_unique_id)
FROM (SELECT crd.a_unique_id,
crd.b_date,
crd.c_pair,
crd.d_value
FROM item_raw_data crd
WHERE crd.b_date < (
(SELECT crd.b_date
FROM item_raw_data
crd
ORDER BY crd.b_date
DESC
LIMIT 0, 1) -
INTERVAL 5 minute )
ORDER BY crd.b_date DESC) x) -
((SELECT Count(DISTINCT c_pair)
FROM item_raw_data)) )
ORDER BY crd.b_date DESC) f
WHERE
a.a2 = b.b2
and
b.b2 = c.c2
and
c.c2 = d.d2
and
d.d2 = e.e2
and
e.e2 = f.f2
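The output of this query is as follows:
1. The data behind items_raw_data is as follows:
- Every 5 seconds, 110 items are inserted into the database with their current price.
- actual_pair (or c_pair) is a reference to the main table, which holds the full item description and is not relevant here.
- The 110 rows are inserted within roughly 2 seconds, leaving a 3-second gap. This makes the query easier to build.
2. The purpose of this query is to use this data to build charts with real-time prices, but we need to extend it further to (for example) 10 minutes, 15 minutes, 1 hour, 2 hours, etc., so that you can see what an item is worth now and what it was worth going back in time.
3. The problem is that this query takes 2.5 seconds to run against 450,000 total rows (a few hours of data; we need up to 1 week of data) and 6 sets of data (actual, 1m, 2m, 3m, 4m, 5m).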
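To put these figures in perspective, here is a rough calculation of the insert rate. It assumes one complete batch of 110 rows every 5 seconds with no gaps, which is a simplification, and the variable names are only illustrative.

```python
# Back-of-the-envelope row counts, using only the figures quoted above.
ITEMS_PER_BATCH = 110    # rows inserted per batch
BATCH_INTERVAL_S = 5     # one batch every 5 seconds (assumed to be exact)

rows_per_minute = ITEMS_PER_BATCH * 60 // BATCH_INTERVAL_S
rows_per_hour = rows_per_minute * 60
rows_per_week = rows_per_hour * 24 * 7
hours_in_table = 450_000 / rows_per_hour  # how much history 450,000 rows cover

print(rows_per_minute)            # 1320
print(rows_per_hour)              # 79200
print(rows_per_week)              # 13305600
print(round(hours_in_table, 1))   # 5.7
```

Under these assumptions, a week of data is roughly 30 times the current table size.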
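What we have tried so far:
Using the MEMORY table engine instead of InnoDB reduced the query time from 2.5 seconds to 2 seconds. The system has 64GB of ECC RAM, a 12-core CPU and NVMe drives, so hardware should not be the issue.
Looking up all the data for each item separately gives worse results than querying all items at once.
Doing exactly the same thing in the server-side language (Java) with threaded code is also slower.
Using INNER JOIN instead of WHERE gives similar results.
An easier-to-read view of the query: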
SELECT crd.b_date  AS 'a1',
       crd.c_pair  AS 'a2',
       crd.d_value AS 'a3'
FROM   items_raw_data crd
WHERE  crd.a_unique_id > ( (SELECT crd.a_unique_id
                            FROM   items_raw_data crd
                            ORDER  BY crd.a_unique_id DESC
                            LIMIT  0, 1) - (SELECT Count(DISTINCT c_pair)
                                            FROM   items_raw_data) )
ORDER  BY crd.b_date DESC
Result:
This covers only the actual prices.
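The way this condition picks out the latest batch can be checked on a toy table. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for MySQL, 3 pairs instead of 110, and made-up values. It also makes the hidden assumptions visible: the ids must be gap-free and every batch must contain each pair exactly once, otherwise "max id minus number of distinct pairs" no longer lines up with a batch boundary.

```python
import sqlite3

# Toy stand-in for items_raw_data: 3 pairs instead of 110, 3 batches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items_raw_data ("
    " a_unique_id INTEGER PRIMARY KEY,"
    " b_date TEXT, c_pair INTEGER, d_value REAL)"
)
rows, uid = [], 0
for batch, ts in enumerate(
    ["2019-01-01 00:00:00", "2019-01-01 00:00:05", "2019-01-01 00:00:10"]
):
    for pair in (1, 2, 3):
        uid += 1
        rows.append((uid, ts, pair, 100.0 * pair + batch))  # made-up prices
conn.executemany("INSERT INTO items_raw_data VALUES (?, ?, ?, ?)", rows)

# Same shape as the query above: newest id minus the number of distinct pairs.
latest = conn.execute(
    """
    SELECT crd.b_date, crd.c_pair, crd.d_value
    FROM items_raw_data crd
    WHERE crd.a_unique_id > (
        (SELECT a_unique_id FROM items_raw_data
         ORDER BY a_unique_id DESC LIMIT 1)
        - (SELECT COUNT(DISTINCT c_pair) FROM items_raw_data)
    )
    ORDER BY crd.a_unique_id
    """
).fetchall()

for row in latest:
    print(row)  # the three rows of the 00:00:10 batch
```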
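Table description:
Update 1
Additional notes:
Update 2
Below are links to download the database and the query (self-hosted server):
Item raw data SQL: https://cloud.technorah.com/index.php/s/sR3mdK2Oos2EbC3
SQL query: https://cloud.technorah.com/index.php/s/bdndmLGAUfpduif
Update 3
Running @hunteke's query took 4.7 seconds, which is very strange because the query and the suggestions seem logical.
Following @hunteke's tips, we changed the following: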
SELECT a_unique_id FROM item_raw_data ORDER BY a_unique_id DESC LIMIT 0, 1
to
SELECT MAX(a_unique_id) FROM item_raw_data
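This reduced the query time from 2.8 seconds to 2.7 seconds. Adding USE INDEX(primary) to the main queries improved the time further, from 2.7 seconds to 2.6 seconds.
Update 4
We were failing at a basic task: we used the Timestamp column in ORDER BY instead of the int(11) primary key. The most recently inserted date also has the most recently inserted unique ID, so changing ORDER BY crd.b_date to ORDER BY crd.a_unique_id cut the query time by more than 1 second, from 2.6 to 1.3, almost half.
So the actual query now looks like this, completely redone. Execution time went from 1.3 seconds to 0.55 seconds: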
SELECT *
FROM
(SELECT sub.a_unique_id AS 'a0',
sub.b_date AS 'a1',
sub.c_pair AS 'a2',
sub.d_value AS 'a3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) a,
(SELECT sub.a_unique_id AS 'b0',
sub.b_date AS 'b1',
sub.c_pair AS 'b2',
sub.d_value AS 'b3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 1 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) b,
(SELECT sub.a_unique_id AS 'c0',
sub.b_date AS 'c1',
sub.c_pair AS 'c2',
sub.d_value AS 'c3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 2 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) c,
(SELECT sub.a_unique_id AS 'd0',
sub.b_date AS 'd1',
sub.c_pair AS 'd2',
sub.d_value AS 'd3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 3 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) d,
(SELECT sub.a_unique_id AS 'e0',
sub.b_date AS 'e1',
sub.c_pair AS 'e2',
sub.d_value AS 'e3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 4 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) e,
(SELECT sub.a_unique_id AS 'f0',
sub.b_date AS 'f1',
sub.c_pair AS 'f2',
sub.d_value AS 'f3'
FROM (SELECT *
FROM item_raw_data) sub,
(SELECT crd.a_unique_id AS 'max_id',
crd.b_date AS 'xdate'
FROM item_raw_data crd
WHERE crd.b_date < (SELECT ( crdx.b_date - INTERVAL 5 minute )
FROM item_raw_data crdx
ORDER BY crdx.a_unique_id DESC
LIMIT 0, 1)
ORDER BY crd.a_unique_id DESC
LIMIT 0, 1) aux
WHERE sub.b_date <= aux.xdate
AND sub.a_unique_id > ( aux.max_id - (SELECT
Count(DISTINCT c_pair) AS
max_rows
FROM item_raw_data)
)
ORDER BY sub.a_unique_id DESC) f
WHERE
a.a2 = b.b2
AND
b.b2 = c.c2
AND
c.c2 = d.d2
AND
d.d2 = e.e2
AND
e.e2 = f.f2
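While this is good (from 1.3s down to almost 0.55s) and we can use it for now, we are still looking for further improvements, not only to get better results but also to understand this kind of optimization of big queries and MySQL more deeply. We will keep updating the query execution time as the table grows.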
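The windowing idea used in each block of the reworked query (find the newest id at or before a cutoff, then take the COUNT(DISTINCT c_pair) ids ending there) can be sketched on a toy table. This is a minimal illustration under stated assumptions, not the production query: it runs on Python's sqlite3 instead of MySQL, so datetime(x, '-1 minutes') stands in for x - INTERVAL 1 minute; it uses <= for the cutoff where the query above uses a strict <; it has 3 pairs instead of 110; and the snapshot helper and the values are hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_raw_data ("
    " a_unique_id INTEGER PRIMARY KEY,"
    " b_date TEXT, c_pair INTEGER, d_value REAL)"
)
start = datetime(2019, 1, 1)
rows, uid = [], 0
for k in range(26):  # one batch every 5 seconds: 0s, 5s, ..., 125s
    ts = (start + timedelta(seconds=5 * k)).strftime("%Y-%m-%d %H:%M:%S")
    for pair in (1, 2, 3):
        uid += 1
        rows.append((uid, ts, pair, 1000.0 * pair + k))  # made-up prices
conn.executemany("INSERT INTO item_raw_data VALUES (?, ?, ?, ?)", rows)

# aux finds the newest id at or before the cutoff; the outer WHERE then keeps
# the one batch of ids that ends at that id.
SNAPSHOT_SQL = """
SELECT sub.c_pair, sub.b_date, sub.d_value
FROM item_raw_data sub,
     (SELECT MAX(a_unique_id) AS max_id
      FROM item_raw_data
      WHERE b_date <= datetime((SELECT MAX(b_date) FROM item_raw_data), ?)
     ) aux
WHERE sub.a_unique_id > aux.max_id
                        - (SELECT COUNT(DISTINCT c_pair) FROM item_raw_data)
  AND sub.a_unique_id <= aux.max_id
ORDER BY sub.c_pair
"""


def snapshot(minutes_back):
    """Return one row per pair as of `minutes_back` minutes before the newest row."""
    return conn.execute(SNAPSHOT_SQL, (f"{-minutes_back} minutes",)).fetchall()


for minutes in (0, 1, 2):
    print(minutes, snapshot(minutes))
```

Each call returns exactly one row per pair, which is what the final join on c_pair relies on.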
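Answer 0 (score: 1)
The EXPLAIN output shows that the optimizer chose to ignore any indexes for 4 of the PRIMARY selects. I can't say for certain why, but I strongly suspect it has to do with the heavy use of subqueries: MySQL is notoriously bad at optimizing subqueries, and the general advice is "don't use them." That is not always tenable, however.
So here are the improvements I can offer right now:
If you need the max, use MAX. Don't ORDER BY and LIMIT. ORDER BY needs a sort (when no index can supply the ordering), which is an O(n log n) operation, whereas the worst case for MAX is only O(n) to find the maximum (a scan of the whole table). If your column has an index with high cardinality, MAX can take even less, O(log n). You can see this slowdown in all the filesort entries in the EXPLAIN output.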
-- okay
SELECT a_unique_id FROM item_raw_data ORDER BY a_unique_id DESC LIMIT 0, 1
-- better
SELECT MAX(a_unique_id) FROM item_raw_data
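As a quick sanity check that the two forms are interchangeable here, the sketch below runs both against a toy table. It uses Python's sqlite3 as a stand-in for MySQL and says nothing about timing; it only shows that the results match on a non-empty table, and that they differ in shape on an empty one (no row versus a single NULL row), which matters if the value feeds further arithmetic. The empty_data table is a hypothetical name used for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_raw_data (a_unique_id INTEGER PRIMARY KEY, c_pair INTEGER)"
)
conn.executemany(
    "INSERT INTO item_raw_data VALUES (?, ?)",
    [(i, i % 3) for i in range(1, 101)],
)

# Both forms return the same id on a non-empty table.
by_sort = conn.execute(
    "SELECT a_unique_id FROM item_raw_data ORDER BY a_unique_id DESC LIMIT 1"
).fetchone()[0]
by_max = conn.execute("SELECT MAX(a_unique_id) FROM item_raw_data").fetchone()[0]
print(by_sort, by_max)  # 100 100

# On an empty table the shapes differ: no row at all vs. one row holding NULL.
conn.execute("CREATE TABLE empty_data (a_unique_id INTEGER PRIMARY KEY)")
no_row = conn.execute(
    "SELECT a_unique_id FROM empty_data ORDER BY a_unique_id DESC LIMIT 1"
).fetchone()
null_row = conn.execute("SELECT MAX(a_unique_id) FROM empty_data").fetchone()
print(no_row, null_row)  # None (None,)
```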
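There is no need to select columns that the subquery does not use. I would assume the query optimizer gets rid of them, but I have been wrong about MySQL before, especially when it comes to subqueries. So, take this with a grain of salt: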
-- okay
SELECT MAX(a_unique_id)
FROM (SELECT a_unique_id, b_date, c_pair, d_value FROM ...)
-- better
SELECT MAX(a_unique_id)
FROM (SELECT a_unique_id FROM ...)
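The MySQL optimizer is ignoring the indexes. You can give MySQL a strong hint to use the right index with USE INDEX(<indexname>). Doing so yields a simpler query plan that also makes better use of the indexes: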
+----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+----------------------------------------------------+
| 1 | PRIMARY | bb | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 1540 | 33.33 | Using where |
| 1 | PRIMARY | aa | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 110 | 0.90 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | cc | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 2860 | 0.30 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | dd | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 7568 | 0.30 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | ee | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 10188 | 0.30 | Using where; Using join buffer (Block Nested Loop) |
| 1 | PRIMARY | ff | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 12972 | 0.30 | Using where; Using join buffer (Block Nested Loop) |
| 21 | SUBQUERY | ll | NULL | ALL | RAW_DATA_PAIR_UNIQUE_ID | NULL | NULL | NULL | 499198 | 50.00 | Using where |
| 17 | SUBQUERY | kk | NULL | ALL | RAW_DATA_PAIR_UNIQUE_ID | NULL | NULL | NULL | 499198 | 50.00 | Using where |
| 13 | SUBQUERY | jj | NULL | ALL | RAW_DATA_PAIR_UNIQUE_ID | NULL | NULL | NULL | 499198 | 50.00 | Using where |
| 9 | SUBQUERY | ii | NULL | ALL | RAW_DATA_PAIR_UNIQUE_ID | NULL | NULL | NULL | 499198 | 50.00 | Using where |
| 5 | SUBQUERY | hh | NULL | ALL | RAW_DATA_PAIR_UNIQUE_ID | NULL | NULL | NULL | 499198 | 50.00 | Using where |
| 3 | SUBQUERY | gg | NULL | index | RAW_DATA_PAIR_UNIQUE_ID | RAW_DATA_PAIR_UNIQUE_ID | 4 | NULL | 499198 | 100.00 | Using index |
+----+-------------+-------+------------+-------+-------------------------+-------------------------+---------+------+--------+----------+----------------------------------------------------+
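Finally, I reorganized the query so that it is more obvious (if you scroll to the right) what differs between the sections (the INTERVALs) and where the USE INDEX hints go. The aa, bb, cc (etc.) table names serve only to show where each section maps to in the query plan (above):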
SELECT
a2 AS 'b_actual_pair', a1 AS 'c_actual_date', a3 AS 'd_actual_value',
b1 AS 'e_1m_date', b3 AS 'f_1m_value',
c1 AS 'g_2m_date', c3 AS 'h_2m_value',
d1 AS 'i_3m_date', d3 AS 'j_3m_value',
e1 AS 'k_4m_date', e3 AS 'l_4m_value',
f1 AS 'm_5m_date', f3 AS 'n_5m_value'
FROM
(SELECT b_date AS a1, c_pair AS a2, d_value AS a3 FROM item_raw_data aa USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data gg USE INDEX(RAW_DATA_PAIR_UNIQUE_ID)) ORDER BY b_date DESC) AS a,
(SELECT b_date AS b1, c_pair AS b2, d_value AS b3 FROM item_raw_data bb USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data hh WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 1 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 1 minute ORDER BY b_date DESC) AS b,
(SELECT b_date AS c1, c_pair AS c2, d_value AS c3 FROM item_raw_data cc USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data ii WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 2 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 2 minute ORDER BY b_date DESC) AS c,
(SELECT b_date AS d1, c_pair AS d2, d_value AS d3 FROM item_raw_data dd USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data jj WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 3 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 3 minute ORDER BY b_date DESC) AS d,
(SELECT b_date AS e1, c_pair AS e2, d_value AS e3 FROM item_raw_data ee USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data kk WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 4 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 4 minute ORDER BY b_date DESC) AS e,
(SELECT b_date AS f1, c_pair AS f2, d_value AS f3 FROM item_raw_data ff USE INDEX(PRIMARY) WHERE a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) FROM item_raw_data ll WHERE b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 5 minute) AND b_date < (SELECT MAX(b_date) FROM item_raw_data mm) - INTERVAL 5 minute ORDER BY b_date DESC) AS f
WHERE
a.a2 = b.b2
AND a.a2 = c.c2
AND a.a2 = d.d2
AND a.a2 = e.e2
AND a.a2 = f.f2
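Of course, making better use of the indexes improves the speed. It is not instantaneous, but on my laptop it takes 50% of the time of the original query. That said, note that if you are trying to accommodate 5 weeks' worth of data, you will probably still take a performance hit. Be warned that you may need to rethink your approach.
Answer 1 (score: 0)
You can simplify the WHERE clause as follows: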
WHERE crd.a_unique_id > (SELECT MAX(crd.a_unique_id) - COUNT(DISTINCT c_pair)
FROM items_raw_data crd
)
This might help performance. You can also move the subquery into the FROM clause to make sure it is executed only once.
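A minimal sketch of that last suggestion, assuming the same column names as in the question: the threshold is computed in a one-row derived table (aliased lim here, a hypothetical name) and joined to the main table. The demo runs on Python's sqlite3 as a stand-in for MySQL with a tiny made-up data set, and it only checks that both forms return the same rows; whether the derived-table form is actually faster in MySQL would need to be confirmed with EXPLAIN on the real table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items_raw_data ("
    " a_unique_id INTEGER PRIMARY KEY,"
    " b_date TEXT, c_pair INTEGER, d_value REAL)"
)
# Two batches of three pairs; the values are made up.
conn.executemany(
    "INSERT INTO items_raw_data VALUES (?, ?, ?, ?)",
    [
        (1, "2019-01-01 00:00:00", 1, 10.0),
        (2, "2019-01-01 00:00:00", 2, 20.0),
        (3, "2019-01-01 00:00:00", 3, 30.0),
        (4, "2019-01-01 00:00:05", 1, 11.0),
        (5, "2019-01-01 00:00:05", 2, 21.0),
        (6, "2019-01-01 00:00:05", 3, 31.0),
    ],
)

# Threshold as a scalar subquery in WHERE (the form shown above).
in_where = conn.execute(
    """
    SELECT crd.a_unique_id, crd.c_pair, crd.d_value
    FROM items_raw_data crd
    WHERE crd.a_unique_id > (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair)
                             FROM items_raw_data)
    ORDER BY crd.a_unique_id
    """
).fetchall()

# Threshold computed in a derived table in FROM and joined in.
in_from = conn.execute(
    """
    SELECT crd.a_unique_id, crd.c_pair, crd.d_value
    FROM items_raw_data crd
    JOIN (SELECT MAX(a_unique_id) - COUNT(DISTINCT c_pair) AS min_id
          FROM items_raw_data) lim
      ON crd.a_unique_id > lim.min_id
    ORDER BY crd.a_unique_id
    """
).fetchall()

print(in_from)
```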