我需要一位SQL大师来帮助我加快查询速度。
我有2张桌子,数量和价格。数量记录2个时间戳之间的数量值,相隔15分钟。价格记录给定时间戳的价格,对于给定的价格类型,每5分钟有一个价格5记录。
我需要计算每个期间的总价格,例如:小时或天,两个时间戳之间。这是通过每个时期内(数量乘以15分钟数量窗口中3个价格的平均值)的总和来计算的。
例如,假设我想看1天内每小时的总价。结果集中每行的总价格值是该小时内四个15分钟时段中每个时段的总价格总和。每个15分钟期间的总价格是通过将该期间的数量值乘以该数量期间的3个价格(每5分钟一个)的平均值计算的。
这是我正在使用的查询,结果是:
SELECT
MIN( `quantities`.`start_timestamp` ) AS `start`,
MAX( `quantities`.`end_timestamp` ) AS `end`,
SUM( `quantities`.`quantity` * (
SELECT AVG( `prices`.`price` )
FROM `prices`
WHERE `prices`.`timestamp` >= `quantities`.`start_timestamp`
AND `prices`.`timestamp` < `quantities`.`end_timestamp`
AND `prices`.`type_id` = 1
) ) AS total
FROM `quantities`
WHERE `quantities`.`start_timestamp` >= '2010-07-01 00:00:00'
AND `quantities`.`start_timestamp` < '2010-07-02 00:00:00'
GROUP BY HOUR( `quantities`.`start_timestamp` );
+---------------------+---------------------+----------+
| start | end | total |
+---------------------+---------------------+----------+
| 2010-07-01 00:00:00 | 2010-07-01 01:00:00 | 0.677733 |
| 2010-07-01 01:00:00 | 2010-07-01 02:00:00 | 0.749133 |
| 2010-07-01 02:00:00 | 2010-07-01 03:00:00 | 0.835467 |
| 2010-07-01 03:00:00 | 2010-07-01 04:00:00 | 0.692233 |
| 2010-07-01 04:00:00 | 2010-07-01 05:00:00 | 0.389533 |
| 2010-07-01 05:00:00 | 2010-07-01 06:00:00 | 0.335300 |
| 2010-07-01 06:00:00 | 2010-07-01 07:00:00 | 1.231467 |
| 2010-07-01 07:00:00 | 2010-07-01 08:00:00 | 0.352800 |
| 2010-07-01 08:00:00 | 2010-07-01 09:00:00 | 1.447200 |
| 2010-07-01 09:00:00 | 2010-07-01 10:00:00 | 0.756733 |
| 2010-07-01 10:00:00 | 2010-07-01 11:00:00 | 0.599467 |
| 2010-07-01 11:00:00 | 2010-07-01 12:00:00 | 1.056467 |
| 2010-07-01 12:00:00 | 2010-07-01 13:00:00 | 1.252600 |
| 2010-07-01 13:00:00 | 2010-07-01 14:00:00 | 1.285567 |
| 2010-07-01 14:00:00 | 2010-07-01 15:00:00 | 0.442933 |
| 2010-07-01 15:00:00 | 2010-07-01 16:00:00 | 0.692567 |
| 2010-07-01 16:00:00 | 2010-07-01 17:00:00 | 1.281067 |
| 2010-07-01 17:00:00 | 2010-07-01 18:00:00 | 0.652033 |
| 2010-07-01 18:00:00 | 2010-07-01 19:00:00 | 1.721900 |
| 2010-07-01 19:00:00 | 2010-07-01 20:00:00 | 1.362400 |
| 2010-07-01 20:00:00 | 2010-07-01 21:00:00 | 1.099300 |
| 2010-07-01 21:00:00 | 2010-07-01 22:00:00 | 0.646267 |
| 2010-07-01 22:00:00 | 2010-07-01 23:00:00 | 0.873100 |
| 2010-07-01 23:00:00 | 2010-07-02 00:00:00 | 0.546533 |
+---------------------+---------------------+----------+
24 rows in set (5.16 sec)
我需要查询比这更快地运行,并且尽管有可能。以下是EXPLAIN EXTENDED ...
的结果+----+--------------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------------------------------------------+
| 1 | PRIMARY | quantities | range | start_timestamp | start_timestamp | 8 | NULL | 89 | Using where; Using temporary; Using filesort |
| 2 | DEPENDENT SUBQUERY | prices | ref | timestamp,type_id | type_id | 4 | const | 22930 | Using where |
+----+--------------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------------------------------------------+
2 rows in set, 3 warnings (0.00 sec)
我注意到依赖子查询没有使用密钥中的timestamp字段,而查询正在扫描行的加载。
任何人都可以帮助我让这个运行得更快吗?
以下是创建架构并用大量数据填充(需要2个月)所需的SQL语句
# Create prices table
CREATE TABLE `prices` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`timestamp` datetime NOT NULL,
`type_id` int(11) NOT NULL,
`price` float(8,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `timestamp` (`timestamp`),
KEY `type_id` (`type_id`)
) ENGINE=MyISAM;
# Create quantities table
CREATE TABLE `quantities` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_timestamp` datetime NOT NULL,
`end_timestamp` datetime NOT NULL,
`quantity` float(7,2) NOT NULL,
PRIMARY KEY (`id`),
KEY `start_timestamp` (`start_timestamp`),
KEY `end_timestamp` (`end_timestamp`)
) ENGINE=MyISAM;
# Insert first 2 rows into prices, one for each of 2 types, starting 64 days ago
INSERT INTO `prices` (`id`, `timestamp`, `type_id`, `price`) VALUES
(NULL, DATE_SUB(CURDATE(), INTERVAL 64 DAY), '1', RAND()),
(NULL, DATE_SUB(CURDATE(), INTERVAL 64 DAY), '2', RAND());
# Fill the prices table with a record for each type, for every 5 minutes, for the next 64 days
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 32 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 16 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 8 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 4 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 2 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 1 DAY), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 12 HOUR), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 6 HOUR), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 3 HOUR), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 90 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 45 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 20 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 10 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_ADD(`timestamp`, INTERVAL 5 MINUTE), `type_id`, RAND() FROM prices;
INSERT INTO prices (`timestamp`, `type_id`, `price`) SELECT DATE_SUB(`timestamp`, INTERVAL 5 MINUTE), `type_id`, RAND() FROM prices WHERE MOD( (TIME_TO_SEC( `timestamp`) - TIME_TO_SEC(CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00')) ), 45 *60 ) = 0 AND `timestamp` > CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00');
# Insert first row into quantities, start timestamp is 64 days ago, end timestamp is start timestamp plus 15 minutes
INSERT INTO `quantities` (`id`, `start_timestamp`, `end_timestamp`, `quantity`) VALUES (NULL, DATE_SUB(CURDATE(), INTERVAL 64 DAY), DATE_SUB(CURDATE(), INTERVAL '63 23:45' DAY_MINUTE), RAND());
# Fill the quantities table with a record for each 15 minute period for the next 64 days
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 32 DAY), DATE_ADD(`end_timestamp`, INTERVAL 32 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 16 DAY), DATE_ADD(`end_timestamp`, INTERVAL 16 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 8 DAY), DATE_ADD(`end_timestamp`, INTERVAL 8 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 4 DAY), DATE_ADD(`end_timestamp`, INTERVAL 4 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 2 DAY), DATE_ADD(`end_timestamp`, INTERVAL 2 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 1 DAY), DATE_ADD(`end_timestamp`, INTERVAL 1 DAY), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 12 HOUR), DATE_ADD(`end_timestamp`, INTERVAL 12 HOUR), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 6 HOUR), DATE_ADD(`end_timestamp`, INTERVAL 6 HOUR), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 3 HOUR), DATE_ADD(`end_timestamp`, INTERVAL 3 HOUR), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 90 MINUTE), DATE_ADD(`end_timestamp`, INTERVAL 90 MINUTE), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 45 MINUTE), DATE_ADD(`end_timestamp`, INTERVAL 45 MINUTE), RAND() FROM quantities;
INSERT INTO `quantities` (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_ADD(`start_timestamp`, INTERVAL 15 MINUTE), DATE_ADD(`end_timestamp`, INTERVAL 15 MINUTE), RAND() FROM quantities;
INSERT INTO quantities (`start_timestamp`, `end_timestamp`, `quantity`) SELECT DATE_SUB(`start_timestamp`, INTERVAL 15 MINUTE), DATE_SUB(`end_timestamp`, INTERVAL 15 MINUTE), RAND() FROM quantities WHERE MOD( (TIME_TO_SEC( `start_timestamp`) - TIME_TO_SEC(CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00')) ), 45 * 60 ) = 0 AND `start_timestamp` > CONCAT(DATE_SUB(CURDATE(), INTERVAL 64 DAY), ' 00:00:00');
答案 0 :(得分:5)
这是我的第一次尝试。 这个是脏的,并在数据上使用以下属性:
查询1:
SELECT sql_no_cache
min(q.start_timestamp) as start,
max(q.end_timestamp) as end,
sum((p1.price + p2.price + p3.price)/3*q.quantity) as total
FROM
quantities q join
prices p1 on q.start_timestamp = p1.timestamp and p1.type_id = 1 join
prices p2 on p2.timestamp = adddate(q.start_timestamp, interval 5 minute) and p2.type_id = 1 join
prices p3 on p3.timestamp = adddate(q.start_timestamp, interval 10 minute) and p3.type_id = 1
WHERE
q.start_timestamp between '2010-07-01 00:00:00' and '2010-07-01 23:59:59'
GROUP BY hour(q.start_timestamp);
这个在我的慢速测试机上以0.01秒的速度返回结果,其中原始查询在~6秒内运行,gnarf在〜0.85秒内查询(所有查询总是使用SQL_NO_CACHE
关键字进行测试,不会重复使用结果,但在温暖的数据库上)。
编辑: 这是一个对价格方面的缺失行不敏感的版本 查询1a
SELECT sql_no_cache
min(q.start_timestamp) as start,
max(q.end_timestamp) as end,
sum( ( COALESCE(p1.price,0) + COALESCE(p2.price,0) + COALESCE(p3.price,0) ) / (
3 -
COALESCE(p1.price-p1.price,1) -
COALESCE(p2.price-p2.price,1) -
COALESCE(p3.price-p3.price,1)
)
*q.quantity) as total
FROM
quantities q LEFT JOIN
prices p1 on q.start_timestamp = p1.timestamp and p1.type_id = 1 LEFT JOIN
prices p2 on p2.timestamp = adddate(q.start_timestamp, interval 5 minute) and p2.type_id = 1 LEFT JOIN
prices p3 on p3.timestamp = adddate(q.start_timestamp, interval 10 minute) and p3.type_id = 1
WHERE
q.start_timestamp between '2010-07-01 00:00:00' and '2010-07-01 23:59:59'
GROUP BY hour(q.start_timestamp);
EDIT2: 查询2: 这是对您的查询的直接改进和不同的方法,只需要很少的更改,在我的机器上执行时间约为0.22秒
SELECT sql_no_cache
MIN( `quantities`.`start_timestamp` ) AS `start`,
MAX( `quantities`.`end_timestamp` ) AS `end`,
SUM( `quantities`.`quantity` * (
SELECT AVG( `prices`.`price` )
FROM `prices`
WHERE
`prices`.`timestamp` >= '2010-07-01 00:00:00'
AND `prices`.`timestamp` < '2010-07-02 00:00:00'
AND `prices`.`timestamp` >= `quantities`.`start_timestamp`
AND `prices`.`timestamp` < `quantities`.`end_timestamp`
AND `prices`.`type_id` = 1
) ) AS total
FROM `quantities`
WHERE `quantities`.`start_timestamp` >= '2010-07-01 00:00:00'
AND `quantities`.`start_timestamp` < '2010-07-02 00:00:00'
GROUP BY HOUR( `quantities`.`start_timestamp` );
那就是mysql 5.1,我想我已经读过5.5,这种东西(合并索引)将可用于查询规划器。此外,如果您可以通过外键使start_timestamp和timestamp相关,这应允许这些相关查询使用索引(但为此您需要修改设计并建立某种时间轴表,然后可以引用它按数量和价格计算。)
查询3: 最后,最后一个版本在~0.03秒内完成,但应该像Query 2一样健壮且灵活。
SELECT sql_no_cache
MIN(start),
MAX(end),
SUM(subtotal)
FROM
(
SELECT sql_no_cache
q.`start_timestamp` AS `start`,
q.`end_timestamp` AS `end`,
AVG(p.`price` * q.`quantity`) AS `subtotal`
FROM `quantities` q
LEFT JOIN `prices` p ON p.timestamp >= q.start_timestamp AND
p.timestamp < q.end_timestamp AND
p.timestamp >= '2010-07-01 00:00:00' AND
p.`timestamp` < '2010-07-02 00:00:00'
WHERE q.`start_timestamp` >= '2010-07-01 00:00:00'
AND q.`start_timestamp` < '2010-07-02 00:00:00'
AND p.type_id = 1
GROUP BY q.`start_timestamp`
) forced_tmp
GROUP BY hour( start );
注意:不要忘记在生产中删除 sql_no_cache 关键字。
在上述查询中应用了许多反直觉技巧(有时在连接条件中重复的条件会加快查询速度,有时会降低它们的速度)。 Mysql是一个很棒的小RDBMS,并且在涉及相对简单的查询时非常快,但是当复杂性增加时,很容易遇到上述情况。
所以一般来说,我应用以下原则来设定我对查询性能的期望:
在这种特殊情况下,您从少于1000行开始(一天中的所有价格和数量,精确度为15分钟),然后您应该能够计算最终结果。
答案 1 :(得分:2)
这应该返回相同的结果,并且执行速度稍快:
SELECT
MIN( `quantities`.`start_timestamp` ) AS `start`,
MAX( `quantities`.`end_timestamp` ) AS `end`,
SUM( `quantities`.`quantity` * `prices`.`price` )
* COUNT(DISTINCT `quantities`.`id`)
/ COUNT(DISTINCT `prices`.`id`)
AS total
FROM `quantities`
JOIN `prices` ON `prices`.`timestamp` >= `quantities`.`start_timestamp`
AND `prices`.`timestamp` < `quantities`.`end_timestamp`
AND `prices`.`type_id` = 1
WHERE `quantities`.`start_timestamp` >= '2010-07-01 00:00:00'
AND `quantities`.`start_timestamp` < '2010-07-02 00:00:00'
GROUP BY HOUR( `quantities`.`start_timestamp` );
由于您无法在AVG()
内计算SUM()
,我必须做一些有趣的COUNT(DISTINCT)
来计算每prices
返回quantities
的数量。我想知道这是否会给你带来与“真实”数据相同的结果......
使用JOIN
:
+----+-------------+------------+-------+-------------------------------+-----------------+---------+------+-------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+-------+-------------------------------+-----------------+---------+------+-------+----------+----------------------------------------------+ | 1 | SIMPLE | quantities | range | start_timestamp,end_timestamp | start_timestamp | 8 | NULL | 89 | 100.00 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | prices | ALL | timestamp,type_id | NULL | NULL | NULL | 36862 | 62.20 | Using where; Using join buffer | +----+-------------+------------+-------+-------------------------------+-----------------+---------+------+-------+----------+----------------------------------------------+
VS。同一查询仅将LEFT
添加到JOIN
+----+-------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------+----------------------------------------------+ | 1 | SIMPLE | quantities | range | start_timestamp | start_timestamp | 8 | NULL | 89 | 100.00 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | prices | ref | timestamp,type_id | type_id | 4 | const | 22930 | 100.00 | | +----+-------------+------------+-------+-------------------+-----------------+---------+-------+-------+----------+----------------------------------------------+
有趣的是,LEFT
可以完全删除end_timestamp
作为可能的密钥,并且更改所选的密钥,使其花费的时间长达15倍......
This reference page可以帮助您多一点
答案 2 :(得分:0)
我不知道它是否更快,但试试这个:
SELECT
MIN( `quantities`.`start_timestamp` ) AS `start`,
MAX( `quantities`.`end_timestamp` ) AS `end`,
( `quantities`.`quantity` * AVG (`prices`.`price`) * COUNT (`prices`.`price`)) AS `total`
FROM `quantities`
LEFT JOIN `prices`
ON `prices`.`timestamp` >= `quantities`.`start_timestamp`
AND `prices`.`timestamp` < `quantities`.`end_timestamp`
WHERE `quantities`.`start_timestamp` >= '2010-07-01 00:00:00'
AND `quantities`.`start_timestamp` < '2010-07-02 00:00:00'
AND `prices`.`type_id` = 1
GROUP BY HOUR( `quantities`.`start_timestamp` );
同时比较结果,因为逻辑略有不同。
我不做SUM(quantety * AVG(价格)
我做AVG(价格)* COUNT(价格)* quantety
答案 3 :(得分:0)
请记住,仅仅因为列上有索引并不一定意味着它们会更快地运行。就目前而言,创建的索引是针对每个单独的列,如果您只限制一列上的数据,则会很快返回结果。
因此,为了避免“使用filesort”(您需要尽可能多地做),可以尝试以下索引:
CREATE INDEX start_timestamp_end_timestamp_id ON quantities (start_timestamp,end_timestamp,id);
价格表的相似之处(将您拥有的3个单独的索引合并为1个索引以便更快地查找)
一个很好的资源,它非常详细地解释了它,以及如何优化索引(以及不同的Explain的含义和目标):http://hackmysql.com/case1