This is a follow-up to another question here on SO.
I have two database tables (more tables omitted):
acquisitions (acq)
id {PK}
id_cu {FK}
datetime
{ Unique Constraint: id_cu - datetime }
data
id {PK}
id_acq {FK acquisitions}
id_meas
id_elab
value
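Every id column and datetime are indexed.
Of course, I will not change the database structure. I need to extract the data this way: one column of data.value for each selected acq.id_cu - data.id_meas - data.id_elab combination (see the note at the bottom of the post). My current query is built this way (see the SO question):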
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
UNION
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
UNION
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
) AS T
WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
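Only 3 columns are retrieved here but, as I said, there are usually more than 50.
It works flawlessly, but I wonder whether it can be optimized for speed.
This is the MySQL EXPLAIN EXTENDED output for the query above: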
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 82466 | 100.00 | Using where; Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 18011 | 100.00 | |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | ix_acquisitions_id_cu | 4 | | 20864 | 100.00 | |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | ref | PRIMARY,id_cu,ix_acquisitions_id_cu | id_cu | 4 | | 31848 | 100.00 | |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+------+------------------------------------------------+-----------------------+---------+------------------------+-------+----------+----------------------------------------------+
8 rows in set, 1 warning (8.24 sec)
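There are currently (edit: checked today) 390k acquisitions and 9.2M data values (and growing), and it takes about 10 minutes to extract a 59-column table. I know the previous software took 1 hour to extract the data.
Thank you for your patience in reading this far :)
After Denis's answer, I tried his changes 1. and 2.; this is the resulting new query: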
SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2, NULL AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
UNION ALL
SELECT acq.datetime AS datetime, NULL AS v1, NULL AS v2, data.value AS v3
FROM acq INNER JOIN data ON acq.id = data.id_acq
WHERE acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
) AS T GROUP BY datetime
Here is the new EXPLAIN EXTENDED:
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.01 sec)
Without a doubt, a good performance improvement.
This one adds point 3.:
EXPLAIN EXTENDED SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2, MAX(v3) AS v3 FROM (
SELECT acquisitions.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2, NULL AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 4 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
UNION ALL
SELECT acquisitions.datetime AS datetime, NULL AS v1, NULL AS v2, MAX(data.value) AS v3
FROM acquisitions INNER JOIN data ON acquisitions.id = data.id_acq
WHERE acquisitions.id_cu = 8 AND data.id_meas = 1 AND data.id_elab = 2
AND datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY datetime
) AS T GROUP BY datetime;
This is its EXPLAIN EXTENDED:
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 51997 | 100.00 | Using temporary; Using filesort |
| 2 | DERIVED | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 14827 | 100.00 | Using where |
| 2 | DERIVED | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 3 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 18663 | 100.00 | Using where |
| 3 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| 4 | UNION | acquisitions | range | PRIMARY,id_cu,ix_acquisitions_datetime,ix_acquisitions_id_cu | id_cu | 12 | NULL | 13260 | 100.00 | Using where |
| 4 | UNION | data | ref | ix_data_id_meas,ix_data_id_acq,ix_data_id_elab | ix_data_id_acq | 4 | sensor.acquisitions.id | 9 | 100.00 | Using where |
| NULL | UNION RESULT | <union2,3,4> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | |
+----+--------------+--------------+-------+--------------------------------------------------------------+----------------+---------+------------------------+-------+----------+---------------------------------+
8 rows in set, 1 warning (3.06 sec)
Slightly slower. Is this supposed to pay off with a large number of columns? I'll try...
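I tried with and without MAX(data.value) ... GROUP BY datetime on a 60-column query and got better results. Timings vary from run to run; this is one of them:
Changes 1. and 2.: 4m6.597s
Changes 1., 2. and 3.: 4m0.210s
The time required is reduced by about 57%.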
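I also tried Andiry's solution, but it is slower than Denis's optimizations.
Tested with 3 combinations / columns:
CASE: 9.3s
I also tested with 12 combinations / columns:
CASE: 13.7s
Moreover, Andiry's solution also returns acquisition datetimes for which none of the selected combinations has data but some other combination does.
Imagine control unit 1 acquires data every 30 minutes at :00 and :30, while control unit 2 does so at :15 and :45: I would double the number of rows, with the empty ones filled with NULL.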
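The extra rows can be reproduced on a toy dataset. This is only a minimal sketch: it uses Python's sqlite3 module with an in-memory SQLite database rather than MySQL, and the sample rows and the single selected combination (id_cu=1, id_meas=1, id_elab=1) are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL, datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime));
CREATE TABLE data (
    id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
-- Made-up sample: unit 1 acquires at :00 and :30, unit 2 at :15 and :45.
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 1, '2011-03-01 00:00:00'), (2, 2, '2011-03-01 00:15:00'),
    (3, 1, '2011-03-01 00:30:00'), (4, 2, '2011-03-01 00:45:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 1, 1, 10.0), (2, 1, 1, 20.0), (3, 1, 1, 11.0), (4, 1, 1, 21.0);
""")

# CASE-based pivot: every datetime present in the joined tables yields a row,
# even when the only selected combination (unit 1) has no data at that time.
case_rows = conn.execute("""
    SELECT acq.datetime,
           MAX(CASE WHEN acq.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 1
                    THEN data.value END) AS v1
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    GROUP BY acq.datetime ORDER BY acq.datetime
""").fetchall()

# Filtering on the combination first (as each UNION ALL branch does) keeps
# only the datetimes where unit 1 actually acquired something.
filtered_rows = conn.execute("""
    SELECT acq.datetime, MAX(data.value) AS v1
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = 1 AND data.id_meas = 1 AND data.id_elab = 1
    GROUP BY acq.datetime ORDER BY acq.datetime
""").fetchall()

print(len(case_rows), "rows with CASE,", len(filtered_rows), "rows when filtered")
```

In this toy case the CASE query returns twice as many rows, half of them holding only NULL.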
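Note:
This is all about a sensor system: there are several control units (one for each id_cu), each with many sensors.
A single sensor is identified by an id_cu / id_meas pair and sends different elaborations for each measurement, for example MIN (id_elab=1), MAX (id_elab=2), AVERAGE (id_elab=3), INSTANT (id_elab=...), and so on, one for each id_elab.
The user is free to select as many elaborations as he wants, for example: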
id_cu=1 / id_meas=3 / id_elab=3
id_cu=1 / id_meas=5 / id_elab=3
id_cu=4 / id_meas=2 / id_elab=1
(id_cu, id_meas, id_elab combinations) and so on, up to several dozen choices...
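Because the combinations are picked by the user, the pivot query has to be generated rather than written by hand. The following is a minimal sketch of such a generator, not the actual application code: build_pivot_query is a hypothetical helper name, an in-memory SQLite database (via Python's sqlite3) stands in for MySQL, and the sample rows are invented. It emits one UNION ALL branch per combination, with the date filter inside each branch.

```python
import sqlite3


def build_pivot_query(combos, start, end):
    """Return (sql, params) with one UNION ALL branch per (id_cu, id_meas, id_elab)."""
    n = len(combos)
    branches, params = [], []
    for i, (id_cu, id_meas, id_elab) in enumerate(combos):
        # Column i carries data.value; every other column is NULL.
        cols = ", ".join(
            ("data.value" if j == i else "NULL") + f" AS v{j + 1}" for j in range(n)
        )
        branches.append(
            f"SELECT acq.datetime AS datetime, {cols} "
            "FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq "
            "WHERE acq.id_cu = ? AND data.id_meas = ? AND data.id_elab = ? "
            "AND acq.datetime >= ? AND acq.datetime <= ?"
        )
        params += [id_cu, id_meas, id_elab, start, end]
    outer = ", ".join(f"MAX(v{j + 1}) AS v{j + 1}" for j in range(n))
    sql = (
        f"SELECT datetime, {outer} FROM ("
        + " UNION ALL ".join(branches)
        + ") AS T GROUP BY datetime ORDER BY datetime"
    )
    return sql, params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL, datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime));
CREATE TABLE data (
    id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
-- Invented sample rows.
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 1, '2011-03-01 00:00:00'), (2, 4, '2011-03-01 00:00:00'),
    (3, 1, '2011-03-01 00:30:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 3, 3, 1.5), (1, 5, 3, 2.5), (2, 2, 1, 3.5), (3, 3, 3, 4.5);
""")

# The three example selections listed above.
combos = [(1, 3, 3), (1, 5, 3), (4, 2, 1)]
sql, params = build_pivot_query(combos, "2011-03-01 00:00:00", "2011-04-30 23:59:59")
rows = conn.execute(sql, params).fetchall()
for row in rows:
    print(row)
```

All values are passed as bound parameters; only the column aliases v1, v2, ... are interpolated into the SQL text.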
This is the partial DDL (irrelevant tables excluded):
CREATE TABLE acquisitions (
id INTEGER NOT NULL AUTO_INCREMENT,
id_cu INTEGER NOT NULL,
datetime DATETIME NOT NULL,
PRIMARY KEY (id),
UNIQUE (id_cu, datetime),
FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
)
CREATE TABLE data (
id INTEGER NOT NULL AUTO_INCREMENT,
id_acq INTEGER NOT NULL,
id_meas INTEGER NOT NULL,
id_elab INTEGER NOT NULL,
value FLOAT,
PRIMARY KEY (id),
FOREIGN KEY(id_acq) REFERENCES acquisitions (id) ON DELETE CASCADE
)
CREATE TABLE ctrl_units (
id INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE sensors (
id_cu INTEGER NOT NULL,
id_meas INTEGER NOT NULL,
id_elab INTEGER NOT NULL,
name VARCHAR(40) NOT NULL,
`desc` VARCHAR(80),
PRIMARY KEY (id_cu, id_meas),
FOREIGN KEY(id_cu) REFERENCES ctrl_units (id) ON DELETE CASCADE
)
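Answer 0 (score: 3)
There are three main issues:
1. Use union all, not union. You are grouping and extracting the min/max anyway, so there is no point in introducing a sort step to remove duplicate rows.
2. The where clause can be placed inside each union sub-statement: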
select ...
from (
select ... from ... where ...
union all
select ... from ... where ...
union all
...
)
group by ...
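The way you have written it, all rows are fetched first, then they are all appended, and only at the end are the rows you need filtered out. Injecting the where clause into the union sub-statements makes each one fetch only the needed rows before they are appended.
3. Likewise, pre-aggregate the aggregates: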
select ..., max(foo) as foo
from (
select ..., max(foo) as foo from ... where ... group by ...
union all
select ..., max(foo) as foo from ... where ... group by ...
union all
...
)
group by ...
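The optimizer will make better use of the existing indexes, and in the end you will only be appending a few rows rather than millions.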
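The three rewrites only change how much work is done, not what is returned. As a hedged sanity check, the sketch below runs the original shape and both rewritten shapes against a tiny invented dataset and compares the results. It uses Python's sqlite3 with an in-memory SQLite database instead of MySQL, so it says nothing about timings; the two combinations and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL, datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime));
CREATE TABLE data (
    id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
-- Invented sample rows; acquisition 1 lies outside the requested range.
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 3, '2011-02-28 23:30:00'), (2, 3, '2011-03-01 00:00:00'),
    (3, 5, '2011-03-01 00:00:00'), (4, 5, '2011-03-01 00:30:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 2, 1, 9.0), (2, 2, 1, 1.0), (3, 4, 6, 2.0), (4, 4, 6, 3.0);
""")

JOIN = "FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq"
C1 = "acq.id_cu = 3 AND data.id_meas = 2 AND data.id_elab = 1"
C2 = "acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6"
RANGE = "acq.datetime >= '2011-03-01 00:00:00' AND acq.datetime <= '2011-04-30 23:59:59'"
OUTER = ("SELECT datetime, MAX(v1) AS v1, MAX(v2) AS v2 FROM ({}) AS T {} "
         "GROUP BY datetime ORDER BY datetime")

# Original shape: UNION (deduplicating) with the date filter applied outside.
original = OUTER.format(
    f"SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2 {JOIN} WHERE {C1} "
    f"UNION "
    f"SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2 {JOIN} WHERE {C2}",
    "WHERE datetime >= '2011-03-01 00:00:00' AND datetime <= '2011-04-30 23:59:59'",
)

# Points 1 and 2: UNION ALL, with the date filter inside every branch.
pushed_down = OUTER.format(
    f"SELECT acq.datetime AS datetime, data.value AS v1, NULL AS v2 {JOIN} "
    f"WHERE {C1} AND {RANGE} "
    f"UNION ALL "
    f"SELECT acq.datetime AS datetime, NULL AS v1, data.value AS v2 {JOIN} "
    f"WHERE {C2} AND {RANGE}",
    "",
)

# Point 3: each branch is aggregated before the rows are appended.
pre_aggregated = OUTER.format(
    f"SELECT acq.datetime AS datetime, MAX(data.value) AS v1, NULL AS v2 {JOIN} "
    f"WHERE {C1} AND {RANGE} GROUP BY acq.datetime "
    f"UNION ALL "
    f"SELECT acq.datetime AS datetime, NULL AS v1, MAX(data.value) AS v2 {JOIN} "
    f"WHERE {C2} AND {RANGE} GROUP BY acq.datetime",
    "",
)

results = [conn.execute(q).fetchall() for q in (original, pushed_down, pre_aggregated)]
print(results[0])
print("all three agree:", results[0] == results[1] == results[2])
```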
Answer 1 (score: 1)
SELECT
acq.datetime,
MAX(CASE WHEN acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1 THEN data.value END) AS v1,
MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6 THEN data.value END) AS v2,
MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8 THEN data.value END) AS v3
FROM acq
INNER JOIN data ON acq.id = data.id_acq
WHERE datetime >= "2011-03-01 00:00:00" AND datetime <= "2011-04-30 23:59:59"
GROUP BY acq.datetime
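This may look roughly the same as the original query, but the main difference is that, logically, it scans the tables only once instead of three or more times as with the UNIONs.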
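For readers who want to try the single-pass pivot on a small dataset, here is a minimal sketch of the same conditional aggregation. It runs against an in-memory SQLite database through Python's sqlite3 rather than MySQL, the sample rows are invented, and the date range is passed as bound parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL, datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime));
CREATE TABLE data (
    id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
-- Invented sample rows; acquisition 4 lies outside the requested range and
-- the row with value 99.0 belongs to a combination that is not selected.
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 2, '2011-03-01 00:00:00'), (2, 5, '2011-03-01 00:00:00'),
    (3, 7, '2011-03-01 00:30:00'), (4, 7, '2011-05-01 00:00:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 2, 1, 1.0), (1, 2, 2, 99.0), (2, 4, 6, 2.0), (3, 9, 8, 3.0), (4, 9, 8, 4.0);
""")

# One pass over the join: each CASE yields data.value only for its own
# combination and NULL otherwise, and MAX ignores the NULLs.
query = """
    SELECT acq.datetime,
           MAX(CASE WHEN acq.id_cu = 2 AND data.id_meas = 2 AND data.id_elab = 1
                    THEN data.value END) AS v1,
           MAX(CASE WHEN acq.id_cu = 5 AND data.id_meas = 4 AND data.id_elab = 6
                    THEN data.value END) AS v2,
           MAX(CASE WHEN acq.id_cu = 7 AND data.id_meas = 9 AND data.id_elab = 8
                    THEN data.value END) AS v3
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.datetime >= ? AND acq.datetime <= ?
    GROUP BY acq.datetime
    ORDER BY acq.datetime
"""
rows = conn.execute(query, ("2011-03-01 00:00:00", "2011-04-30 23:59:59")).fetchall()
for row in rows:
    print(row)
```

Note that the WHERE clause restricts only the date range, so every row of the join within that range is read, whichever combination it belongs to.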
Answer 2 (score: 0)
Basically, I think a single SELECT with CASE handling the conditions will give better results. In any case, you may want to benchmark and compare...
SELECT acq.datetime AS datetime,
MAX(
CASE acq.id_cu
WHEN 1 THEN data.value
END
) as v1,
MAX(
CASE acq.id_cu
WHEN 4 THEN data.value
END
) as v2,
MAX(
CASE acq.id_cu
WHEN 8 THEN data.value
END
) as v3
FROM
acq INNER JOIN data ON acq.id = data.id_acq
WHERE
data.id_meas = 1 AND data.id_elab = 2 AND
datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59"
GROUP BY
acq.datetime
This should perform a clean range scan. In addition, a composite index could help further.
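Which composite index is meant is not spelled out. One candidate, offered purely as an assumption, is an index on data (id_acq, id_meas, id_elab): the EXPLAIN output in the question shows data being reached through ix_data_id_acq and then filtered with "Using where", and a composite index would let that lookup match all three columns at once. The sketch below creates such an index in an in-memory SQLite database via Python's sqlite3. The index name ix_data_acq_meas_elab and the sample rows are hypothetical, and SQLite's planner is not MySQL's, so the printed plan is only indicative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE acquisitions (
    id INTEGER PRIMARY KEY, id_cu INTEGER NOT NULL, datetime TEXT NOT NULL,
    UNIQUE (id_cu, datetime));
CREATE TABLE data (
    id INTEGER PRIMARY KEY, id_acq INTEGER NOT NULL REFERENCES acquisitions (id),
    id_meas INTEGER NOT NULL, id_elab INTEGER NOT NULL, value REAL);
-- Invented sample rows.
INSERT INTO acquisitions (id, id_cu, datetime) VALUES
    (1, 1, '2011-03-01 00:00:00'), (2, 1, '2011-03-01 00:30:00');
INSERT INTO data (id_acq, id_meas, id_elab, value) VALUES
    (1, 1, 2, 5.0), (1, 1, 1, 4.0), (2, 1, 2, 6.0);
""")

# Hypothetical composite index; the same CREATE INDEX syntax is valid in MySQL.
conn.execute("CREATE INDEX ix_data_acq_meas_elab ON data (id_acq, id_meas, id_elab)")

query = """
    SELECT acq.datetime, data.value
    FROM acquisitions AS acq INNER JOIN data ON acq.id = data.id_acq
    WHERE acq.id_cu = ? AND data.id_meas = ? AND data.id_elab = ?
      AND acq.datetime BETWEEN ? AND ?
    ORDER BY acq.datetime
"""
params = (1, 1, 2, "2011-03-01 00:00:00", "2011-04-30 23:59:59")
rows = conn.execute(query, params).fetchall()

# Print the plan the engine chose; whether the index is picked depends on the
# engine and on table statistics, so no particular plan is asserted here.
for step in conn.execute("EXPLAIN QUERY PLAN " + query, params):
    print(step)

index_names = [
    r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'data'"
    )
]
print(index_names)
```

On MySQL, rerunning EXPLAIN EXTENDED after adding the index would show whether the optimizer actually uses it.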
Finally, what is wrong with using GROUP BY, for example:
SELECT data.id_meas, acq.datetime AS datetime, MAX(data.value)
FROM
acq INNER JOIN data ON acq.id = data.id_acq
WHERE
data.id_elab = 2 AND
datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
data.id_meas IN (1,4,8)
GROUP BY
data.id_meas
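This is the simplest form (and also the most flexible), even though it does not pivot the rows into columns for you (one per value of data.id_meas). However, it will give you the best idea of what performance to expect and which indexes would be most useful for the query.
EDIT: To get the max data value for each acq.id_cu - data.id_meas - data.id_elab combination, you should be able to use: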
SELECT
acq.id_cu, data.id_meas, data.id_elab, acq.datetime AS datetime, MAX(data.value)
FROM
acq INNER JOIN data ON acq.id = data.id_acq
WHERE
data.id_elab = 2 AND
datetime BETWEEN "2011-03-01 00:00:00" AND "2011-04-30 23:59:59" AND
data.id_meas IN (1,4,8)
GROUP BY
acq.id_cu, data.id_meas, data.id_elab, acq.datetime
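This will give you max(data.value) for every combination of acq.id_cu, data.id_meas, data.id_elab, acq.datetime (after filtering by the values in the where clause; adjust the where clause to change what is included).
For combinations that have no rows this will not show NULLs, but there is a workaround if this direction suits you.
GROUP BY also determines the sort order, so reorder the columns in the group by to change the sorting.
If my answer is still missing the point, some sample data / test cases would be useful.
The confusing part of your example is where you say:
each column corresponds to data.value for a selected acq.id_cu - data.id_meas - data.id_elab combination.
But when you select the data in the example query, you select the values directly into columns grouped only by datetime, so if they really are combinations of values there is no way to tell which row corresponds to which combination (there can be multiple rows for a given datetime). That is, unless it is not a combination of all the values you want to filter/group by, and the grouping criterion that determines the max value depends directly on datetime.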