Is there a way to speed up this query?

Time: 2014-04-03 22:31:56

Tags: mysql sql

I suspect there is a way to make this faster, but it is beyond the limits of my MySQL knowledge.

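I have a table containing data collected from a number of sensors at a rate of 1 Hz, on a per-activity basis. The table columns are activityId, transducerId (which sensor the data came from), the value reported by the sensor, and a timestamp. A given activity can have 0-24 sensors.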

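One second of data looks like this (give or take a few rows, depending on the number of sensors):

[image missing: sample rows for one second of data]

I need to produce a new table with one column per sensor, named after that sensor and holding its data, plus a datetime column. For example:

[image missing: desired output with one column per sensor]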

At the moment I am building this table with a long series of subqueries and joins. This is the query I am using:

SELECT cd.calculatedValue AS `301`, q1.`302` , q2.`303` , q3.`304` , q4.`305` , q5.`306` , q6.`307` , q7.`308` , q8.`309` , q9.`310` , q10.`311` , q11.`312` , q12.`313` , q13.`314` , cd.`datetime` 
FROM 
data cd 
JOIN 
(SELECT `calculatedValue` AS `302`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 302) AS q1 
ON cd.`datetime` = q1.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `303`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 303) AS q2 
ON cd.`datetime` = q2.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `304`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 304) AS q3 
ON cd.`datetime` = q3.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `305`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 305) AS q4 
ON cd.`datetime` = q4.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `306`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 306) AS q5 
ON cd.`datetime` = q5.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `307`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 307) AS q6 
ON cd.`datetime` = q6.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `308`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 308) AS q7 
ON cd.`datetime` = q7.`datetime` 
JOIN 
(SELECT `calculatedValue` AS `309`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 309) AS q8 
ON cd.`datetime` = q8.`datetime`
 JOIN 
 (SELECT `calculatedValue` AS `310`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 310) AS q9 
 ON cd.`datetime` = q9.`datetime` 
 JOIN 
 (SELECT `calculatedValue` AS `311`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 311) AS q10 
 ON cd.`datetime` = q10.`datetime` 
 JOIN 
 (SELECT `calculatedValue` AS `312`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 312) AS q11 
 ON cd.`datetime` = q11.`datetime` 
 JOIN 
 (SELECT `calculatedValue` AS `313`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 313) AS q12 
 ON cd.`datetime` = q12.`datetime` 
 JOIN 
 (SELECT `calculatedValue` AS `314`, `datetime` FROM `data` WHERE `activityId` = 74 AND `transducerId` = 314) AS q13 
 ON cd.`datetime` = q13.`datetime` 
 WHERE cd.`activityId` = 74 AND cd.`transducerId` = 301

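This takes a very long time with only a few minutes of data. Realistically, the table will hold hours of data, and as many as 10 sensors.

Is there a better way to write this query?

Thanks very much.

1 answer:

Answer 0: (score: 1)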

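Those derived tables are going to eat your lunch, and your lunchbox too, performance-wise. Each of those inline view queries is run and materialized as a temporary MyISAM table, and the outer query then performs all of its join operations against those unindexed temporary tables.

As an alternative, consider making just one pass through the table to get a nearly equivalent result. (In your query, if the row for any sensor is "missing" for a given datetime, no row is returned for that datetime.)

Consider using a GROUP BY operation, which MySQL can optimize with a suitable index.

For example, something like this: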

-- One pass through the table: each MAX(IF(...)) picks out the value for a
-- single transducer within the group of rows sharing the same datetime.
SELECT d.datetime
     , MAX(IF(d.transducerId = 301,d.calculatedValue,NULL)) AS `301`
     , MAX(IF(d.transducerId = 302,d.calculatedValue,NULL)) AS `302`
     , MAX(IF(d.transducerId = 303,d.calculatedValue,NULL)) AS `303`
     , MAX(IF(d.transducerId = 304,d.calculatedValue,NULL)) AS `304`
     , MAX(IF(d.transducerId = 305,d.calculatedValue,NULL)) AS `305`
     , MAX(IF(d.transducerId = 306,d.calculatedValue,NULL)) AS `306`
     , MAX(IF(d.transducerId = 307,d.calculatedValue,NULL)) AS `307`
     , MAX(IF(d.transducerId = 308,d.calculatedValue,NULL)) AS `308`
     , MAX(IF(d.transducerId = 309,d.calculatedValue,NULL)) AS `309`
     , MAX(IF(d.transducerId = 310,d.calculatedValue,NULL)) AS `310`
     , MAX(IF(d.transducerId = 311,d.calculatedValue,NULL)) AS `311`
     , MAX(IF(d.transducerId = 312,d.calculatedValue,NULL)) AS `312`
     , MAX(IF(d.transducerId = 313,d.calculatedValue,NULL)) AS `313`
     , MAX(IF(d.transducerId = 314,d.calculatedValue,NULL)) AS `314`
  FROM `data` d
 WHERE d.activityId = 74
 GROUP BY d.datetime

(You can move d.datetime to the end of the SELECT list; I just habitually put the GROUP BY columns first.)

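Without a suitable index available, this query will lumber along like a smoking freight train.

The most suitable index for this query is probably

(activityId,datetime,transducerId,calculatedValue)

If this is an InnoDB table and the leading columns of the cluster key are (activityId,datetime), that would be sufficient.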
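As a rough sketch, the index could be added with a statement along these lines. The index name `data_activity_datetime_ix` is made up for illustration, and the column names are assumed to match those used in the queries above; check them against the real table definition first.

```sql
-- Hypothetical index name; columns listed in the order suggested above.
-- activityId comes first (equality predicate in the WHERE clause),
-- datetime second (the GROUP BY column), and the remaining two columns
-- make the index covering so the query need not touch the table rows.
ALTER TABLE `data`
  ADD INDEX `data_activity_datetime_ix`
      (`activityId`, `datetime`, `transducerId`, `calculatedValue`);
```

On a large table, building the index may take a while, so it is probably worth trying on a copy of the data first.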

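Ideally, the EXPLAIN output for this query shows "Using where; Using index" in the Extra column. What we definitely don't want to see in the EXPLAIN output is a "Using filesort" operation, or any derived tables, if we can help it.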
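To check the plan, the query can be prefixed with EXPLAIN. The sketch below is shortened to two sensors for readability; the plan for the full 14-column version should have the same shape, since the extra columns only add expressions to the SELECT list.

```sql
-- Shows the execution plan instead of running the query.
-- Look at the "key" column (which index is used) and the "Extra" column.
EXPLAIN
SELECT d.datetime
     , MAX(IF(d.transducerId = 301,d.calculatedValue,NULL)) AS `301`
     , MAX(IF(d.transducerId = 302,d.calculatedValue,NULL)) AS `302`
  FROM `data` d
 WHERE d.activityId = 74
 GROUP BY d.datetime;
```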


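This query differs slightly from the original: if the row for a particular sensor is "missing" for a given datetime, this query still returns a row for that datetime, with NULL for the "missing" sensor, whereas the original query omits the entire row.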
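If the original behavior is wanted (drop any datetime where a sensor is missing), one possible variation, not part of the query above, is to add a HAVING clause that counts the distinct sensors in each group. This sketch assumes that the 14 sensors of interest are exactly transducerId 301 through 314, and that there is at most one row per (activityId, datetime, transducerId).

```sql
SELECT d.datetime
     , MAX(IF(d.transducerId = 301,d.calculatedValue,NULL)) AS `301`
     , MAX(IF(d.transducerId = 302,d.calculatedValue,NULL)) AS `302`
     -- columns for 303 through 314 follow the same pattern
  FROM `data` d
 WHERE d.activityId = 74
   AND d.transducerId BETWEEN 301 AND 314   -- assumed sensor range
 GROUP BY d.datetime
HAVING COUNT(DISTINCT d.transducerId) = 14  -- keep only complete seconds
;
```

The count is taken on transducerId rather than calculatedValue, so a sensor row that is present but holds a NULL value still counts as present, as it would in the join version.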


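If you really do want to use JOIN operations, an equivalent that doesn't use inline views will be more efficient than the original, but probably not as efficient as the GROUP BY query (above, in my answer).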

SELECT cd301.datetime
     , cd301.calculatedValue AS `301`
     , cd302.calculatedValue AS `302`
     , cd303.calculatedValue AS `303`
     , cd304.calculatedValue AS `304`
     , cd305.calculatedValue AS `305`
     , cd306.calculatedValue AS `306`
--     , cd307.calculatedValue AS `307`
--     ...
--     , cd314.calculatedValue AS `314`
  FROM `data` cd301
  JOIN `data` cd302
    ON cd302.activityId   = cd301.activityId
   AND cd302.datetime     = cd301.datetime
   AND cd302.transducerId = 302
  JOIN `data` cd303
    ON cd303.activityId   = cd301.activityId
   AND cd303.datetime     = cd301.datetime
   AND cd303.transducerId = 303
  JOIN `data` cd304
    ON cd304.activityId   = cd301.activityId
   AND cd304.datetime     = cd301.datetime
   AND cd304.transducerId = 304
  JOIN `data` cd305
    ON cd305.activityId   = cd301.activityId
   AND cd305.datetime     = cd301.datetime
   AND cd305.transducerId = 305
  JOIN `data` cd306
    ON cd306.activityId   = cd301.activityId
   AND cd306.datetime     = cd301.datetime
   AND cd306.transducerId = 306
 WHERE cd301.activityId   = 74
   AND cd301.transducerId = 301

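Obviously, this needs to be extended to get 307, 308, ... 314, following the same pattern.

Then again, this JOIN approach may turn out to be comparable to the GROUP BY, or even faster, although its EXPLAIN output will have many more rows than the single-row GROUP BY plan.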