Joining multiple tables on a calendar table at one-minute resolution

Time: 2012-08-14 16:50:47

Tags: mysql performance join calendar

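Brief overview of the scenario:

We have a data logging system on board a ship. Various sensors read real-time data and store it in a MySQL database.

Each sensor has its own table, in which the instantaneous sensor values are stored with a timestamp.

The requirement now is to merge the data from all sensors into a single table of per-minute averages between two datetime values.

Here is what I have done so far:

1. Created a stored procedure that builds a calendar table. The procedure creates a table containing one datetime stamp per minute between two specified datetime values. For the cruise report, the calendar table I am working with looks like this: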

cal
-------------------+
dt            
-------------------+
2012-07-09 00:00:00
2012-07-09 00:01:00
2012-07-09 00:02:00

... etc

2012-07-29 23:57:00
2012-07-29 23:58:00
2012-07-29 23:59:00
30241 records in total, fetched in 0.016 sec, so no problem there.
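
For anyone who wants to reproduce the setup, a calendar procedure along these lines might look like the sketch below. The procedure name fill_calendar and its two parameters are made up for illustration and are not the code from this post; only the table cal and its column dt come from the question. DELIMITER is a mysql command-line client directive, so other clients may need a different way to submit the procedure body.

```sql
-- Hypothetical procedure: fills `cal` with one row per minute
-- between start_dt and end_dt (both inclusive).
DROP PROCEDURE IF EXISTS fill_calendar;

DELIMITER //
CREATE PROCEDURE fill_calendar(IN start_dt DATETIME, IN end_dt DATETIME)
BEGIN
    DECLARE cur DATETIME;
    SET cur = start_dt;

    DROP TABLE IF EXISTS cal;
    -- dt is a non-nullable DATETIME and the primary key,
    -- so it is indexed and usable as a join key.
    CREATE TABLE cal (
        dt DATETIME NOT NULL,
        PRIMARY KEY (dt)
    );

    WHILE cur <= end_dt DO
        INSERT INTO cal (dt) VALUES (cur);
        SET cur = cur + INTERVAL 1 MINUTE;
    END WHILE;
END //
DELIMITER ;

CALL fill_calendar('2012-07-09 00:00:00', '2012-07-29 23:59:00');
```

Row-by-row inserts are acceptable for a few tens of thousands of rows; for much longer ranges a set-based generator would probably be faster.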

2. Created temporary tables holding the sensor values averaged per minute.

Example of an averaged sensor table:

tbl_gyro_hdt_1min_ave
-------------------+------------------
tmstamp            | average_heading
-------------------+------------------
2012-07-09 00:00:00, 135.633333333333
2012-07-09 00:01:00, 135.633333333333
2012-07-09 00:02:00, 136.1
2012-07-09 00:03:00, 135.433333333333
etc...

29546 records fetched in 0.047 secs

And another sensor table:

tbl_par_sensor_1min_ave
-------------------+------------------
tmstamp            | average_par
-------------------+------------------
2012-07-09 00:00:00, 16.269949
2012-07-09 00:01:00, 16.270832
2012-07-09 00:02:00, 16.2637752
2012-07-09 00:03:00, 16.2678025
2012-07-09 00:04:00, 16.269324
2012-07-09 00:05:00, 16.2721382
etc...

29543 records fetched in 0.047 secs
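
The post does not show how the averaged tables are built, so the following is only a sketch of one way to do it. The raw table tbl_gyro_hdt and its columns tmstamp (DATETIME) and heading (DOUBLE) are assumptions; the target table and column names are the ones shown above.

```sql
-- Target table: one row per minute, keyed on a real DATETIME.
CREATE TEMPORARY TABLE tbl_gyro_hdt_1min_ave (
    tmstamp         DATETIME NOT NULL,
    average_heading DOUBLE,
    PRIMARY KEY (tmstamp)
);

-- tbl_gyro_hdt(tmstamp, heading) is a hypothetical raw sensor table.
-- Each reading is truncated to the start of its minute, then averaged.
INSERT INTO tbl_gyro_hdt_1min_ave (tmstamp, average_heading)
SELECT
    CAST(DATE_FORMAT(tmstamp, '%Y-%m-%d %H:%i:00') AS DATETIME),
    AVG(heading)
FROM tbl_gyro_hdt
WHERE tmstamp >= '2012-07-09 00:00:00'
  AND tmstamp <  '2012-07-30 00:00:00'
GROUP BY CAST(DATE_FORMAT(tmstamp, '%Y-%m-%d %H:%i:00') AS DATETIME);
```

The GROUP BY repeats the full expression rather than an alias named tmstamp, because MySQL resolves an unqualified name in GROUP BY against the FROM tables first and would otherwise group on the raw column. Note also that a plain AVG of compass headings misbehaves when readings straddle the 0/360 boundary.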

3. Joining the temporary tables to the calendar table is where the wheels come off.

To join a single table to the calendar table, I do this:

 SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading
    FROM cal

    LEFT JOIN tbl_gyro_hdt_1min_ave
    ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp  

EXPLAIN output for the above query:

+----+---------------+-----------------------+--------+---------------+-------+---------+------+-------+-------------+
| Id |  Select_Type  |  Table                |  Type  | Possible_Keys | Key   | Key_Len | Ref  | Rows  | Extra       |
+----+---------------+-----------------------+--------+---------------+-------+---------+------+-------+-------------+
| 1  |  SIMPLE       | cal                   |  index | NULL          | dt    | 9       | NULL | 30243 | Using index |
| 1  |  SIMPLE       | tbl_gyro_hdt_1min_ave |  ALL   | date_index    | NULL  | NULL    | NULL | 29546 |             |
+----+---------------+-----------------------+--------+---------------+-------+---------+------+-------+-------------+

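For very small datasets this works fine, but for the example above it just hangs. I tried adding indexes to all the tables, with the same result.

EDIT> I left it running overnight on the full dataset.

Result:

30243 records fetched.

Duration: 23.697 sec, fetched in 3000.352 sec

The next step is to join two or more tables against the calendar table, like this: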

 SELECT cal.dt, tbl_par_sensor_1min_ave.average_par, tbl_gyro_hdt_1min_ave.average_heading
    FROM tbl_par_sensor_1min_ave

    LEFT JOIN cal
    ON cal.dt = tbl_par_sensor_1min_ave.tmstamp

    LEFT JOIN tbl_gyro_hdt_1min_ave
    ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp

Not surprisingly, this also hangs.

Any pointers would be much appreciated.

As requested in the comments below, here are the table schemas:

show columns from cal;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| dt    | datetime | YES  | MUL | NULL    |       |
+-------+----------+------+-----+---------+-------+
1 row in set (0.00 sec)


show columns from  tbl_gyro_hdt_1min_ave;
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| tmstamp         | varchar(24) | YES  | MUL | NULL    |       |
| average_heading | double      | YES  |     | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)


show columns from tbl_par_sensor_1min_ave;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| tmstamp     | varchar(24) | YES  | MUL | NULL    |       |
| average_par | double      | YES  |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

SOLVED:

After implementing setsuna's changes:

Single outer join:

SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp  

Fetched 30243 records 
Duration: 0.015 sec
Fetched in: 0.172 sec

Double outer join:

SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading, tbl_par_sensor_1min_ave.average_par
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp  
LEFT JOIN tbl_par_sensor_1min_ave
ON cal.dt = tbl_par_sensor_1min_ave.tmstamp  

Fetched 29543 records
Duration: 0.000s
Fetched in: 0.281 sec
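
With the joins running quickly, the original requirement (one merged table of per-minute averages) could be met by materializing the double outer join. This is a sketch only; the table name tbl_cruise_1min and the aliases g and p are hypothetical.

```sql
-- Materialize the merged per-minute table.
-- Minutes with no sensor reading keep a row, with NULL in that sensor's column.
CREATE TABLE tbl_cruise_1min AS
SELECT
    cal.dt,
    g.average_heading,
    p.average_par
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave   AS g ON cal.dt = g.tmstamp
LEFT JOIN tbl_par_sensor_1min_ave AS p ON cal.dt = p.tmstamp
ORDER BY cal.dt;
```

Keeping cal as the leftmost table means every minute of the cruise appears in the result, which is the point of having a calendar table.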

2 answers:

Answer 0 (score: 0)

Solved!

Thanks to setsuna (see comments):

  

Change the column cal.dt to NOT NULL, and change tmstamp to TIMESTAMP or DATETIME and NOT NULL. A JOIN on about 30,000 records with properly indexed JOIN condition fields should run very fast.
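
A likely reason this helps: the schemas in the question show cal.dt as DATETIME but tmstamp as varchar(24), and comparing a temporal column with a string column requires a conversion that can stop MySQL from using the index on tmstamp. That matches the EXPLAIN output in the question, where date_index is listed as a possible key but the table is read with type ALL. The change could be applied roughly as follows, assuming every stored tmstamp string parses as a valid datetime and no column contains NULLs:

```sql
-- Make the calendar key non-nullable.
ALTER TABLE cal
    MODIFY dt DATETIME NOT NULL;

-- Convert the string timestamps to real DATETIME values.
-- Existing indexes on these columns are kept by MODIFY.
ALTER TABLE tbl_gyro_hdt_1min_ave
    MODIFY tmstamp DATETIME NOT NULL;

ALTER TABLE tbl_par_sensor_1min_ave
    MODIFY tmstamp DATETIME NOT NULL;

-- Re-check the plan: the sensor table should no longer show type ALL.
EXPLAIN
SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
    ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp;
```

It is worth running the ALTER statements on a copy first, since rows whose tmstamp is not a valid datetime will either fail the conversion or be coerced, depending on the SQL mode.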

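Answer 1 (score: 0)

Change the column cal.dt to NOT NULL, and change tmstamp to TIMESTAMP or DATETIME and NOT NULL. A JOIN on about 30,000 records with properly indexed JOIN condition fields should run very fast.

Note: @Knapie has already posted the results of applying this answer.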