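Brief overview of the scenario:
We have a data logging system on board a ship. Various sensors read real-time data and store it in a MySQL database.
Each sensor has its own table, and each instantaneous sensor value is stored with a timestamp.
The requirement now is to merge the data from all sensors into one table that holds the per-minute averages between two datetime values.
Here is what I have done so far:
1. Created a stored procedure that builds a calendar table. The procedure creates a table containing one datetime stamp per minute between two specified datetime values. For the cruise report, the calendar table I am working with looks like this: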
cal
-------------------+
dt
-------------------+
2012-07-09 00:00:00
2012-07-09 00:01:00
2012-07-09 00:02:00
... etc
2012-07-29 23:57:00
2012-07-29 23:58:00
2012-07-29 23:59:00
30241 records in total, fetched in 0.016 sec, so no problem there.
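The calendar procedure itself is not shown in the question. A minimal sketch of how such a procedure might look is given here. The procedure name fill_calendar, the parameter names p_start and p_end, and the NOT NULL primary key on dt are all assumptions made for illustration, not the original code.

```sql
-- Hypothetical sketch: fill_calendar, p_start and p_end are illustrative names.
DELIMITER //
CREATE PROCEDURE fill_calendar(IN p_start DATETIME, IN p_end DATETIME)
BEGIN
  DECLARE cur DATETIME;
  SET cur = p_start;

  -- Rebuild the calendar table from scratch on every call.
  DROP TABLE IF EXISTS cal;
  CREATE TABLE cal (
    dt DATETIME NOT NULL,   -- assumption: non-nullable, unlike the schema shown in the question
    PRIMARY KEY (dt)
  );

  -- Insert one row per minute, inclusive of both end points.
  WHILE cur <= p_end DO
    INSERT INTO cal (dt) VALUES (cur);
    SET cur = cur + INTERVAL 1 MINUTE;
  END WHILE;
END //
DELIMITER ;

CALL fill_calendar('2012-07-09 00:00:00', '2012-07-29 23:59:00');
```

Row-by-row inserts are the simplest form. For a range of this size that is usually acceptable, though wrapping the loop in a transaction would likely be faster.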
2. Created temporary tables holding the sensor values averaged over each minute.
Example of an averaged sensor table:
tbl_gyro_hdt_1min_ave
-------------------+------------------
tmstamp | average_heading
-------------------+------------------
2012-07-09 00:00:00, 135.633333333333
2012-07-09 00:01:00, 135.633333333333
2012-07-09 00:02:00, 136.1
2012-07-09 00:03:00, 135.433333333333
etc...
29546 records fetched in 0.047 secs
And another sensor table:
tbl_par_sensor_1min_ave
-------------------+------------------
tmstamp | average_par
-------------------+------------------
2012-07-09 00:00:00, 16.269949
2012-07-09 00:01:00, 16.270832
2012-07-09 00:02:00, 16.2637752
2012-07-09 00:03:00, 16.2678025
2012-07-09 00:04:00, 16.269324
2012-07-09 00:05:00, 16.2721382
etc...
29543 records fetched in 0.047 secs
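The question does not show how the per-minute tables are built. One possible way is sketched here, assuming a hypothetical raw table tbl_gyro_hdt with columns tmstamp (DATETIME) and heading (DOUBLE). Those names are not from the original post.

```sql
-- Hypothetical sketch: tbl_gyro_hdt and its column heading are assumed names.
CREATE TEMPORARY TABLE tbl_gyro_hdt_1min_ave (
  tmstamp         DATETIME NOT NULL,
  average_heading DOUBLE,
  PRIMARY KEY (tmstamp)
)
SELECT
  -- Truncate each reading to the start of its minute.
  CAST(DATE_FORMAT(raw.tmstamp, '%Y-%m-%d %H:%i:00') AS DATETIME) AS tmstamp,
  AVG(raw.heading) AS average_heading
FROM tbl_gyro_hdt AS raw
WHERE raw.tmstamp >= '2012-07-09 00:00:00'
  AND raw.tmstamp <  '2012-07-30 00:00:00'
GROUP BY CAST(DATE_FORMAT(raw.tmstamp, '%Y-%m-%d %H:%i:00') AS DATETIME);
```

Declaring tmstamp as DATETIME in the target table keeps the minute key in the same type as a DATETIME calendar column, so a later join can compare the two directly.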
3. Joining the temporary tables to the calendar table is where the wheels come off.
To join a single table to the calendar table, I do this:
SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp
EXPLAIN output for the query above:
+----+---------------+-----------------------+--------+---------------+-------+---------+------+-------+-------------+
| Id | Select_Type | Table | Type | Possible_Keys | Key | Key_Len | Ref | Rows | Extra |
+----+---------------+-----------------------+--------+---------------+-------+---------+------+-------+-------------+
| 1 | SIMPLE | cal | index | NULL | dt | 9 | NULL | 30243 | Using index |
| 1 | SIMPLE | tbl_gyro_hdt_1min_ave | ALL | date_index | NULL | NULL | NULL | 29546 | |
+----+---------------+-----------------------+--------+---------------+-------+---------+------+-------+-------------+
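This works fine for very small datasets, but for the example above it just hangs. I tried adding indexes to all the tables, with the same result.
Edit: I let it run overnight on the whole dataset.
Result:
30243 records fetched.
Duration: 23.697 sec, fetched in 3000.352 sec
The next step is to join two or more tables against the calendar table, like so: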
SELECT cal.dt, tbl_par_sensor_1min_ave.average_par, tbl_gyro_hdt_1min_ave.average_heading
FROM tbl_par_sensor_1min_ave
LEFT JOIN cal
ON cal.dt = tbl_par_sensor_1min_ave.tmstamp
LEFT JOIN tbl_gyro_hdt_1min_ave
ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp
Unsurprisingly, this also hangs.
Any pointers would be much appreciated.
As requested in the comments below, here are the table schemas:
show columns from cal;
+-------+----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| dt | datetime | YES | MUL | NULL | |
+-------+----------+------+-----+---------+-------+
1 row in set (0.00 sec)
show columns from tbl_gyro_hdt_1min_ave;
+-----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| tmstamp | varchar(24) | YES | MUL | NULL | |
| average_heading | double | YES | | NULL | |
+-----------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
show columns from tbl_par_sensor_1min_ave;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| tmstamp | varchar(24) | YES | MUL | NULL | |
| average_par | double | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
Solved:
After implementing setsuna's changes:
Single outer join:
SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp
Fetched 30243 records
Duration: 0.015 sec
Fetched in: 0.172 sec
Double outer join:
SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading, tbl_par_sensor_1min_ave.average_par
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp
LEFT JOIN tbl_par_sensor_1min_ave
ON cal.dt = tbl_par_sensor_1min_ave.tmstamp
Fetched 29543 records
Duration: 0.000s
Fetched in: 0.281 sec
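Since the stated goal is a single merged table, the double outer join could be materialized directly. This is only a sketch. The target table name tbl_cruise_1min_ave and the aliases g and p are assumptions introduced for illustration.

```sql
-- Hypothetical sketch: tbl_cruise_1min_ave is an assumed name for the merged table.
CREATE TABLE tbl_cruise_1min_ave AS
SELECT cal.dt,
       g.average_heading,
       p.average_par
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave   AS g ON cal.dt = g.tmstamp
LEFT JOIN tbl_par_sensor_1min_ave AS p ON cal.dt = p.tmstamp
ORDER BY cal.dt;
```

Driving the query from cal keeps one row per calendar minute, with NULL in a sensor column wherever that sensor has no average for the minute.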
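Answer 0 (score: 0)
Solved!
Thanks to setsuna (see comments).
Change the column cal.dt to NOT NULL, and change tmstamp to TIMESTAMP or DATETIME, also NOT NULL. A join over roughly 30,000 records on properly indexed join columns should run very fast.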
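The EXPLAIN output in the question shows type ALL and no key used for tbl_gyro_hdt_1min_ave, even though date_index is listed as a possible key. A likely reason is that tmstamp is varchar(24) while cal.dt is datetime, and MySQL may be unable to use an index when the compared columns need a type conversion. The suggested change could be applied roughly as follows, assuming every stored tmstamp string is a valid datetime and no row holds NULL in these columns (otherwise the statements would fail or need a cleanup step first).

```sql
-- Make the calendar column non-nullable.
ALTER TABLE cal MODIFY dt DATETIME NOT NULL;

-- Convert the string timestamps to a temporal type so that both sides
-- of the join condition have the same type.
ALTER TABLE tbl_gyro_hdt_1min_ave   MODIFY tmstamp DATETIME NOT NULL;
ALTER TABLE tbl_par_sensor_1min_ave MODIFY tmstamp DATETIME NOT NULL;

-- Re-check the plan: the sensor table should no longer show type ALL.
EXPLAIN
SELECT cal.dt, tbl_gyro_hdt_1min_ave.average_heading
FROM cal
LEFT JOIN tbl_gyro_hdt_1min_ave
  ON cal.dt = tbl_gyro_hdt_1min_ave.tmstamp;
```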
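Answer 1 (score: 0)
Change the column cal.dt to NOT NULL, and change tmstamp to TIMESTAMP or DATETIME, also NOT NULL. A JOIN over roughly 30,000 records on properly indexed join columns should run very fast.
Note: @Knapie has already posted the results of applying this answer.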