MySQL: Efficiently converting an event log into a time series

Posted: 2013-06-03 14:53:31

Tags: mysql subquery time-series sql-optimization

I have a table that records the start and end times of events of interest:

CREATE TABLE event_log (start_time DATETIME, end_time DATETIME);
INSERT INTO event_log VALUES
  ('2013-06-03 09:00:00', '2013-06-03 09:00:05'),
  ('2013-06-03 09:00:03', '2013-06-03 09:00:07'),
  ('2013-06-03 09:00:10', '2013-06-03 09:00:12');

+---------------------+---------------------+
| start_time          | end_time            |
+---------------------+---------------------+
| 2013-06-03 09:00:00 | 2013-06-03 09:00:05 |
| 2013-06-03 09:00:03 | 2013-06-03 09:00:07 |
| 2013-06-03 09:00:10 | 2013-06-03 09:00:12 |
+---------------------+---------------------+

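I'm looking for a way to build a "time series" table in which one column is a time index and the other is the number of events in progress at that moment. I can do it with a subquery and a row generator: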

SET @first_time := (SELECT MIN(start_time) FROM event_log);
SET @last_time := (SELECT MAX(end_time) FROM event_log);

CREATE OR REPLACE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1  UNION ALL SELECT 2  UNION ALL 
   SELECT 3   UNION ALL SELECT 4  UNION ALL SELECT 5  UNION ALL
   SELECT 6   UNION ALL SELECT 7  UNION ALL SELECT 8  UNION ALL
   SELECT 9   UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
   SELECT 12  UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL 
   SELECT 15;

CREATE TABLE time_series (t DATETIME, event_count INT(11))
SELECT @first_time + INTERVAL n SECOND t, NULL AS event_count
  FROM generator_16
  WHERE @first_time + INTERVAL n SECOND <= @last_time;

UPDATE time_series 
  SET event_count= (SELECT COUNT(*) FROM event_log 
  WHERE start_time<=t AND end_time>=t);
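To make the cost of this approach concrete, here is a small Python model of the same computation (Python and the names `EVENTS` and `naive_time_series` are illustrative, not part of the original setup). One row is produced per second, and each row scans every event, just as the correlated `COUNT(*)` subquery does, so the work grows as (number of seconds) x (number of events): roughly 500k x 1k = 500 million comparisons at the sizes mentioned below.

```python
from datetime import datetime, timedelta

# The three rows from the event_log example: (start_time, end_time).
EVENTS = [
    (datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 9, 0, 5)),
    (datetime(2013, 6, 3, 9, 0, 3), datetime(2013, 6, 3, 9, 0, 7)),
    (datetime(2013, 6, 3, 9, 0, 10), datetime(2013, 6, 3, 9, 0, 12)),
]


def naive_time_series(events):
    """Model of the generator + correlated-subquery approach.

    One row per second from MIN(start_time) to MAX(end_time); for each row,
    every event is checked, like the per-row COUNT(*) subquery.
    """
    first = min(s for s, _ in events)
    last = max(e for _, e in events)
    series = []
    t = first
    while t <= last:
        # start_time <= t AND end_time >= t: both ends inclusive
        count = sum(1 for s, e in events if s <= t <= e)
        series.append((t, count))
        t += timedelta(seconds=1)
    return series


for t, n in naive_time_series(EVENTS):
    print(t, n)
```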

+---------------------+-------------+
| t                   | event_count |
+---------------------+-------------+
| 2013-06-03 09:00:00 |           1 |
| 2013-06-03 09:00:01 |           1 |
| 2013-06-03 09:00:02 |           1 |
| 2013-06-03 09:00:03 |           2 |
| 2013-06-03 09:00:04 |           2 |
| 2013-06-03 09:00:05 |           2 |
| 2013-06-03 09:00:06 |           1 |
| 2013-06-03 09:00:07 |           1 |
| 2013-06-03 09:00:08 |           0 |
| 2013-06-03 09:00:09 |           0 |
| 2013-06-03 09:00:10 |           1 |
| 2013-06-03 09:00:11 |           1 |
| 2013-06-03 09:00:12 |           1 |
+---------------------+-------------+

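Is there a more efficient way to do this? This method needs one subquery per time index. For example, is there a way that needs only one subquery per event_log record instead? My real problem has 500k time-index entries and 1k events, and it takes a bit longer than I'd like (about 90 seconds).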
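One way to get work proportional to the number of events, rather than one scan of the events per time index, is a difference array (a sweep over start/end boundaries). This is a swapped-in technique, not the method from the answer below, and it is shown as a Python sketch rather than a MySQL query; it assumes timestamps fall on whole seconds and that end_time is inclusive, as in the question's `end_time>=t` test. Each event contributes +1 at its start second and -1 one second after its end, and a single running sum over the seconds yields the counts.

```python
from datetime import datetime, timedelta
from itertools import accumulate

# The three rows from the event_log example: (start_time, end_time).
EVENTS = [
    (datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 9, 0, 5)),
    (datetime(2013, 6, 3, 9, 0, 3), datetime(2013, 6, 3, 9, 0, 7)),
    (datetime(2013, 6, 3, 9, 0, 10), datetime(2013, 6, 3, 9, 0, 12)),
]


def sweep_time_series(events):
    """Difference-array sketch: O(#events + #seconds) instead of a product."""
    first = min(s for s, _ in events)
    last = max(e for _, e in events)
    n_seconds = int((last - first).total_seconds()) + 1
    # One extra slot absorbs the -1 placed after the latest end_time.
    delta = [0] * (n_seconds + 1)
    for start, end in events:  # constant work per event
        delta[int((start - first).total_seconds())] += 1
        delta[int((end - first).total_seconds()) + 1] -= 1  # end is inclusive
    counts = accumulate(delta[:n_seconds])  # one pass over the seconds
    return [(first + timedelta(seconds=i), c) for i, c in enumerate(counts)]


for t, n in sweep_time_series(EVENTS):
    print(t, n)
```

The output matches the question's table; the events are touched once each, and the per-second work is a single addition.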

The "generator" snippet comes from http://use-the-index-luke.com/blog/2011-07-30/mysql-row-generator. Obviously, a larger problem needs a larger generator, such as the 64k or 1M version.
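A common way to get a larger generator is to cross-join `generator_16` with itself and read each combined row as the base-16 digits of one number; four copies give 65,536 rows (64k) and five give 1,048,576 (about 1M). Whether the linked article builds its 64k/1M views exactly this way is an assumption here. The Python below (the `generator` function is hypothetical) only models that numbering, to show that every integer from 0 to 16**k - 1 appears exactly once.

```python
from itertools import product

GENERATOR_16 = range(16)  # stands in for the 16 rows (0..15) of generator_16


def generator(copies):
    """Model of cross-joining generator_16 with itself `copies` times.

    Each combination of rows is treated as base-16 digits (most significant
    first), so the result covers 0 .. 16**copies - 1 with no gaps or repeats.
    """
    for digits in product(GENERATOR_16, repeat=copies):
        n = 0
        for d in digits:
            n = n * 16 + d
        yield n


print(max(generator(2)))  # largest value from a 256-row generator
```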

1 answer:

Answer 0 (score: 0)

The count can only change at a start_time or an end_time. So if you run

select distinct start_time As time_point from event_log 
UNION 
select distinct   end_time As time_point from event_log

...this gives you all the "points" at which you need to take a snapshot.
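In Python terms, that query is just the set union of all start and end timestamps; `UNION` already removes duplicates, so the `distinct` keywords are redundant but harmless. A minimal sketch using the example rows (the names `EVENTS` and `time_points` are illustrative):

```python
from datetime import datetime

# The three rows from the event_log example: (start_time, end_time).
EVENTS = [
    (datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 9, 0, 5)),
    (datetime(2013, 6, 3, 9, 0, 3), datetime(2013, 6, 3, 9, 0, 7)),
    (datetime(2013, 6, 3, 9, 0, 10), datetime(2013, 6, 3, 9, 0, 12)),
]


def time_points(events):
    """Distinct timestamps that occur as a start_time or an end_time, sorted."""
    starts = {s for s, _ in events}
    ends = {e for _, e in events}
    return sorted(starts | ends)  # set union, like SQL UNION


for p in time_points(EVENTS):
    print(p)
```

For 1k events this yields at most 2k points, far fewer than 500k seconds.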

If you put that result in a temporary table (say TEMP_POINTS) and join it back to event_log, you should be able to count the events at each "point":

CREATE TABLE NON_ZERO_POINTS (time_point DATETIME, event_count INT(11))
    select time_point, count(*) as event_count
    from TEMP_POINTS
    join event_log on time_point between start_time and end_time
    group by time_point;
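What that join computes can be modeled in a few lines of Python (an illustrative sketch; `non_zero_points` is a hypothetical name). Each point is matched against the events whose range contains it, with both ends inclusive as in SQL `BETWEEN`, and matches are counted per point. At the question's sizes this is at most about 2k points x 1k events = 2 million comparisons, versus roughly 500 million for the per-second subquery.

```python
from collections import Counter
from datetime import datetime

# The three rows from the event_log example: (start_time, end_time).
EVENTS = [
    (datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 9, 0, 5)),
    (datetime(2013, 6, 3, 9, 0, 3), datetime(2013, 6, 3, 9, 0, 7)),
    (datetime(2013, 6, 3, 9, 0, 10), datetime(2013, 6, 3, 9, 0, 12)),
]


def non_zero_points(events):
    """Model of NON_ZERO_POINTS: events in progress at each start/end point."""
    points = {s for s, _ in events} | {e for _, e in events}
    counts = Counter()
    for p in points:  # join TEMP_POINTS to event_log
        for s, e in events:
            if s <= p <= e:  # time_point BETWEEN start_time AND end_time
                counts[p] += 1  # GROUP BY time_point ... COUNT(*)
    return dict(sorted(counts.items()))


for p, n in non_zero_points(EVENTS).items():
    print(p, n)
```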

It may be worth creating an index on NON_ZERO_POINTS (time_point).

You can then use NON_ZERO_POINTS in the update:

UPDATE time_series 
SET event_count= (SELECT NON_ZERO_POINTS.event_count FROM NON_ZERO_POINTS
WHERE NON_ZERO_POINTS.time_point = time_series.t);

Also, do you need to update time_series at all? If not, you can use it directly in a query:

select ts.t, coalesce(nz.event_count, 0) as event_count
from time_series ts
left join NON_ZERO_POINTS nz
on ts.t = nz.time_point;
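The lookup above only fills the rows that coincide with a point; a second that lies between two points should take the count of the most recent point, not 0. Below is a Python sketch of that carry-forward step (`fill_from_change_points` is a hypothetical name). As an assumption that differs from the answer, it uses end_time + 1 second rather than end_time as a change point: with inclusive end times that is when the count actually drops, and using end_time itself would carry a stale value forward (09:00:08 would get 1 instead of 0).

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# The three rows from the event_log example: (start_time, end_time).
EVENTS = [
    (datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 9, 0, 5)),
    (datetime(2013, 6, 3, 9, 0, 3), datetime(2013, 6, 3, 9, 0, 7)),
    (datetime(2013, 6, 3, 9, 0, 10), datetime(2013, 6, 3, 9, 0, 12)),
]


def fill_from_change_points(events):
    """One row per second, each taking the count at the latest change point."""
    one = timedelta(seconds=1)
    # Change points: every start, and one second after every inclusive end.
    points = sorted({s for s, _ in events} | {e + one for _, e in events})
    # Count at each change point, using the same inclusive test as BETWEEN.
    at_point = [sum(1 for s, e in events if s <= p <= e) for p in points]
    first = points[0]  # the earliest start_time
    last = max(e for _, e in events)
    series = []
    t = first
    while t <= last:
        i = bisect_right(points, t) - 1  # index of latest change point <= t
        series.append((t, at_point[i]))
        t += one
    return series


for t, n in fill_from_change_points(EVENTS):
    print(t, n)
```

The result matches the question's expected table, including the zeros at 09:00:08 and 09:00:09.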