MySQL: counting rows with similar timestamps

Time: 2013-05-17 20:12:22

Tags: mysql

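Is there any way to count groups of timestamps that are close to one another, but not necessarily within a fixed time window?

That is, not grouping by hour or minute, but by how close the current row's timestamp is to the next row's timestamp. If the next row is within "x" seconds/minutes, add that row to the group; otherwise start a new group.

Given this data: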

+----+---------+---------------------+
| id | item_id | event_date          |
+----+---------+---------------------+
|  1 |       1 | 2013-05-17 11:59:59 |
|  2 |       1 | 2013-05-17 12:00:00 |
|  3 |       1 | 2013-05-17 12:00:02 |
|  4 |       1 | 2013-05-17 12:00:03 |
|  5 |       3 | 2013-05-17 14:05:00 |
|  6 |       3 | 2013-05-17 14:05:01 |
|  7 |       3 | 2013-05-17 15:30:00 |
|  8 |       3 | 2013-05-17 15:30:01 |
|  9 |       3 | 2013-05-17 15:30:02 |
| 10 |       1 | 2013-05-18 09:12:00 |
| 11 |       1 | 2013-05-18 09:13:30 |
| 12 |       1 | 2013-05-18 09:13:45 |
| 13 |       1 | 2013-05-18 09:14:00 |
| 14 |       2 | 2013-05-20 15:45:00 |
| 15 |       2 | 2013-05-20 15:45:03 |
| 16 |       2 | 2013-05-20 15:45:10 |
| 17 |       2 | 2013-05-23 07:36:00 |
| 18 |       2 | 2013-05-23 07:36:10 |
| 19 |       2 | 2013-05-23 07:36:12 |
| 20 |       2 | 2013-05-23 07:36:15 |
| 21 |       1 | 2013-05-24 11:55:00 |
| 22 |       1 | 2013-05-24 11:55:02 |
+----+---------+---------------------+

Desired result:

+---------+-------+---------------------+
| item_id | total | last_date_in_group  |
+---------+-------+---------------------+
|       1 |     4 | 2013-05-17 12:00:03 |
|       3 |     2 | 2013-05-17 14:05:01 |
|       3 |     3 | 2013-05-17 15:30:02 |
|       1 |     4 | 2013-05-18 09:14:00 |
|       2 |     3 | 2013-05-20 15:45:10 |
|       2 |     4 | 2013-05-23 07:36:15 |
|       1 |     2 | 2013-05-24 11:55:02 |
+---------+-------+---------------------+
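
To make the grouping rule concrete, here is a minimal Python sketch of it, not a MySQL solution. It assumes a threshold of 5 minutes (the question leaves "x" open; the sample needs at least 90 seconds) and assumes rows are only compared within the same item_id, as the desired result suggests. The names ROWS and group_by_gap are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# (item_id, event_date) pairs copied from the question's table, in id order.
ROWS = [
    (1, "2013-05-17 11:59:59"), (1, "2013-05-17 12:00:00"),
    (1, "2013-05-17 12:00:02"), (1, "2013-05-17 12:00:03"),
    (3, "2013-05-17 14:05:00"), (3, "2013-05-17 14:05:01"),
    (3, "2013-05-17 15:30:00"), (3, "2013-05-17 15:30:01"),
    (3, "2013-05-17 15:30:02"), (1, "2013-05-18 09:12:00"),
    (1, "2013-05-18 09:13:30"), (1, "2013-05-18 09:13:45"),
    (1, "2013-05-18 09:14:00"), (2, "2013-05-20 15:45:00"),
    (2, "2013-05-20 15:45:03"), (2, "2013-05-20 15:45:10"),
    (2, "2013-05-23 07:36:00"), (2, "2013-05-23 07:36:10"),
    (2, "2013-05-23 07:36:12"), (2, "2013-05-23 07:36:15"),
    (1, "2013-05-24 11:55:00"), (1, "2013-05-24 11:55:02"),
]


def group_by_gap(rows, max_gap):
    """Return (item_id, total, last_date_in_group) per run of close rows."""
    # Sort by item, then time, so each row is compared with its predecessor
    # for the same item.
    parsed = sorted((item, datetime.strptime(ts, FMT)) for item, ts in rows)
    groups = []  # each entry is [item_id, total, last_date]
    for item, ts in parsed:
        last = groups[-1] if groups else None
        if last and last[0] == item and ts - last[2] <= max_gap:
            # Close enough to the previous row: extend the current group.
            last[1] += 1
            last[2] = ts
        else:
            # Different item or gap too large: start a new group.
            groups.append([item, 1, ts])
    groups.sort(key=lambda g: g[2])  # order by the end of each group
    return [(g[0], g[1], g[2].strftime(FMT)) for g in groups]


result = group_by_gap(ROWS, timedelta(minutes=5))
for row in result:
    print(row)
```

With the assumed 5-minute threshold this yields the seven groups listed in the desired result.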

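2 answers:

Answer 0 (score: 1)

This is a bit complicated. First, you need the time of the next event for each record. The following subquery adds such a time (nexted), provided it falls within the boundary: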

 select t.*,
         (select event_date
          from t t2
          where t2.item_id = t.item_id and
                t2.event_date > t.event_date and
                <date comparison here>
          order by event_date limit 1
         ) as nexted
  from t

This uses a correlated subquery. <date comparison here> stands for whatever date comparison you want. When no such record exists, the value is NULL.
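
As a rough model of what the correlated subquery computes, the following Python sketch derives nexted for a few of the question's rows. The 5-minute MAX_GAP is an assumption standing in for <date comparison here>, and parse and next_event are illustrative names only.

```python
from datetime import datetime, timedelta


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


# (id, item_id, event_date) for the first six rows of the question's table.
rows = [
    (1, 1, parse("2013-05-17 11:59:59")),
    (2, 1, parse("2013-05-17 12:00:00")),
    (3, 1, parse("2013-05-17 12:00:02")),
    (4, 1, parse("2013-05-17 12:00:03")),
    (5, 3, parse("2013-05-17 14:05:00")),
    (6, 3, parse("2013-05-17 14:05:01")),
]

# Assumed threshold; the answer leaves the actual comparison open.
MAX_GAP = timedelta(minutes=5)


def next_event(row, all_rows, max_gap):
    """Earliest later event for the same item within max_gap, else None."""
    _, item_id, event_date = row
    candidates = [
        d for _, other_item, d in all_rows
        if other_item == item_id and event_date < d <= event_date + max_gap
    ]
    # min(..., default=None) plays the role of ORDER BY event_date LIMIT 1,
    # with None standing in for NULL when nothing qualifies.
    return min(candidates, default=None)


nexted = {row[0]: next_event(row, rows, MAX_GAP) for row in rows}
for row_id, value in nexted.items():
    print(row_id, value)
```

Rows 4 and 6 get None here because nothing follows them closely enough, which is what marks them as the last event of their series.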

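Now, with this information (nexted), there is a trick for getting the grouping. For any record, it is the first event time at or after that record where nexted is NULL. That will be the last event in the series. Unfortunately, this requires two levels of nested correlated subqueries (or a join with an aggregation). The result looks a bit unwieldy: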

select item_id, grp, MIN(event_date) as start_date, MAX(event_date) as end_date,
       COUNT(*) as num_dates
from (select t.*,
             -- end of this row's series: the first event at or after it
             -- (same item) that has no close successor
             (select min(t3.event_date)
              from (select t1.*,
                           (select t2.event_date
                            from t t2
                            where t2.item_id = t1.item_id and
                                  t2.event_date > t1.event_date and
                                  <date comparison here>
                            order by t2.event_date limit 1
                           ) as nexted
                    from t t1
                   ) t3
              where t3.nexted is null and
                    t3.item_id = t.item_id and
                    t3.event_date >= t.event_date
             ) as grp   -- "grouping" is a reserved word in MySQL 8.0
      from t
     ) s
group by item_id, grp;
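
The grouping trick can be checked outside the database with a small Python sketch. It assumes a 5-minute threshold for the unspecified date comparison, and it assumes the group key is looked up within the same item_id and at or after the current row, which is what the description above implies. All function names are illustrative.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


# (item_id, event_date) for ids 1 to 9 of the question's table.
rows = [
    (1, parse("2013-05-17 11:59:59")), (1, parse("2013-05-17 12:00:00")),
    (1, parse("2013-05-17 12:00:02")), (1, parse("2013-05-17 12:00:03")),
    (3, parse("2013-05-17 14:05:00")), (3, parse("2013-05-17 14:05:01")),
    (3, parse("2013-05-17 15:30:00")), (3, parse("2013-05-17 15:30:01")),
    (3, parse("2013-05-17 15:30:02")),
]

MAX_GAP = timedelta(minutes=5)  # assumed threshold


def next_event(item_id, event_date):
    """Inner subquery: next same-item event within MAX_GAP, or None."""
    later = [d for i, d in rows
             if i == item_id and event_date < d <= event_date + MAX_GAP]
    return min(later, default=None)


def group_key(item_id, event_date):
    """Outer subquery: first same-item event at or after this one whose
    nexted is None, i.e. the last event of the series."""
    return min(d for i, d in rows
               if i == item_id and d >= event_date and next_event(i, d) is None)


# Equivalent of GROUP BY item_id, grp with MIN, MAX and COUNT.
groups = defaultdict(list)
for item_id, event_date in rows:
    groups[(item_id, group_key(item_id, event_date))].append(event_date)

summary = sorted(
    (item_id, min(dates), max(dates), len(dates))
    for (item_id, _), dates in groups.items()
)
for line in summary:
    print(line)
```

Every row in a series resolves to the same key (the series' final timestamp), so grouping on that key recovers the runs.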

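Answer 1 (score: 0)

How about approaching it by finding the nearby records for each individual record, then grouping on the maximum event date found for each record? This is based on a fixed time interval (5 minutes in my example).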

SELECT item_id, MAX(total) AS total, MAX(last_date_in_group) AS last_date_in_group FROM (
    -- For each row: count the same-item rows in the 5 minutes starting at it, and note the latest one.
    -- The row always matches itself, so MAX(t2.event_date) is never NULL.
    SELECT t1.item_id, COUNT(*) AS total, MAX(t2.event_date) AS last_date_in_group
        FROM table_name t1
        LEFT JOIN table_name t2 ON t2.item_id = t1.item_id
            AND t2.event_date BETWEEN t1.event_date AND t1.event_date + INTERVAL 5 MINUTE
        GROUP BY t1.id, t1.item_id
    ) t
    GROUP BY item_id, t.last_date_in_group
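
A small Python sketch of the same idea, using ids 10 to 16 from the question. Restricting matches to the same item_id is an assumption taken from the desired result, and the variable names are illustrative.

```python
from datetime import datetime, timedelta


def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


# (item_id, event_date) for ids 10 to 16 of the question's table.
rows = [
    (1, parse("2013-05-18 09:12:00")), (1, parse("2013-05-18 09:13:30")),
    (1, parse("2013-05-18 09:13:45")), (1, parse("2013-05-18 09:14:00")),
    (2, parse("2013-05-20 15:45:00")), (2, parse("2013-05-20 15:45:03")),
    (2, parse("2013-05-20 15:45:10")),
]

WINDOW = timedelta(minutes=5)  # the fixed interval from the answer

# Inner query: per row, count same-item rows in [event_date, event_date + WINDOW]
# and record the latest of them.
per_row = []
for item_id, event_date in rows:
    in_window = [d for i, d in rows
                 if i == item_id and event_date <= d <= event_date + WINDOW]
    per_row.append((item_id, len(in_window), max(in_window)))

# Outer query: group on (item_id, last_date_in_group), keep the largest count.
best = {}
for item_id, total, last_date in per_row:
    key = (item_id, last_date)
    best[key] = max(best.get(key, 0), total)

result = sorted(
    ((item_id, total, last_date) for (item_id, last_date), total in best.items()),
    key=lambda r: r[2],
)
for line in result:
    print(line)
```

Note that this is a fixed-window approach rather than a chained one: a run of events that are each close to the next but together span more than the window (say at 0, 4 and 8 minutes) would come out as more than one group.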