Aggregating rows based on time distance

Time: 2013-08-19 20:59:09

Tags: sql postgresql aggregate-functions

Cheers, I'm working with a Postgres table

CREATE TABLE my_table (
  "id" serial,
  "sensorid" integer,
  "actorid" integer,
  "timestamp" timestamp without time zone
);

with sample data

id, sensorid, actorid, timestamp
1; 2267; 3023; "2013-07-09 12:20:06.446"
2; 2267; 3023; "2013-07-09 12:20:16.421"
3; 2267; 3023; "2013-07-09 12:20:30.661"
4; 2267; 3023; "2013-07-09 12:20:36.958"
5; 2267; 3023; "2013-07-09 12:20:49.508"
6; 2267; 3023; "2013-07-09 12:20:57.683"
7; 3301; 3023; "2013-08-15 06:03:03.428"
8; 2267; 3024; "2013-07-09 12:19:52.196"
9; 2267; 3024; "2013-07-09 12:20:16.515"
10; 2267; 3024; "2013-07-09 12:20:42.341"
11; 2267; 3025; "2013-07-09 12:21:05.98"
12; 2268; 3026; "2013-07-09 12:22:35.03"
13; 2268; 3026; "2013-07-09 12:22:45.066"
14; 3192; 3026; "2013-08-09 07:41:31.206"

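I want to group the records according to the following criteria:

  1. They have the same sensorid
  2. They have the same actorid
  3. (The problem:) The time gap between consecutive records is less than (let's say) 5 minutes. That is, a group may span more than an hour, but no two consecutive records in the group are more than 5 minutes apart. The timestamp reported for a group can be the average.
  4. In addition, the number of aggregated records per group must be provided, because groups that are too large have to be identified.
  5. The output should therefore look something like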

    id; sensorid, actorid, avg, count
    1; 2267; 3023; "2013-07-09 12:20:30.000"; 7;
    2; 3301; 3023; "2013-08-15 06:03:03.428"; 1;
    3; 2267; 3024; "2013-07-09 12:20:06.415"; 3;
    5; 2267; 3025; "2013-07-09 12:21:05.98"; 1;
    6; 2268; 3026; "2013-07-09 12:22:40.626"; 2;
    7; 3192; 3026; "2013-08-09 07:41:31.206"; 1;
    

Thanks for your help! Dennis

1 answer:

Answer 0 (score: 2)

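First, use lag() to get the previous timestamp for each row and decide whether the row starts a new period. Then, for each sensorid/actorid combination, take a cumulative sum of isStart; the running total numbers the groups within each pair.

Finally, aggregate the result, including this new group column in the group by: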

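-- Outer query: one row per (sensorid, actorid, grp) with its time range and size
select sensorid, actorid, min("timestamp"), max("timestamp"), count(*) as numInGroup
from (select t.*,
             -- running total of start flags = group number within each pair
             sum(isStart) over (partition by sensorid, actorid order by "timestamp") as grp
      from (select t.*,
                   -- 1 when the row opens a new group: no predecessor, or a gap over 5 minutes
                   (case when prevts is null or prevts < "timestamp" - interval '5 minutes'
                         then 1 else 0
                    end) as isStart
            from (select t.*,
                         -- previous timestamp for the same sensorid/actorid pair
                         lag("timestamp") over (partition by sensorid, actorid
                                                order by "timestamp") as prevts
                  from my_table t
                 ) t
           ) t
     ) t
group by sensorid, actorid, grp;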
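
The question asks for an average timestamp per group, while the query above returns the min and max. One possible variant is sketched below. It assumes the same my_table and the same 5-minute threshold, and it averages the timestamps by converting them to epoch seconds and back, which is only one way to do it. The aliases avg_ts, num_in_group and is_start and the window name w are illustrative and do not come from the question. I have not run it against the asker's data, so treat it as a starting point rather than a finished solution.

```sql
-- Sketch: same gap-based grouping, but reporting the average timestamp per group.
-- Assumes my_table as defined in the question; names below are illustrative.
select sensorid, actorid,
       -- average of the timestamps: convert to epoch seconds, average, convert back
       timestamp 'epoch'
         + avg(extract(epoch from "timestamp")) * interval '1 second' as avg_ts,
       count(*) as num_in_group
from (select t.*,
             -- running total of start flags numbers the groups within each pair
             sum(is_start) over (partition by sensorid, actorid
                                 order by "timestamp") as grp
      from (select t.*,
                   -- a row starts a new group if it has no predecessor
                   -- or the predecessor is more than 5 minutes older
                   (case when lag("timestamp") over w is null
                           or lag("timestamp") over w < "timestamp" - interval '5 minutes'
                         then 1 else 0
                    end) as is_start
            from my_table t
            window w as (partition by sensorid, actorid order by "timestamp")
           ) t
     ) t
group by sensorid, actorid, grp
order by sensorid, actorid, avg_ts;
```

To flag groups that are too large, a having clause such as having count(*) > 100 could be added after the group by; the limit of 100 is a placeholder, not a value from the question.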