PostgreSQL grouped rolling average

Posted: 2017-03-24 13:30:58

Tags: sql postgresql

I'm trying to generate a rolling average over a set time period, grouped by the item ID column.

Here is the basic layout of the table with some dummy data, stripped of fluff:

----------------------------------------------------
| id   | itemid   | isup   | logged                |
----------------------------------------------------
| 1    | 1        | true   | 2017-03-23 12:55:00   |
| 2    | 1        | false  | 2017-03-23 12:57:00   |
| 3    | 1        | true   | 2017-03-23 13:07:00   |
| 4    | 1        | false  | 2017-03-23 13:09:00   |
| 5    | 1        | true   | 2017-03-23 13:50:00   |
| 6    | 2        | false  | 2017-03-23 12:55:00   |
| 7    | 2        | true   | 2017-03-23 14:00:00   |
| 8    | 2        | false  | 2017-03-23 14:03:00   |
----------------------------------------------------
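To reproduce the examples, you can load the dummy data above into a temporary table. This is a sketch only. The question does not state the column types, so integer ids, a boolean isup and a timestamp (without time zone) logged are assumptions.

```sql
-- Hypothetical setup reproducing the dummy data; the column types are assumed.
-- A TEMP table only lives for the current session.
CREATE TEMP TABLE uplog (
    id     integer PRIMARY KEY,
    itemid integer NOT NULL,
    isup   boolean NOT NULL,
    logged timestamp NOT NULL
);

INSERT INTO uplog (id, itemid, isup, logged) VALUES
    (1, 1, true,  '2017-03-23 12:55:00'),
    (2, 1, false, '2017-03-23 12:57:00'),
    (3, 1, true,  '2017-03-23 13:07:00'),
    (4, 1, false, '2017-03-23 13:09:00'),
    (5, 1, true,  '2017-03-23 13:50:00'),
    (6, 2, false, '2017-03-23 12:55:00'),
    (7, 2, true,  '2017-03-23 14:00:00'),
    (8, 2, false, '2017-03-23 14:03:00');
```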

I found an answer to a previous question on rolling averages, but I can't work out how to group the averages by item ID. Almost every approach I've tried ends up producing the wrong figures.

This is my starting point. I suspect my limited understanding of ROW_NUMBER() OVER isn't helping.

SELECT id, itemid, AVG(isup) 
    OVER (PARTITION BY groupnr ORDER BY logged) AS averagehour
FROM (       
    SELECT id, itemid, isup, logged, intervalgroup, 
        itemid - ROW_NUMBER() OVER (
            partition by intervalgroup ORDER BY logged) AS groupnr
    FROM (
        SELECT id, itemid, logged,
            CASE WHEN isup = TRUE THEN 1 ELSE 0 END AS isup,
            'epoch'::TIMESTAMP + '3600 seconds'::INTERVAL * 
                (EXTRACT(EPOCH FROM logged)::INT4 / 3600) AS intervalgroup
        FROM uplog
    ) alias_inner
) alias_outer
ORDER BY logged;
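It helps to see what ROW_NUMBER() OVER produces here. The sketch below numbers the rows once per hour only, as the query above does, and once per item and hour. It makes two assumptions. A CTE stands in for the uplog table. An extra id tie-breaker is added to ORDER BY so that the numbering is deterministic when two rows share a timestamp.

```sql
-- The CTE is a hypothetical stand-in for uplog, holding the dummy data above.
WITH uplog (id, itemid, isup, logged) AS (
    VALUES
        (1, 1, true,  timestamp '2017-03-23 12:55:00'),
        (2, 1, false, timestamp '2017-03-23 12:57:00'),
        (3, 1, true,  timestamp '2017-03-23 13:07:00'),
        (4, 1, false, timestamp '2017-03-23 13:09:00'),
        (5, 1, true,  timestamp '2017-03-23 13:50:00'),
        (6, 2, false, timestamp '2017-03-23 12:55:00'),
        (7, 2, true,  timestamp '2017-03-23 14:00:00'),
        (8, 2, false, timestamp '2017-03-23 14:03:00')
), bucketed AS (
    SELECT id, itemid, logged,
           -- same hourly bucket arithmetic as the query above
           'epoch'::timestamp + '3600 seconds'::interval
               * (extract(epoch FROM logged)::int4 / 3600) AS intervalgroup
    FROM uplog
)
SELECT id, itemid, intervalgroup,
       -- restarts only when the hour changes, so rows of different items share one sequence
       row_number() OVER (PARTITION BY intervalgroup ORDER BY logged, id) AS rn_by_hour,
       -- restarts for every (item, hour) pair
       row_number() OVER (PARTITION BY itemid, intervalgroup ORDER BY logged, id) AS rn_by_item_hour
FROM bucketed
ORDER BY itemid, logged;
```

Look at row 2 (item 1, 12:57). rn_by_hour is 3 because row 6 of item 2 falls in the same hour, while rn_by_item_hour is 2. A group number derived from the per-hour sequence therefore does not separate items. The outer AVG(...) OVER (PARTITION BY groupnr ...) has no itemid in its partition either.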

Any help is greatly appreciated.

2 Answers:

Answer 0 (score: 0)

My answer assumes that

  1. logged is of type timestamp with time zone, which is the only sensible data type for logging.

  2. Your complicated date arithmetic assumes that logged is evaluated in time zone UTC (why else would you use 'epoch'::timestamp as the base?), rounded down to the full hour.

  3. You want to group by that rounded timestamp and itemid.

  4. This would be an answer:
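Under assumptions 1 to 3, one possible query partitions the window by itemid and by the UTC hour, using PostgreSQL's date_trunc. This is a sketch, not necessarily the query the answerer posted. The CTE with UTC timestamps is a hypothetical stand-in for uplog; drop it to run the query against the real table.

```sql
-- Sketch: assumes logged is timestamp with time zone and hours are counted in UTC.
WITH uplog (id, itemid, isup, logged) AS (
    VALUES
        (1, 1, true,  timestamptz '2017-03-23 12:55:00+00'),
        (2, 1, false, timestamptz '2017-03-23 12:57:00+00'),
        (3, 1, true,  timestamptz '2017-03-23 13:07:00+00'),
        (4, 1, false, timestamptz '2017-03-23 13:09:00+00'),
        (5, 1, true,  timestamptz '2017-03-23 13:50:00+00'),
        (6, 2, false, timestamptz '2017-03-23 12:55:00+00'),
        (7, 2, true,  timestamptz '2017-03-23 14:00:00+00'),
        (8, 2, false, timestamptz '2017-03-23 14:03:00+00')
)
SELECT id, itemid, logged,
       -- one window per item and UTC hour; with ORDER BY, the default frame runs
       -- from the start of the window up to the current row (and its peers)
       avg(isup::integer) OVER (
           PARTITION BY itemid, date_trunc('hour', logged AT TIME ZONE 'UTC')
           ORDER BY logged
       ) AS averagehour
FROM uplog
ORDER BY itemid, logged;
```

Because logged only moves forward in time, each hour is a contiguous block of rows per item. Partitioning directly by the truncated hour therefore yields the same groups as a gaps-and-islands numbering would.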

Answer 1 (score: 0)

The linked answer contains almost everything you need. If you want to "group" further (e.g. by itemid), you just need to add those "groups" to the PARTITION BY clause of the window functions:

select *, avg(isup::int) over (partition by itemid, group_nr order by logged) as rolling_avg
from (
    select *, id - row_number() over (partition by itemid, interval_group order by logged) as group_nr
    from (
        select *, 'epoch'::timestamp + '3600 seconds'::interval * (extract(epoch from logged)::int4 / 3600) as interval_group
        from dummy
    ) t1
) t2
order by itemid, logged

Note, however, that this (like the linked answer) only works because id has no gaps and follows the same order as the table's timestamp field. If that is not the case, you need

row_number() over (partition by itemid order by logged) - row_number() over (partition by itemid, interval_group order by logged) as group_nr

instead of id - row_number() ...

http://rextester.com/YBSC43615
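Putting the row_number() difference into the full query gives a version that never reads id. Below is a sketch of that, run against a CTE holding the question's dummy data. The CTE is a hypothetical stand-in for the uplog table.

```sql
-- Hypothetical stand-in for uplog (dummy data from the question).
WITH uplog (id, itemid, isup, logged) AS (
    VALUES
        (1, 1, true,  timestamp '2017-03-23 12:55:00'),
        (2, 1, false, timestamp '2017-03-23 12:57:00'),
        (3, 1, true,  timestamp '2017-03-23 13:07:00'),
        (4, 1, false, timestamp '2017-03-23 13:09:00'),
        (5, 1, true,  timestamp '2017-03-23 13:50:00'),
        (6, 2, false, timestamp '2017-03-23 12:55:00'),
        (7, 2, true,  timestamp '2017-03-23 14:00:00'),
        (8, 2, false, timestamp '2017-03-23 14:03:00')
), t1 AS (
    -- hourly bucket, same arithmetic as in the answer
    SELECT *,
           'epoch'::timestamp + '3600 seconds'::interval
               * (extract(epoch FROM logged)::int4 / 3600) AS interval_group
    FROM uplog
), t2 AS (
    -- island number per item that does not depend on id being gap-free
    SELECT *,
           row_number() OVER (PARTITION BY itemid ORDER BY logged)
         - row_number() OVER (PARTITION BY itemid, interval_group ORDER BY logged) AS group_nr
    FROM t1
)
SELECT id, itemid, isup, logged,
       avg(isup::int) OVER (PARTITION BY itemid, group_nr ORDER BY logged) AS rolling_avg
FROM t2
ORDER BY itemid, logged;
```

With this data, rolling_avg should come out as 1, 0.5, 1, 0.5 and 0.67 (rounded) for item 1, and as 0, 1 and 0.5 for item 2. Since id is never read, renumbering the rows does not change the result.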

Also, if you only intend to use hourly groups, you can truncate logged to the hour directly instead of using the more general arithmetic (as @LaurenzAlbe already noted).
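One way to truncate to the hour is PostgreSQL's built-in date_trunc('hour', logged). The sketch below compares it with the epoch arithmetic, assuming logged is timestamp without time zone. Row 9 is a hypothetical extra row with fractional seconds and is not part of the question's data.

```sql
-- Compares the epoch arithmetic with date_trunc('hour', ...) on timestamp values.
WITH uplog (id, logged) AS (
    VALUES
        (1, timestamp '2017-03-23 12:55:00'),
        (2, timestamp '2017-03-23 12:57:00'),
        (3, timestamp '2017-03-23 13:07:00'),
        (4, timestamp '2017-03-23 13:09:00'),
        (5, timestamp '2017-03-23 13:50:00'),
        (6, timestamp '2017-03-23 12:55:00'),
        (7, timestamp '2017-03-23 14:00:00'),
        (8, timestamp '2017-03-23 14:03:00'),
        (9, timestamp '2017-03-23 12:59:59.6')  -- hypothetical extra row
)
SELECT id, logged,
       -- the ::int4 cast rounds the epoch seconds to the nearest integer
       'epoch'::timestamp + '3600 seconds'::interval
           * (extract(epoch FROM logged)::int4 / 3600) AS epoch_bucket,
       -- always cuts down to the start of the hour
       date_trunc('hour', logged) AS trunc_bucket
FROM uplog
ORDER BY id;
```

For rows 1 to 8 the two columns agree. Row 9 differs because the ::int4 cast rounds instead of truncating. As a result, 12:59:59.6 lands in the 13:00 bucket with the arithmetic but in the 12:00 bucket with date_trunc.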