Question

I'm having this problem with a running sum in Redshift (which is based on Postgres 8):

select extract(month from registration_time) as month
 , extract(week from registration_time)%4+1 as week
 , extract(day from registration_time) as day
 , count(*) as count_of_users_registered
 , sum(count(*)) over (ORDER BY (1,2,3))
from loyalty.v_user
group by 1,2,3
order by 1,2,3
;

The error I get is:

ERROR: 42601: Aggregate window functions with an ORDER BY clause require a frame clause

Answer 1

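You can run a window function on the result of an aggregate function at the same query level. It is just much simpler to use a subquery in this case: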

SELECT *, sum(count_registered_users) OVER (ORDER BY month, week, day) AS running_sum
FROM  (
   SELECT extract(month FROM registration_time)::int     AS month
        , extract(week  FROM registration_time)::int%4+1 AS week
        , extract(day   FROM registration_time)::int     AS day
        , count(*) AS count_registered_users
   FROM   loyalty.v_user
   GROUP  BY 1, 2, 3
   ORDER  BY 1, 2, 3
   ) sub;

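I also fixed the syntax of the expression that computes week. extract() returns double precision, but the modulo operator % does not accept double precision operands. I cast all three values to integer while I was at it.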
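To make the cast concrete, here is a minimal sketch. The timestamp literal is only an illustrative value, not data from the question. The cast binds more tightly than %, so the modulo is applied to an integer:

```sql
-- Cast the result of extract() to integer first, then apply the modulo.
-- The timestamp literal is a made-up sample value.
SELECT extract(week FROM timestamp '2014-03-15 10:00')::int % 4 + 1 AS week_bucket;

-- Without the cast, the expression fails wherever extract() returns
-- double precision, because % is not defined for that type:
-- SELECT extract(week FROM timestamp '2014-03-15 10:00') % 4 + 1;
```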

As @a_horse commented, you cannot use positional references in the ORDER BY clause of a window function (unlike in the ORDER BY clause of the query itself).

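However, you cannot use over (order by registration_time) in this query either, because you are grouping by month, week, day. registration_time is neither aggregated nor listed in the GROUP BY clause, as would be required. At that stage of query evaluation, you can no longer access the column.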

You could repeat the expressions of the first three SELECT items in the ORDER BY clause of the window function to make it work:

SELECT extract(month FROM registration_time)::int     AS month
     , extract(week  FROM registration_time)::int%4+1 AS week
     , extract(day   FROM registration_time)::int     AS day
     , count(*) AS count_registered_users
     , sum(count(*)) OVER (ORDER BY 
              extract(month FROM registration_time)::int
            , extract(week  FROM registration_time)::int%4+1
            , extract(day   FROM registration_time)::int) AS running_sum
FROM   loyalty.v_user
GROUP  BY 1, 2, 3
ORDER  BY 1, 2, 3;

But that seems rather noisy. (Performance would be good, though.)
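The error message in the question says that Redshift requires an explicit frame clause for aggregate window functions with an ORDER BY. The following is a sketch of the subquery variant with such a clause added, assuming Redshift's ROWS UNBOUNDED PRECEDING syntax; it has not been run against the asker's schema:

```sql
-- Sketch for Redshift: the subquery approach plus an explicit frame clause.
-- ROWS UNBOUNDED PRECEDING sums from the first row up to the current row.
SELECT month, week, day, count_registered_users
     , sum(count_registered_users) OVER (ORDER BY month, week, day
                                         ROWS UNBOUNDED PRECEDING) AS running_sum
FROM  (
   SELECT extract(month FROM registration_time)::int     AS month
        , extract(week  FROM registration_time)::int%4+1 AS week
        , extract(day   FROM registration_time)::int     AS day
        , count(*) AS count_registered_users
   FROM   loyalty.v_user
   GROUP  BY 1, 2, 3
   ) sub
ORDER  BY month, week, day;
```

After the GROUP BY, each (month, week, day) combination occurs only once, so a ROWS frame should give the same running sum as the default RANGE frame in Postgres.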

Aside: I do wonder about the purpose behind week%4+1 ... The whole query might be simpler.
