我编写了一个sql查询,该查询从用户表中提取数据,并生成用户创建时的运行总计和累计总数。数据按周分组(使用postgres的窗口功能)。我使用左外连接来包括没有用户创建的周数。这是查询...
<!-- language: lang-sql -->
WITH reporting_period AS (
SELECT generate_series(date_trunc('week', date '2015-04-02'), date_trunc('week', date '2015-10-02'), interval '1 week') AS interval
)
SELECT
date(interval) AS interval
, count(users.created_at) as interval_count
, sum(count( users.created_at) ) OVER (order by date_trunc('week', users.created_at)) AS cumulative_count
FROM reporting_period
LEFT JOIN users
ON interval=date(date_trunc('week', users.created_at) )
GROUP BY interval, date_trunc('week', users.created_at) ORDER BY interval
它几乎完美无缺。在创建用户的几周内正确计算累积值。在没有用户创建的几周内,它被设置为总计,而不是累计总计。
请注意,具有** Week Tot列(interval_count)的行按预期为0,但Run Tot(cumulative_total)为1053,等于总计。
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 1053 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 1053 **
2015-05-11 | 0 | 1053 **
2015-05-18 | 1 | 30
2015-05-25 | 0 | 1053 **
...
2015-06-08 | 996 | 1031
...
2015-09-07 | 2 | 1052
2015-09-14 | 0 | 1053 **
2015-09-21 | 1 | 1053 **
2015-09-28 | 0 | 1053 **
这就是我想要的
Week Week Tot Run Tot
-----------------------------------
2015-03-30 | 4 | 4
2015-04-06 | 13 | 17
2015-04-13 | 0 | 17 **
2015-04-20 | 9 | 26
2015-04-27 | 3 | 29
2015-05-04 | 0 | 29 **
...
在我看来,如果外部联接可以某种方式将总计应用到最后一列,那么应该可以应用当前的运行总数,但我不知道该怎么做。
这可能吗?
答案 0 :(得分:2)
由于我没有在真实表上进行测试,因此无法保证开箱即用,但此处的关键是在一系列日期内加入created_at上的用户。
with reportingperiod as (
select intervaldate as interval_begin,
intervaldate + interval '1 month' as interval_end
from (
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-03-15')),
DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
) as rp
)
select interval_end,
interval_count,
sum(interval_count) over (order by interval_end) as running_sum
from (
select interval_end,
count(u.created_at) as interval_count
from reportingperiod rp
left join (
select created_at
from users
where created_at < '2015-10-02'
) u on u.created_at > rp.interval_begin
and u.created_at <= rp.interval_end
group by interval_end
) q
答案 1 :(得分:0)
我明白了。诀窍是子查询。这是我的方法
SELECT
interval
, interval_count
, SUM(interval_count ) OVER (ORDER BY interval) AS cumulative_count
FROM
(
SELECT interval, MAX(interval_count) AS interval_count FROM
(
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('week', DATE '2015-04-02')),
DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
0 AS interval_count
UNION
SELECT DATE_TRUNC('week', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
) sub1
GROUP BY interval
) grouped_data
我不确定此方法是否存在任何严重的性能问题,但它似乎有效。如果有人有更好,更优雅或更高效的方法,我会喜欢这些反馈。
编辑:尝试按任意时间窗分组时,我的解决方案不起作用
刚试过这个解决方案并进行了以下更改
/* generate series using DATE_TRUNC('day'...)*/
SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-04-02')),
DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
0 AS interval_count
/* And this part */
SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
COUNT(users.created_at) AS interval_count FROM users
WHERE users.created_at < date '2015-10-02'
GROUP BY 1 ORDER BY 1
例如,可以产生这些类似的结果,但是按间隔对数据进行分组
2015年3月15日 - 2015年4月14日,
4/15/15 - 5/14/15,
5/15/15 - 6/14/15
等