PostgreSQL running total: missing data in groups and outer join

Date: 2016-07-14 16:00:28

Tags: postgresql

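I wrote a SQL query that pulls data from a users table and produces, for each week, the number of users created and a running (cumulative) total. The data is grouped by week, and the running total uses a Postgres window function. I use a LEFT OUTER JOIN so that weeks in which no users were created are included. Here is the query: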


    WITH reporting_period AS (
       SELECT generate_series(date_trunc('week', date '2015-04-02'),
                              date_trunc('week', date '2015-10-02'),
                              interval '1 week') AS interval
    )

    SELECT
      date(interval) AS interval
    , count(users.created_at) AS interval_count
    , sum(count(users.created_at)) OVER (ORDER BY date_trunc('week', users.created_at)) AS cumulative_count

     FROM reporting_period
     LEFT JOIN users
     ON interval = date(date_trunc('week', users.created_at))

    GROUP BY interval, date_trunc('week', users.created_at)
    ORDER BY interval

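It almost works. For weeks in which users were created, the cumulative value is computed correctly. For weeks in which no users were created, it is set to the grand total instead of the running total.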

Note that in the rows marked `**`, Week Tot (interval_count) is 0 as expected, but Run Tot (cumulative_count) is 1053, which equals the grand total.

    Week              | Week Tot | Run Tot
    -----------------------------------
    2015-03-30        | 4        | 4
    2015-04-06        | 13       | 17
    2015-04-13        | 0        | 1053 **
    2015-04-20        | 9        | 26
    2015-04-27        | 3        | 29
    2015-05-04        | 0        | 1053 **
    2015-05-11        | 0        | 1053 **
    2015-05-18        | 1        | 30
    2015-05-25        | 0        | 1053 **
    ...
    2015-06-08        | 996      | 1031
    ...
    2015-09-07        | 2        | 1052
    2015-09-14        | 0        | 1053 **
    2015-09-21        | 1        | 1053 **
    2015-09-28        | 0        | 1053 **
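The 1053 rows look consistent with how the window is ordered. The window is `OVER (ORDER BY date_trunc('week', users.created_at))`, and for a week with no matching user that expression is NULL after the LEFT JOIN. In an ascending sort PostgreSQL places NULLs last, and all rows with an equal sort key (including NULL) are peers; with the default frame (`RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW`) every peer sees the sum over the whole partition. The self-contained sketch below reproduces this with made-up counts in a `VALUES` list (the `week_key`/`cnt` names and numbers are hypothetical, not the real `users` data):

```sql
-- Hypothetical weekly counts; NULL stands in for a week with no joined users,
-- i.e. date_trunc('week', users.created_at) evaluated to NULL after the LEFT JOIN.
SELECT week_key,
       cnt,
       sum(cnt) OVER (ORDER BY week_key) AS cumulative_count
FROM (VALUES (date '2015-03-30', 4),
             (date '2015-04-06', 13),
             (NULL::date,        0),
             (date '2015-04-20', 9),
             (NULL::date,        0)) AS t(week_key, cnt)
ORDER BY week_key;
```

The dated rows get 4, 17 and 26, while both NULL rows sort to the end and both show 26, the grand total of this sample, which mirrors the 1053 rows above.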

This is what I want:

    Week              | Week Tot | Run Tot
    -----------------------------------
    2015-03-30        | 4        | 4
    2015-04-06        | 13       | 17
    2015-04-13        | 0        | 17 **
    2015-04-20        | 9        | 26
    2015-04-27        | 3        | 29
    2015-05-04        | 0        | 29 **
    ...

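It seems to me that if the outer join can somehow apply the grand total to the last column, it should also be possible to apply the current running total instead, but I don't know how to do that.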

Is this possible?
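One possible fix, sketched here against the question's own query and not tested against a real `users` table, is to order the window by the generated week, which comes from `generate_series` and is therefore never NULL, and to group by that week alone. The `week_start` name is a hypothetical rename of the original `interval` alias, chosen only to avoid reusing a SQL keyword:

```sql
WITH reporting_period AS (
   -- one row per week in the reporting period, including weeks with no users
   SELECT generate_series(date_trunc('week', date '2015-04-02'),
                          date_trunc('week', date '2015-10-02'),
                          interval '1 week') AS week_start
)
SELECT date(rp.week_start)               AS week_start,
       count(u.created_at)               AS interval_count,    -- 0 when no user joined
       sum(count(u.created_at))
           OVER (ORDER BY rp.week_start) AS cumulative_count  -- ordered by the series, never NULL
FROM reporting_period rp
LEFT JOIN users u
       ON date_trunc('week', u.created_at) = rp.week_start
GROUP BY rp.week_start
ORDER BY rp.week_start;
```

Because every row now has a distinct, non-NULL sort key, an empty week should simply repeat the running total of the previous week instead of jumping to the grand total.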

2 answers:

Answer 0 (score: 2)

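I haven't tested this against a real table, so I can't guarantee it works out of the box, but the key here is to join users on `created_at` falling within a range of dates.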

    with reportingperiod as (
        -- one row per monthly bucket, covering [interval_begin, interval_end)
        select intervaldate as interval_begin,
            intervaldate + interval '1 month' as interval_end
        from (
            SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-03-15')),
            DATE(DATE_TRUNC('day', DATE '2015-10-15')), interval '1 month') AS intervaldate
        ) as rp
    )

    select interval_end,
        interval_count,
        sum(interval_count) over (order by interval_end) as running_sum
    from (
        select interval_end,
            count(u.created_at) as interval_count   -- 0 for buckets with no users
        from reportingperiod rp
        left join (
            select created_at
            from users
            where created_at < '2015-10-02'
        ) u on u.created_at >= rp.interval_begin    -- half-open range: start inclusive,
            and u.created_at < rp.interval_end      -- end exclusive, so each user lands in one bucket
        group by interval_end
    ) q
    order by interval_end

Answer 1 (score: 0)

I figured it out. The trick is a subquery. Here is my approach:

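  1. Add a count column to the generate_series call, with a default value of 0.
  2. Select the interval and count(users.created_at) from the user data.
  3. UNION the generate_series result with the result of the SELECT from step 2 (at this point, intervals that have users appear more than once).
  4. In an outer query, select the interval and max(interval_count) from the subquery result to remove the duplicates.
  5. Use the window aggregate as before to get the running total.

    SELECT
        interval
      , interval_count
      , SUM(interval_count) OVER (ORDER BY interval) AS cumulative_count
    FROM
     (
      -- step 4: collapse duplicated intervals, keeping the real count over the default 0
      SELECT interval, MAX(interval_count) AS interval_count FROM
      (
       -- step 1: every week in the period, with a count of 0
       SELECT GENERATE_SERIES(DATE(DATE_TRUNC('week', DATE '2015-04-02')),
       DATE(DATE_TRUNC('week', DATE '2015-10-02')), interval '1 week') AS interval,
       0 AS interval_count

       UNION

       -- step 2: the actual per-week counts from users
       SELECT DATE_TRUNC('week', users.created_at) AS interval,
       COUNT(users.created_at) AS interval_count FROM users
       WHERE users.created_at < date '2015-10-02'
       GROUP BY 1
      ) sub1
      GROUP BY interval
     ) grouped_data
    ORDER BY interval  -- the window's ORDER BY does not guarantee the output order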
    

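I'm not sure whether this approach has any serious performance problems, but it seems to work. If anyone has a better, more elegant, or more efficient approach, I'd love to hear it.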
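A variant that avoids the UNION/MAX step is to aggregate `users` once per week first and then LEFT JOIN those counts onto the week series, using `COALESCE` to turn missing weeks into 0. This is an unbenchmarked sketch: aggregating before the join means the join sees at most one row per week, which may help on a large table, and the `created_at` range filter can use an index on `created_at` if one exists (an assumption about the schema). The CTE names `weeks` and `weekly_counts` are hypothetical:

```sql
WITH weeks AS (
    -- every week in the reporting period, including empty ones
    SELECT generate_series(date_trunc('week', date '2015-04-02'),
                           date_trunc('week', date '2015-10-02'),
                           interval '1 week') AS week_start
), weekly_counts AS (
    -- users aggregated once per week, before the join
    SELECT date_trunc('week', created_at) AS week_start,
           count(*) AS n
    FROM users
    -- optional: restrict to the span the series covers
    -- (week of 2015-03-30 through the week of 2015-09-28)
    WHERE created_at >= date '2015-03-30'
      AND created_at <  date '2015-10-05'
    GROUP BY 1
)
SELECT w.week_start,
       coalesce(c.n, 0)                                   AS interval_count,
       sum(coalesce(c.n, 0)) OVER (ORDER BY w.week_start) AS cumulative_count
FROM weeks w
LEFT JOIN weekly_counts c ON c.week_start = w.week_start
ORDER BY w.week_start;
```

The range filter follows the question's original query, which counted every user in the series weeks, rather than the `< '2015-10-02'` cutoff used above, so the last week may differ slightly from this answer's output.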

Edit: my solution does not work when trying to group by an arbitrary time window. I just tried it with the following changes:

    /* generate series using DATE_TRUNC('day'...)*/
    
    SELECT GENERATE_SERIES(DATE(DATE_TRUNC('day', DATE '2015-04-02')),
       DATE(DATE_TRUNC('day', DATE '2015-10-02')), interval '1 month') AS interval,
       0 AS interval_count
    
    /* And this part */
    SELECT DATE_TRUNC('day', users.created_at) AS INTERVAL,
       COUNT(users.created_at) AS interval_count FROM users 
    
      WHERE users.created_at < date '2015-10-02'
      GROUP BY 1 ORDER BY 1 
    

For example, I'd like to produce similar results, but with the data grouped into intervals such as 3/15/15 - 4/14/15, 4/15/15 - 5/14/15 and 5/15/15 - 6/14/15.
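The daily-truncation attempt appears to fail because `DATE_TRUNC('day', ...)` yields one row per day: the `MAX()` step only merges a day with a series row when the two dates coincide (the 2nd of a month in that fragment), so nothing rolls the days up into month-long buckets. One way to get arbitrary buckets, using the same join-on-a-date-range idea as the first answer, is to generate each bucket's start and exclusive end and count users whose `created_at` falls inside. In this sketch the names `buckets`, `bucket_start`, `bucket_end` and `bucket_last_day` are hypothetical, and the last bucket start (2015-09-15) is an assumed end of the reporting period:

```sql
WITH buckets AS (
    -- bucket starts: 2015-03-15, 2015-04-15, 2015-05-15, ... (end date assumed)
    SELECT b::date                        AS bucket_start,
           (b + interval '1 month')::date AS bucket_end      -- exclusive upper bound
    FROM generate_series(timestamp '2015-03-15',
                         timestamp '2015-09-15',
                         interval '1 month') AS g(b)
)
SELECT bk.bucket_start,
       bk.bucket_end - 1                   AS bucket_last_day,  -- e.g. 2015-04-14
       count(u.created_at)                 AS interval_count,   -- 0 for empty buckets
       sum(count(u.created_at))
           OVER (ORDER BY bk.bucket_start) AS cumulative_count
FROM buckets bk
LEFT JOIN users u
       ON u.created_at >= bk.bucket_start
      AND u.created_at <  bk.bucket_end
GROUP BY bk.bucket_start, bk.bucket_end
ORDER BY bk.bucket_start;
```

The window is ordered by the generated `bucket_start`, so empty buckets carry the previous running total forward. Changing the `generate_series` step (for example to `interval '1 week'` or `interval '14 days'`) should change the bucket width without touching the rest of the query.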