Google BigQuery - Calculate monthly totals by status based on multiple date conditions

Asked: 2021-03-11 18:01:57

Tags: google-bigquery

I have a table with the following data:

    customer_id     subscription_id     plan      status     trial_start     trial_end      activated_at   cancelled_at
        1               jg1             basic    cancelled    2020-06-26     2020-07-14      2020-07-14     2020-09-25
        2               ab1             basic    cancelled    2020-08-10     2020-08-24      2020-08-24     2021-02-15
        3               cf8             basic    cancelled    2020-08-25     2020-09-04      2020-09-04     2020-10-24
        4               bc2             basic     active      2020-10-12     2020-10-26      2020-10-26
        5               hg4             basic     active      2021-01-09     2021-02-08      2021-02-08
        6               cd5             basic    in_trial     2021-02-26

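As you can see from the table, subscriptions with status = in_trial are in their trial period. When a subscription converts from in_trial to active, an activated_at date is recorded. When an in_trial or active subscription is cancelled, the status switches to cancelled and a cancelled_at date is recorded. The status column always shows only the latest status of the subscription. A status change does not create a new subscription row; the status is updated in place, and the corresponding date column reflects when the change happened.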

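My goal is to calculate, month by month, how many subscriptions were in status = in_trial, how many in status = active, and how many in status = cancelled. Because the status column reflects only the latest status of a subscription, the query has to derive from the available date columns how many subscriptions were in_trial, active, and cancelled in each month.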

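If a particular subscription has more than one status in a given month (for example, subscription_id = ab1 was in trial in August 2020 and also converted to active in August 2020), I want only the most recent status to be counted for that subscription. So, for example, subscription_id = ab1 should be counted as an active subscription for August 2020.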
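To make the "most recent status in the month wins" rule concrete, here is a minimal sketch for the single subscription ab1. It is only an illustration under assumptions: the CTE name `sub` and the alias `month_start` are made up for this example, and the dates are typed as DATE literals rather than read from the real table.

```sql
-- Sketch only: `sub` and `month_start` are hypothetical names for this example.
WITH sub AS (
  SELECT 'ab1' AS subscription_id,
         DATE '2020-08-10' AS trial_start,
         DATE '2020-08-24' AS activated_at,
         DATE '2021-02-15' AS cancelled_at
)
SELECT
  month_start,
  subscription_id,
  CASE
    -- cancellation in this month outranks everything else
    WHEN DATE_TRUNC(cancelled_at, MONTH) = month_start THEN 'cancelled'
    -- activated in this month or an earlier one
    WHEN DATE_TRUNC(activated_at, MONTH) <= month_start THEN 'active'
    -- otherwise the subscription is still in its trial
    ELSE 'in_trial'
  END AS status
FROM sub,
  -- one row per month from the trial start to the cancellation month (or today)
  UNNEST(GENERATE_DATE_ARRAY(
    DATE_TRUNC(trial_start, MONTH),
    DATE_TRUNC(IFNULL(cancelled_at, CURRENT_DATE()), MONTH),
    INTERVAL 1 MONTH)) AS month_start
ORDER BY month_start
```

For ab1 this gives active for 2020-08 through 2021-01 and cancelled for 2021-02, so the trial days in August 2020 are not counted separately. A NULL activated_at or cancelled_at makes the corresponding comparison NULL, so the CASE falls through to the next branch.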

The output I am looking for is:

    date          in_trial   active    cancelled
   2020-06-01         1        0           0
   2020-07-01         0        1           0
   2020-08-01         1        2           0
   2020-09-01         0        2           1         
   2020-10-01         0        2           1 
   2020-11-01         0        2           0
   2020-12-01         0        2           0 
   2021-01-01         1        2           0
   2021-02-01         1        2           1
   2021-03-01         1        2           0

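Alternatively, the result can be presented in a different format as long as the numbers are correct. Another example of the output could be: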

   date           status      count
2020-06-01       in_trial       1
2020-06-01        active        0
2020-06-01       cancelled      0
2020-07-01       in_trial       0
2020-07-01        active        1
2020-07-01       cancelled      0
   ...             ...         ...
2021-03-01       in_trial       1
2021-03-01        active        2
2021-03-01       cancelled      0
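The two layouts carry the same information, so one can be derived from the other. As a rough sketch, assuming the wide result is available as a CTE called `wide` (a hypothetical name, filled here with the first two months only) and using `month_start` and `subscriptions` as made-up column names, BigQuery's UNPIVOT operator can turn the three status columns into rows:

```sql
-- Sketch only: `wide`, `month_start` and `subscriptions` are hypothetical names.
WITH wide AS (
  SELECT DATE '2020-06-01' AS month_start, 1 AS in_trial, 0 AS active, 0 AS cancelled
  UNION ALL
  SELECT DATE '2020-07-01', 0, 1, 0
)
SELECT month_start, status, subscriptions
FROM wide
-- each status column becomes one row: its name goes to `status`, its value to `subscriptions`
UNPIVOT(subscriptions FOR status IN (in_trial, active, cancelled))
ORDER BY month_start
```

Each input row produces three output rows, one per status column.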

Here is a query you can use to reproduce the sample table shown in this question:

SELECT 1 AS customer_id, 'jg1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-06-26' AS trial_start, '2020-07-14' AS trial_end, '2020-07-14' AS activated_at, '2020-09-25' AS cancelled_at UNION ALL 
SELECT 2 AS customer_id, 'ab1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-10' AS trial_start, '2020-08-24' AS trial_end, '2020-08-24' AS activated_at, '2021-02-15' AS cancelled_at UNION ALL 
SELECT 3 AS customer_id, 'cf8' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-25' AS trial_start, '2020-09-04' AS trial_end, '2020-09-04' AS activated_at, '2020-10-24' AS cancelled_at UNION ALL 
SELECT 4 AS customer_id, 'bc2' AS subscription_id, 'basic' AS plan, 'active' AS status, '2020-10-12' AS trial_start, '2020-10-26' AS trial_end, '2020-10-26' AS activated_at, '' AS cancelled_at UNION ALL 
SELECT 5 AS customer_id, 'hg4' AS subscription_id, 'basic' AS plan, 'active' AS status, '2021-01-09' AS trial_start, '2021-02-08' AS trial_end, '2021-02-08' AS activated_at, '' AS cancelled_at UNION ALL 
SELECT 6 AS customer_id, 'cd5' AS subscription_id, 'basic' AS plan, 'in_trial' AS status, '2021-02-26' AS trial_start, '' AS trial_end, '' AS activated_at, '' AS cancelled_at
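Note that the query above produces STRING columns and uses empty strings for missing dates. If DATE columns with NULLs are preferred for date functions, one possible conversion is sketched below. It assumes the rows are wrapped in a CTE named `raw` (a hypothetical name), and only two of the six rows are repeated to keep it short.

```sql
-- Sketch only: `raw` is a hypothetical CTE name; two sample rows shown for brevity.
WITH raw AS (
  SELECT 4 AS customer_id, 'bc2' AS subscription_id, 'basic' AS plan, 'active' AS status,
         '2020-10-12' AS trial_start, '2020-10-26' AS trial_end,
         '2020-10-26' AS activated_at, '' AS cancelled_at
  UNION ALL
  SELECT 6, 'cd5', 'basic', 'in_trial', '2021-02-26', '', '', ''
)
SELECT
  customer_id,
  subscription_id,
  plan,
  status,
  -- NULLIF turns '' into NULL, and CAST(NULL AS DATE) stays NULL
  CAST(NULLIF(trial_start, '')  AS DATE) AS trial_start,
  CAST(NULLIF(trial_end, '')    AS DATE) AS trial_end,
  CAST(NULLIF(activated_at, '') AS DATE) AS activated_at,
  CAST(NULLIF(cancelled_at, '') AS DATE) AS cancelled_at
FROM raw
```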

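I have been working on this since yesterday morning and am still looking for an efficient way to solve it. Thanks in advance for helping me with this.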

1 Answer:

Answer 0: (score: 1)

The below should work for you:

# assumes the date columns hold DATE or TIMESTAMP values, with NULL (not '') where a date is missing
select month, 
  count(distinct if(status = 0, customer_id, null)) in_trial, 
  count(distinct if(status = 1, customer_id, null)) active, 
  count(distinct if(status = 2, customer_id, null)) cancelled
from (
  # keep only the highest-ranked status per customer and month
  # (0 = in_trial, 1 = active, 2 = cancelled)
  select month, customer_id, 
    array_agg(status order by status desc limit 1)[offset(0)] status
  from (
    # every month touched by the trial period (an open trial runs until today)
    select distinct customer_id, 0 status, date_trunc(date, month) month
    from `project.dataset.table`,
    unnest(generate_date_array(date(trial_start), ifnull(date(trial_end), current_date()))) date 
      union all
    # every month touched by the active period (not yet cancelled runs until today)
    select distinct customer_id, 1 status, date_trunc(date, month) month
    from `project.dataset.table`,
    unnest(generate_date_array(date(activated_at), ifnull(date(cancelled_at), current_date()))) date 
      union all
    # the month in which the subscription was cancelled (NULL if never cancelled)
    select distinct customer_id, 2 status, date_trunc(date(cancelled_at), month) month
    from `project.dataset.table`
  )
  where month is not null
  group by month, customer_id
)
group by month
# order by month 

If applied to the sample data in your question, the output is:

[image: query result]