I have a table with the following data:
customer_id subscription_id plan status trial_start trial_end activated_at cancelled_at
1 jg1 basic cancelled 2020-06-26 2020-07-14 2020-07-14 2020-09-25
2 ab1 basic cancelled 2020-08-10 2020-08-24 2020-08-24 2021-02-15
3 cf8 basic cancelled 2020-08-25 2020-09-04 2020-09-04 2020-10-24
4 bc2 basic active 2020-10-12 2020-10-26 2020-10-26
5 hg4 basic active 2021-01-09 2021-02-08 2021-02-08
6 cd5 basic in_trial 2021-02-26
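As you can see from the table, status = in_trial means the subscription is in its trial period. When a subscription converts from in_trial to active, an activated_at date is recorded. When an in_trial or active subscription is cancelled, the status switches to cancelled and a cancelled_at date is recorded. The status column always shows only the latest status of the subscription. A status change does not add a new row for the subscription; the status value is updated in place, and the corresponding date column records when the change happened.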
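My goal is to count, month by month, how many subscriptions are in status = in_trial, how many are in status = active, and how many are in status = cancelled. Because the status column reflects only the latest status of a subscription, the query has to work out from the available date columns how many subscriptions were in_trial, active, and cancelled in each month.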
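If a particular subscription has more than one status in a given month (for example, subscription_id = ab1 was in trial in August 2020 and also converted to active in August 2020), I only want the most recent status to be counted for that subscription. So subscription_id = ab1 should be counted as an active subscription for August 2020.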
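To make the "latest status within the month wins" rule concrete, here is a minimal Python sketch, not a definitive implementation. The helper names `month_start` and `status_in_month` are hypothetical. The sketch also assumes that a cancelled subscription is counted as cancelled only in the month of cancellation and is dropped afterwards, which is read off the desired output in this question rather than stated explicitly.

```python
from datetime import date
from typing import Optional


def month_start(d: date) -> date:
    """First day of the month that contains d."""
    return d.replace(day=1)


def status_in_month(month: date,
                    trial_start: date,
                    activated_at: Optional[date] = None,
                    cancelled_at: Optional[date] = None) -> Optional[str]:
    """Latest status reached within `month` (a first-of-month date).

    Returns None when the subscription is not counted in that month.
    """
    if month < month_start(trial_start):
        return None  # subscription did not exist yet
    if cancelled_at is not None:
        cancel_month = month_start(cancelled_at)
        if month == cancel_month:
            return "cancelled"
        if month > cancel_month:
            # Assumption: no longer counted after the cancellation month.
            return None
    if activated_at is not None and month >= month_start(activated_at):
        return "active"
    return "in_trial"


# subscription_id = ab1: trial started and converted to active in August 2020
print(status_in_month(date(2020, 8, 1),
                      trial_start=date(2020, 8, 10),
                      activated_at=date(2020, 8, 24),
                      cancelled_at=date(2021, 2, 15)))  # active
```

The checks run from the latest possible status to the earliest, so a subscription that passes through several statuses in one month is reported with the last one.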
The output I am looking for is:
date in_trial active cancelled
2020-06-01 1 0 0
2020-07-01 0 1 0
2020-08-01 1 2 0
2020-09-01 0 2 1
2020-10-01 0 2 1
2020-11-01 0 2 0
2020-12-01 0 2 0
2021-01-01 1 2 0
2021-02-01 1 2 1
2021-03-01 1 2 0
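Alternatively, the result can be shown in a different format, as long as the numbers are correct. Another example of acceptable output is: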
date status count
2020-06-01 in_trial 1
2020-06-01 active 0
2020-06-01 cancelled 0
2020-07-01 in_trial 0
2020-07-01 active 1
2020-07-01 cancelled 0
... ... ...
2021-03-01 in_trial 1
2021-03-01 active 2
2021-03-01 cancelled 0
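The two layouts carry the same numbers, so either can be derived from the other. As a small illustration only, the sketch below turns rows of the first (wide) layout into the second (long) layout; `STATUSES` and `wide_to_long` are hypothetical names, and the two input rows are copied from the desired output above.

```python
from datetime import date

# Column order of the wide layout: date, in_trial, active, cancelled
STATUSES = ("in_trial", "active", "cancelled")


def wide_to_long(wide_rows):
    """Turn (date, in_trial, active, cancelled) rows into (date, status, count) rows."""
    long_rows = []
    for month, *counts in wide_rows:
        for status, count in zip(STATUSES, counts):
            long_rows.append((month, status, count))
    return long_rows


wide = [
    (date(2020, 6, 1), 1, 0, 0),
    (date(2020, 7, 1), 0, 1, 0),
]

for row in wide_to_long(wide):
    print(row)
```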
Here is a query you can use to reproduce the sample table from this question:
SELECT 1 AS customer_id, 'jg1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-06-26' AS trial_start, '2020-07-14' AS trial_end, '2020-07-14' AS activated_at, '2020-09-25' AS cancelled_at UNION ALL
SELECT 2 AS customer_id, 'ab1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-10' AS trial_start, '2020-08-24' AS trial_end, '2020-08-24' AS activated_at, '2021-02-15' AS cancelled_at UNION ALL
SELECT 3 AS customer_id, 'cf8' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-25' AS trial_start, '2020-09-04' AS trial_end, '2020-09-04' AS activated_at, '2020-10-24' AS cancelled_at UNION ALL
SELECT 4 AS customer_id, 'bc2' AS subscription_id, 'basic' AS plan, 'active' AS status, '2020-10-12' AS trial_start, '2020-10-26' AS trial_end, '2020-10-26' AS activated_at, '' AS cancelled_at UNION ALL
SELECT 5 AS customer_id, 'hg4' AS subscription_id, 'basic' AS plan, 'active' AS status, '2021-01-09' AS trial_start, '2021-02-08' AS trial_end, '2021-02-08' AS activated_at, '' AS cancelled_at UNION ALL
SELECT 6 AS customer_id, 'cd5' AS subscription_id, 'basic' AS plan, 'in_trial' AS status, '2021-02-26' AS trial_start, '' AS trial_end, '' AS activated_at, '' AS cancelled_at
I have been working on this since yesterday morning and am still trying to find an efficient way to solve it. Thanks in advance for your help.
Answer 0 (score: 1)
The query below should work for you:
select month,
  count(distinct if(status = 0, customer_id, null)) in_trial,
  count(distinct if(status = 1, customer_id, null)) active,
  count(distinct if(status = 2, customer_id, null)) cancelled
from (
  -- keep only the highest status code per customer and month (0 < 1 < 2)
  select month, customer_id,
    array_agg(status order by status desc limit 1)[offset(0)] status
  from (
    -- status 0: every month touched by the trial period
    select distinct customer_id, 0 status, date_trunc(date, month) month
    from `project.dataset.table`,
    unnest(generate_date_array(date(trial_start), ifnull(date(trial_end), current_date()))) date
    union all
    -- status 1: every month from activation to cancellation (or today)
    select distinct customer_id, 1 status, date_trunc(date, month) month
    from `project.dataset.table`,
    unnest(generate_date_array(date(activated_at), ifnull(date(cancelled_at), current_date()))) date
    union all
    -- status 2: the month of cancellation only
    select distinct customer_id, 2 status, date_trunc(date(cancelled_at), month) month
    from `project.dataset.table`
  )
  where not month is null
  group by month, customer_id
)
group by month
# order by month
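As a rough cross-check of the approach in the query, the same idea can be sketched in plain Python: expand each subscription into (month, status code) pairs, keep the highest code per subscription and month, then count per month. This is a sketch under stated assumptions, not a replacement for the query. The names `ROWS`, `AS_OF`, `LABELS`, `months_between`, and `monthly_counts` are made up for illustration, `AS_OF` is a fixed stand-in for `current_date()`, and the sketch keys on subscription_id where the query uses customer_id (the two coincide in the sample data).

```python
from collections import Counter
from datetime import date

# Sample rows from the question:
# (subscription_id, trial_start, trial_end, activated_at, cancelled_at)
ROWS = [
    ("jg1", date(2020, 6, 26), date(2020, 7, 14), date(2020, 7, 14), date(2020, 9, 25)),
    ("ab1", date(2020, 8, 10), date(2020, 8, 24), date(2020, 8, 24), date(2021, 2, 15)),
    ("cf8", date(2020, 8, 25), date(2020, 9, 4), date(2020, 9, 4), date(2020, 10, 24)),
    ("bc2", date(2020, 10, 12), date(2020, 10, 26), date(2020, 10, 26), None),
    ("hg4", date(2021, 1, 9), date(2021, 2, 8), date(2021, 2, 8), None),
    ("cd5", date(2021, 2, 26), None, None, None),
]
AS_OF = date(2021, 3, 31)  # assumption: fixed stand-in for current_date()
LABELS = {0: "in_trial", 1: "active", 2: "cancelled"}


def months_between(start, end):
    """First-of-month dates for every month touched by [start, end]."""
    out = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        out.append(date(y, m, 1))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return out


def monthly_counts(rows, as_of):
    """Return (month, in_trial, active, cancelled) tuples sorted by month."""
    latest = {}  # (month, subscription_id) -> highest status code seen
    for sub_id, trial_start, trial_end, activated_at, cancelled_at in rows:
        # Code 0 for every month of the trial period.
        events = [(mth, 0) for mth in months_between(trial_start, trial_end or as_of)]
        if activated_at:
            # Code 1 from activation until cancellation (or as_of).
            events += [(mth, 1) for mth in months_between(activated_at, cancelled_at or as_of)]
        if cancelled_at:
            # Code 2 in the cancellation month only.
            events.append((cancelled_at.replace(day=1), 2))
        for mth, code in events:
            key = (mth, sub_id)
            latest[key] = max(latest.get(key, -1), code)
    counts = Counter((mth, LABELS[code]) for (mth, _), code in latest.items())
    months = sorted({mth for mth, _ in latest})
    return [(mth,
             counts[(mth, "in_trial")],
             counts[(mth, "active")],
             counts[(mth, "cancelled")]) for mth in months]


for row in monthly_counts(ROWS, AS_OF):
    print(row)
```

With `AS_OF` set to 2021-03-31, this yields ten rows from 2020-06-01 to 2021-03-01 whose counts match the table requested in the question. Like the query, the sketch would keep counting a subscription as in_trial after its cancellation month if it was cancelled during the trial and has no trial_end; the sample data contains no such row.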
When the query is applied to the sample data from your question, the output is: