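I've been trying to calculate the 7-day return rate (also called "classic retention", as described here: https://www.braze.com/blog/calculate-retention-rate/) and then average it over 30 days to reduce noise, in PostgreSQL.
However, I'm sure I'm doing something wrong. First, the number looks higher than I would intuitively expect (other industries are typically around 5%). Second, I would expect the first 7 days to show 0, because in theory a user needs at least 7 days before they can count as "returned". However, as shown below, I get around 40-70%.
Would someone mind looking at the code below to see if there are any errors? The 7-day return rate is a very commonly used metric for apps, and I haven't found any question on Stack Exchange (or even the rest of the web) that calculates it at this level of sophistication with PostgreSQL, so I feel an answer could be really useful to a lot of people.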
Sample data (date, rolling average retention rate in %)
Wednesday, August 1, 2018 12:00 AM 71.14
Thursday, August 2, 2018 12:00 AM 55.44
Friday, August 3, 2018 12:00 AM 50.09
Saturday, August 4, 2018 12:00 AM 45.81
Sunday, August 5, 2018 12:00 AM 43.27
Monday, August 6, 2018 12:00 AM 40.61
Tuesday, August 7, 2018 12:00 AM 39.38
Wednesday, August 8, 2018 12:00 AM 38.46
Thursday, August 9, 2018 12:00 AM 36.81
Friday, August 10, 2018 12:00 AM 35.94
with
user_first_event as (
    select distinct id, min(timestamp)::date as first_event_date
    from log
    where timestamp <= current_date
      and timestamp >= {{start_date}} and timestamp <= {{end_date}}
    group by id
),
event as (
    select distinct id, timestamp::date as user_event_date
    from log
    where timestamp <= current_date and timestamp >= {{start_date}}
),
gap as (
    select
        user_first_event.id,
        user_first_event.first_event_date,
        event.user_event_date,
        event.user_event_date - user_first_event.first_event_date as days_since_signup
    from user_first_event
    join event on user_first_event.id = event.id
    where user_first_event.first_event_date <= event.user_event_date
),
conversion_rate as (
    select
        first_event_date,
        (sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
         count(distinct id)
        ) as seven_day_retention_rate
    from gap
    group by first_event_date
)
SELECT first_event_date,
       AVG(seven_day_retention_rate)
           OVER (ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate
Answer 0 (score: 0)
This problem is a bit easier than your query makes it look. You can actually do it with just one subquery and one outer query, like this:
select first_event_date
     , avg(seven_day_return) as seven_day_return_day_only
     , avg( avg(seven_day_return) ) OVER (ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW) AS thirty_day_rolling_retention
from (
    -- inner query: one row per user, 1 if they returned on day 7 and 0 if they did not
    select min(timestamp)::date as first_event_date
         , case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
    from log
    group by id
) t
group by t.first_event_date;
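Note that this weights each day equally within the window, rather than weighting each user equally. If you want the average across days to be weighted by the number of users, you can update the outer calculation with a few more aggregates and window functions so that it computes the value with weighting.
Reference: http://sqlfiddle.com/#!17/ee17e/1/0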
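To see what the inner query produces per user, here is a minimal sketch run against made-up data. The log CTE and its four rows are hypothetical stand-ins for the real table; only the column names id and timestamp are taken from the queries above.

```sql
-- Hypothetical toy rows standing in for the real "log" table.
with log(id, "timestamp") as (
    values
        (1, timestamp '2018-08-01 09:00'),  -- user 1: first seen on Aug 1
        (1, timestamp '2018-08-08 18:30'),  -- user 1: seen again exactly 7 days later
        (2, timestamp '2018-08-01 10:00'),  -- user 2: first seen on Aug 1
        (2, timestamp '2018-08-05 11:00')   -- user 2: seen again on day 4 only
)
select id
     , min("timestamp")::date as first_event_date
       -- 1 if any of the user's event dates equals first date + 7, else 0
     , case when array_agg("timestamp"::date) @> array[min("timestamp")::date + 7]
            then 1 else 0 end as seven_day_return
from log
group by id
order by id;
```

With these rows, user 1 gets seven_day_return = 1 and user 2 gets 0, so the outer avg for the 2018-08-01 cohort would come out as 0.5.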
If you don't have access to array_agg (but do have window functions), you can use:
select first_event_date
     , avg(seven_day_return) as day_seven_day_return
     , avg( avg(seven_day_return) ) OVER (ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW) AS thirty_day_rolling_retention
from (
    -- inner query: one row per user, 1 if they have an event on first date + 7, else 0
    select min(timestamp)::date as first_event_date
         , case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
    from log
    group by id
) t
group by t.first_event_date;
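For the user-weighted variant mentioned in the note on weighting, one possible sketch is to keep the numerator (returned users) and the denominator (cohort size) per day, sum each over the window, and only then divide. The toy log CTE, the CTE names per_user and per_day, and the output column names are assumptions made for illustration; drop the toy CTE to run it against a real log table.

```sql
-- Hypothetical toy rows standing in for the real "log" table.
with log(id, "timestamp") as (
    values
        (1, timestamp '2018-08-01 09:00'), (1, timestamp '2018-08-08 18:30'),  -- returns on day 7
        (2, timestamp '2018-08-01 10:00'), (2, timestamp '2018-08-05 11:00'),  -- does not
        (3, timestamp '2018-08-02 08:00'), (3, timestamp '2018-08-09 08:00')   -- returns on day 7
),
per_user as (
    -- one row per user: cohort date and a 0/1 flag for a day-7 return
    select id
         , min("timestamp")::date as first_event_date
         , case when array_agg("timestamp"::date) @> array[min("timestamp")::date + 7]
                then 1 else 0 end as seven_day_return
    from log
    group by id
),
per_day as (
    -- one row per cohort date: how many users returned, and how many there were
    select first_event_date
         , sum(seven_day_return) as returned_users
         , count(*) as cohort_size
    from per_user
    group by first_event_date
)
select first_event_date
     , returned_users::numeric / cohort_size as seven_day_return_day_only
       -- ratio of sums over the window, so every user counts once
     , (sum(returned_users) over w)::numeric / (sum(cohort_size) over w) as user_weighted_rolling_retention
from per_day
window w as (order by first_event_date rows between 29 preceding and current row)
order by first_event_date;
```

On the toy rows, the Aug 1 cohort is 1 of 2 and the Aug 2 cohort is 1 of 1, so the user-weighted value on Aug 2 is 2/3 (about 0.667), whereas averaging the two daily rates would give 0.75. As in the queries above, the window spans 30 cohort rows, which equals 30 calendar days only if every day has at least one new user.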