7-day return/retention rate

Time: 2018-12-12 20:38:54

Tags: postgresql

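I have been trying to calculate the 7-day return rate (also called "classic retention", as described here: https://www.braze.com/blog/calculate-retention-rate/) in PostgreSQL, and then take a 30-day rolling average of it to reduce noise.

However, I am sure I am doing something wrong. First, the number looks higher than I would intuitively expect (other industries are typically around 5%). Also, I think the first 7 days should show 0, since in theory a user needs at least 7 days before they can count as "returned". However, as shown below, I get around 40-70%.

Would anyone mind looking at the code below to see whether there are any mistakes? The 7-day return rate is a very commonly used metric for apps, and I have not found any question on Stack Exchange (or even the rest of the web) that calculates it at this level of complexity in PostgreSQL, so I feel this could be very useful to a lot of people.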

Sample data

Wednesday, August 1, 2018 12:00 AM    71.14
Thursday, August 2, 2018 12:00 AM     55.44
Friday, August 3, 2018 12:00 AM       50.09
Saturday, August 4, 2018 12:00 AM     45.81
Sunday, August 5, 2018 12:00 AM       43.27
Monday, August 6, 2018 12:00 AM       40.61
Tuesday, August 7, 2018 12:00 AM      39.38
Wednesday, August 8, 2018 12:00 AM    38.46
Thursday, August 9, 2018 12:00 AM     36.81
Friday, August 10, 2018 12:00 AM      35.94

with

user_first_event as (
    select distinct id, min(timestamp)::date as first_event_date
    from log
    where 
        timestamp <= current_date
        and timestamp >= {{start_date}} and timestamp <= {{end_date}}
    group by id),

event as (
    select distinct id, timestamp::date as user_event_date
    from log
    where timestamp <= current_date and timestamp >= {{start_date}}),

gap as (
    select 
        user_first_event.id, 
        user_first_event.first_event_date,
        event.user_event_date,
        event.user_event_date - user_first_event.first_event_date as days_since_signup
    from user_first_event
    join event on user_first_event.id = event.id
    where user_first_event.first_event_date <= event.user_event_date),

conversion_rate as (
select
    first_event_date,
    (sum(case when days_since_signup = 7 then 1 else 0 end) * 100.0 /
        count(distinct id)
      ) as seven_day_retention_rate
from gap
group by first_event_date
)

SELECT first_event_date,  
       AVG(seven_day_retention_rate)
            OVER(ORDER BY first_event_date ROWS BETWEEN 29 PRECEDING AND CURRENT ROW) AS rolling_avg_retention_rate
FROM conversion_rate

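1 answer:

Answer 0: (score: 0)

This problem is a bit simpler than your query makes it look. You can actually do it with just one subquery and one outer query, like this: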

select first_event_date
 , avg(seven_day_return) as seven_day_return_day_only
 , avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
  --inner query to get value for user, 1 if they retain and 0 if they do not
  select min(timestamp)::date as first_event_date
   , case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
  from log
  group by id ) t

group by t.first_event_date;

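Note that the rolling average weights each day equally, rather than each user. If you want the average weighted by the number of users per day, you can update the outer calculation with a few more aggregates and window functions to compute a weighted value.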
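
A user-weighted version could look roughly like the sketch below. It is only one possible way to do it, assuming the same log table and the same inner query: instead of averaging the daily averages, it sums the returning users and the cohort sizes over the window and divides the two. The window name w and the output column names are made up for illustration.

```sql
select first_event_date
   -- per-day retention: returning users / users who started that day
 , sum(seven_day_return) * 1.0 / count(*) as seven_day_return_day_only
   -- rolling retention weighted by user: total returning users in the window
   -- divided by total new users in the window
 , (sum( sum(seven_day_return) ) over w) * 1.0
     / (sum( count(*) ) over w) as thirty_day_user_weighted_retention
from (
  --inner query to get value for user, 1 if they retain and 0 if they do not
  select min(timestamp)::date as first_event_date
   , case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
  from log
  group by id ) t

group by t.first_event_date
window w as (order by first_event_date asc rows between 29 preceding and current row);
```

Keep in mind that ROWS BETWEEN 29 PRECEDING counts rows, not calendar days, so if some dates have no new users the window spans more than 30 days.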

Reference: http://sqlfiddle.com/#!17/ee17e/1/0
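
To try the queries in this answer locally, a tiny test table is enough. The schema and rows below are hypothetical (they are not the contents of the fiddle), and only use the two columns the queries refer to, id and timestamp.

```sql
-- Hypothetical minimal schema: one row per user event.
create table log (
    id integer not null,
    "timestamp" timestamp not null
);

insert into log (id, "timestamp") values
    (1, '2018-08-01 09:00'),
    (1, '2018-08-08 10:00'),  -- user 1 comes back exactly 7 days after the first event
    (2, '2018-08-01 12:00'),
    (2, '2018-08-03 12:00'),  -- user 2 comes back, but not on day 7
    (3, '2018-08-02 08:00');  -- user 3 never comes back
```

With these rows, the 2018-08-01 cohort has one of its two users returning on day 7, so its daily value is 0.5, and the 2018-08-02 cohort has a value of 0. The rolling column is then 0.5 for the first date and 0.25 for the second.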

If you do not have access to array_agg (but do have window functions), you can use:

select first_event_date
 , avg(seven_day_return) as day_seven_day_return
 , avg( avg(seven_day_return) ) OVER(ORDER BY first_event_date asc ROWS BETWEEN 29 preceding AND CURRENT ROW ) AS thirty_day_rolling_retention
from (
  --inner query to get value for user
  select min(timestamp)::date as first_event_date
   , case when exists(select 1 from log l2 where l2.id = log.id and l2.timestamp::date = min(log.timestamp)::date + 7) then 1 else 0 end as seven_day_return
  from log
  group by id ) t

group by t.first_event_date;
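
One difference from the query in the question is worth pointing out: there, the date filter is applied before min(timestamp) is taken, so a user who first appeared before {{start_date}} gets their first event inside the range treated as their first event overall. That may be part of why the numbers come out high, though I cannot confirm it without the data. If you need a date range, a sketch that filters on the cohort date after aggregation is shown below; the two date literals are placeholders standing in for {{start_date}} and {{end_date}}.

```sql
select first_event_date
 , avg(seven_day_return) as seven_day_return_day_only
from (
  --inner query sees each user's full history, so min() is the true first event
  select min(timestamp)::date as first_event_date
   , case when array_agg(timestamp::date) @> ARRAY[ (min(timestamp)::date + 7) ] then 1 else 0 end as seven_day_return
  from log
  group by id
  --restrict cohorts by first event date only after aggregating
  having min(timestamp)::date >= date '2018-08-01'   -- placeholder for {{start_date}}
     and min(timestamp)::date <= date '2018-08-31'   -- placeholder for {{end_date}}
) t

group by t.first_event_date
order by t.first_event_date;
```

Cohorts whose first event is less than 7 days old have not had a chance to return yet, so they show 0 at the recent end of the range.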