BigQuery: How do I run a cohort/retention analysis on a user base over 2.5 years?

Asked: 2019-06-25 22:42:44

Tags: mysql sql google-bigquery

I have been asked to complete a cohort/retention analysis of 50,000+ app users, based on their purchases over 2.5 years.

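I have two tables to work with. One table lists each user_id with the account signup date (as a TIMESTAMP). The other table lists the purchases users made in the app, each associated with a TIMESTAMP.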

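How exactly should I write a query for this kind of analysis? I have looked at examples on StackOverflow, Reddit, and other forums, but many of them only cover users who signed up within a single month or a few weeks, whereas I have new users signing up every month over more than two years.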

Table 1

- timestamp TIMESTAMP

- user_id BYTES

Table 2

- user_id BYTES

- account_signup TIMESTAMP

2 answers:

Answer 0 (score: 2)

-- Extract the relevant columns

With step_1 as (
    Select
    customer_email as user_id,
    date(created_at) as order_date
    FROM `ordertable`
    ),

-- Get the acquisition date (each user's first order date)

step_2 as (
Select
user_id,
order_date,
-- The earliest order date within each user's partition is the acquisition date
MIN(order_date) OVER (partition by user_id) as acquisition_date
from step_1
),

-- Define the cohort and the number of days between the order date and the acquisition date

step_3 as (
Select
user_id,
order_date,
acquisition_date,
date_diff(order_date,acquisition_date,DAY) as repeat_after_days,
-- Cohort label is the acquisition year and month, e.g. 2019-06
format_date('%Y-%m', acquisition_date) as cohort
from step_2
),

-- Cohort table of absolute order counts (we can stop here to get the number of repeat orders per cohort)

repeat_cohort_numbers as (
select
cohort,
count(distinct user_id) as users,
countif(repeat_after_days <= 30)-count(distinct user_id) as repeat_m0,
countif(repeat_after_days<= 60 AND repeat_after_days > 30) as repeat_m1,
countif(repeat_after_days<= 90 AND repeat_after_days > 60) as repeat_m2,
countif(repeat_after_days<= 120 AND repeat_after_days > 90) as repeat_m3,
countif(repeat_after_days<= 150 AND repeat_after_days > 120) as repeat_m4,
countif(repeat_after_days<= 180 AND repeat_after_days > 150) as repeat_m5,
countif(repeat_after_days<= 210 AND repeat_after_days > 180) as repeat_m6,
countif(repeat_after_days<= 240 AND repeat_after_days > 210) as repeat_m7,
countif(repeat_after_days<= 270 AND repeat_after_days > 240) as repeat_m8,
countif(repeat_after_days<= 300 AND repeat_after_days > 270) as repeat_m9,
countif(repeat_after_days<= 330 AND repeat_after_days > 300) as repeat_m10,
countif(repeat_after_days<= 360 AND repeat_after_days > 330) as repeat_m11,
countif(repeat_after_days<= 390 AND repeat_after_days > 360) as repeat_m12,
countif(repeat_after_days<= 420 AND repeat_after_days > 390) as repeat_m13,
countif(repeat_after_days<= 450 AND repeat_after_days > 420) as repeat_m14,
countif(repeat_after_days<= 480 AND repeat_after_days > 450) as repeat_m15,
countif(repeat_after_days<= 510 AND repeat_after_days > 480) as repeat_m16,
countif(repeat_after_days<= 540 AND repeat_after_days > 510) as repeat_m17,
countif(repeat_after_days<= 570 AND repeat_after_days > 540) as repeat_m18,
countif(repeat_after_days<= 600 AND repeat_after_days > 570) as repeat_m19,
countif(repeat_after_days<= 630 AND repeat_after_days > 600) as repeat_m20,
countif(repeat_after_days<= 660 AND repeat_after_days > 630) as repeat_m21,
countif(repeat_after_days<= 690 AND repeat_after_days > 660) as repeat_m22,
countif(repeat_after_days<= 720 AND repeat_after_days > 690) as repeat_m23,
countif(repeat_after_days<= 750 AND repeat_after_days > 720) as repeat_m24,
countif(repeat_after_days<= 780 AND repeat_after_days > 750) as repeat_m25,
countif(repeat_after_days<= 810 AND repeat_after_days > 780) as repeat_m26,
countif(repeat_after_days<= 840 AND repeat_after_days > 810) as repeat_m27,
countif(repeat_after_days<= 870 AND repeat_after_days > 840) as repeat_m28,
countif(repeat_after_days<= 900 AND repeat_after_days > 870) as repeat_m29,
countif(repeat_after_days<= 930 AND repeat_after_days > 900) as repeat_m30
from step_3
group by cohort order by cohort asc
)
/*
-- Cohort behaviour (by percentage) table
select
cohort,
users,
repeat_m0/users as m0_order_percent,
repeat_m1/users as m1_order_percent,
repeat_m2/users as m2_order_percent,
repeat_m3/users as m3_order_percent,
repeat_m4/users as m4_order_percent,
repeat_m5/users as m5_order_percent,
repeat_m6/users as m6_order_percent,
repeat_m7/users as m7_order_percent,
repeat_m8/users as m8_order_percent,
repeat_m9/users as m9_order_percent,
repeat_m10/users as m10_order_percent,
repeat_m11/users as m11_order_percent,
repeat_m12/users as m12_order_percent,
repeat_m13/users as m13_order_percent,
repeat_m14/users as m14_order_percent,
repeat_m15/users as m15_order_percent,
repeat_m16/users as m16_order_percent,
repeat_m17/users as m17_order_percent,
repeat_m18/users as m18_order_percent,
repeat_m19/users as m19_order_percent,
repeat_m20/users as m20_order_percent,
repeat_m21/users as m21_order_percent,
repeat_m22/users as m22_order_percent,
repeat_m23/users as m23_order_percent,
repeat_m24/users as m24_order_percent,
repeat_m25/users as m25_order_percent,
repeat_m26/users as m26_order_percent,
repeat_m27/users as m27_order_percent,
repeat_m28/users as m28_order_percent,
repeat_m29/users as m29_order_percent,
repeat_m30/users as m30_order_percent
from repeat_cohort_numbers
*/

-- Final cohort summary table

select
sum(users) as net_users,
sum(repeat_m0)/SUM(IF (repeat_m0>0,users,NULL)) as m0_repeat,
sum(repeat_m1)/SUM(IF (repeat_m1>0,users,NULL)) as m1_repeat,
sum(repeat_m2)/SUM(IF (repeat_m2>0,users,NULL)) as m2_repeat,
sum(repeat_m3)/SUM(IF (repeat_m3>0,users,NULL)) as m3_repeat,
sum(repeat_m4)/SUM(IF (repeat_m4>0,users,NULL)) as m4_repeat,
sum(repeat_m5)/SUM(IF (repeat_m5>0,users,NULL)) as m5_repeat,
sum(repeat_m6)/SUM(IF (repeat_m6>0,users,NULL)) as m6_repeat,
sum(repeat_m7)/SUM(IF (repeat_m7>0,users,NULL)) as m7_repeat,
sum(repeat_m8)/SUM(IF (repeat_m8>0,users,NULL)) as m8_repeat,
sum(repeat_m9)/SUM(IF (repeat_m9>0,users,NULL)) as m9_repeat,
sum(repeat_m10)/SUM(IF (repeat_m10>0,users,NULL)) as m10_repeat,
sum(repeat_m11)/SUM(IF (repeat_m11>0,users,NULL)) as m11_repeat,
sum(repeat_m12)/SUM(IF (repeat_m12>0,users,NULL)) as m12_repeat,
sum(repeat_m13)/SUM(IF (repeat_m13>0,users,NULL)) as m13_repeat,
sum(repeat_m14)/SUM(IF (repeat_m14>0,users,NULL)) as m14_repeat,
sum(repeat_m15)/SUM(IF (repeat_m15>0,users,NULL)) as m15_repeat,
sum(repeat_m16)/SUM(IF (repeat_m16>0,users,NULL)) as m16_repeat,
sum(repeat_m17)/SUM(IF (repeat_m17>0,users,NULL)) as m17_repeat,
sum(repeat_m18)/SUM(IF (repeat_m18>0,users,NULL)) as m18_repeat,
sum(repeat_m19)/SUM(IF (repeat_m19>0,users,NULL)) as m19_repeat,
sum(repeat_m20)/SUM(IF (repeat_m20>0,users,NULL)) as m20_repeat,
sum(repeat_m21)/SUM(IF (repeat_m21>0,users,NULL)) as m21_repeat,
sum(repeat_m22)/SUM(IF (repeat_m22>0,users,NULL)) as m22_repeat,
sum(repeat_m23)/SUM(IF (repeat_m23>0,users,NULL)) as m23_repeat,
sum(repeat_m24)/SUM(IF (repeat_m24>0,users,NULL)) as m24_repeat,
sum(repeat_m25)/SUM(IF (repeat_m25>0,users,NULL)) as m25_repeat,
sum(repeat_m26)/SUM(IF (repeat_m26>0,users,NULL)) as m26_repeat,
sum(repeat_m27)/SUM(IF (repeat_m27>0,users,NULL)) as m27_repeat,
sum(repeat_m28)/SUM(IF (repeat_m28>0,users,NULL)) as m28_repeat,
sum(repeat_m29)/SUM(IF (repeat_m29>0,users,NULL)) as m29_repeat,
sum(repeat_m30)/SUM(IF (repeat_m30>0,users,NULL)) as m30_repeat
from repeat_cohort_numbers

For more details, see: https://medium.com/@devamsaxena/creating-customer-retention-cohorts-on-big-query-b521b0e4db1f
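
The 31 hand-written 30-day buckets can also be computed instead of enumerated, giving one row per cohort and period. The following is only a minimal sketch under stated assumptions, not the answer author's method: the `orders` rows are made-up sample data standing in for `ordertable`, and `period_index`, `orders_in_period`, and `active_users` are names introduced here for illustration. Unlike `repeat_m0` above, period 0 here still includes each user's first order.

```sql
-- Hypothetical sample data standing in for `ordertable`
WITH orders AS (
  SELECT * FROM UNNEST([
    STRUCT('a@example.com' AS user_id, DATE '2019-01-05' AS order_date),
    STRUCT('a@example.com', DATE '2019-01-20'),
    STRUCT('a@example.com', DATE '2019-03-10'),
    STRUCT('b@example.com', DATE '2019-02-01')
  ])
),
with_acquisition AS (
  SELECT
    user_id,
    order_date,
    -- First order date per user
    MIN(order_date) OVER (PARTITION BY user_id) AS acquisition_date
  FROM orders
),
bucketed AS (
  SELECT
    user_id,
    FORMAT_DATE('%Y-%m', acquisition_date) AS cohort,
    -- Same boundaries as repeat_m0 ... repeat_m30:
    -- days 0-30 -> 0, days 31-60 -> 1, days 61-90 -> 2, ...
    DIV(GREATEST(DATE_DIFF(order_date, acquisition_date, DAY) - 1, 0), 30) AS period_index
  FROM with_acquisition
)
SELECT
  cohort,
  period_index,
  COUNT(*) AS orders_in_period,           -- includes first orders in period 0
  COUNT(DISTINCT user_id) AS active_users
FROM bucketed
GROUP BY cohort, period_index
ORDER BY cohort, period_index
```

With these sample rows, cohort 2019-01 has two orders in period 0 and one in period 2 (64 days after the first order). The long format avoids editing the query when the observation window grows; a reporting tool can pivot it into the wide layout if needed.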

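Answer 1 (score: 0)

The following query shows 30-day retention by signup month. It answers the question: "Of all the users who signed up in a given month, how many made a purchase at least 30 days after account signup?"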

with user_signups as (
  select user_id, cast(account_signup as date) as account_signup_date from <signup_table>
),
most_recent_purchase as (
  select user_id, max(cast(timestamp as date)) as most_recent_purchase_date from <purchase_table> group by 1
),
joined as (
  select 
    user_id, 
    account_signup_date, 
    most_recent_purchase_date, 
    date_diff(most_recent_purchase_date,account_signup_date,DAY) as retained_days
  from user_signups
  inner join most_recent_purchase using(user_id)
),
prep as (
  select 
    format_date('%Y-%m', account_signup_date) as signupYYYYMM,
    count(*) as users,
    sum(case when retained_days >= 30 then 1 else 0 end) as retained_users_30
  from joined
  group by 1
)
select
  signupYYYYMM,
  users,
  retained_users_30/users as retention_30_days
from prep
order by 1

Hopefully you can see how to adapt this approach to weekly or yearly cohorts and to different retention periods.
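
As one possible adaptation, here is a rough sketch of weekly cohorts with a 7-day retention period. The sample rows, the `purchase_ts` column name, the string user IDs (the question's tables use BYTES), and the Monday week start are all assumptions made for illustration.

```sql
-- Hypothetical sample data standing in for the signup and purchase tables
WITH signups AS (
  SELECT * FROM UNNEST([
    STRUCT('u1' AS user_id, TIMESTAMP '2019-01-07 10:00:00+00' AS account_signup),
    STRUCT('u2', TIMESTAMP '2019-01-09 12:00:00+00'),
    STRUCT('u3', TIMESTAMP '2019-01-15 08:00:00+00')
  ])
),
purchases AS (
  SELECT * FROM UNNEST([
    STRUCT('u1' AS user_id, TIMESTAMP '2019-01-20 09:00:00+00' AS purchase_ts),
    STRUCT('u2', TIMESTAMP '2019-01-10 09:00:00+00'),
    STRUCT('u3', TIMESTAMP '2019-02-01 09:00:00+00')
  ])
),
most_recent_purchase AS (
  SELECT user_id, MAX(DATE(purchase_ts)) AS most_recent_purchase_date
  FROM purchases
  GROUP BY user_id
),
joined AS (
  SELECT
    -- Weekly cohort: the Monday of the signup week
    DATE_TRUNC(DATE(s.account_signup), WEEK(MONDAY)) AS signup_week,
    DATE_DIFF(p.most_recent_purchase_date, DATE(s.account_signup), DAY) AS retained_days
  FROM signups AS s
  INNER JOIN most_recent_purchase AS p
    ON s.user_id = p.user_id
)
SELECT
  signup_week,
  COUNT(*) AS users,
  COUNTIF(retained_days >= 7) AS retained_users_7,
  COUNTIF(retained_days >= 7) / COUNT(*) AS retention_7_days
FROM joined
GROUP BY signup_week
ORDER BY signup_week
```

Swapping `WEEK(MONDAY)` for `YEAR` and changing the threshold in `COUNTIF` gives yearly cohorts or a different retention period.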

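A few caveats: this query assumes that every user who created an account also made a purchase. If some users never made a purchase, you will want to use a left join instead, and you may need to adjust the retention calculation/definition (depending on the business purpose).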
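
A sketch of what the left-join variant might look like, so that users with no purchases stay in the denominator. The sample rows, the `purchase_ts` column name, and the `purchasers` column are illustrative assumptions, and counting non-purchasers as not retained is only one possible definition.

```sql
-- Hypothetical sample data; u3 signs up but never purchases
WITH signups AS (
  SELECT * FROM UNNEST([
    STRUCT('u1' AS user_id, TIMESTAMP '2019-01-07 10:00:00+00' AS account_signup),
    STRUCT('u2', TIMESTAMP '2019-01-09 12:00:00+00'),
    STRUCT('u3', TIMESTAMP '2019-01-15 08:00:00+00')
  ])
),
purchases AS (
  SELECT * FROM UNNEST([
    STRUCT('u1' AS user_id, TIMESTAMP '2019-03-01 09:00:00+00' AS purchase_ts),
    STRUCT('u2', TIMESTAMP '2019-01-10 09:00:00+00')
  ])
),
most_recent_purchase AS (
  SELECT user_id, MAX(DATE(purchase_ts)) AS most_recent_purchase_date
  FROM purchases
  GROUP BY user_id
)
SELECT
  FORMAT_DATE('%Y-%m', DATE(s.account_signup)) AS signupYYYYMM,
  COUNT(*) AS users,                                            -- every signup, purchaser or not
  COUNTIF(p.most_recent_purchase_date IS NOT NULL) AS purchasers,
  -- DATE_DIFF is NULL for non-purchasers, and COUNTIF does not count NULL
  COUNTIF(DATE_DIFF(p.most_recent_purchase_date, DATE(s.account_signup), DAY) >= 30) AS retained_users_30,
  COUNTIF(DATE_DIFF(p.most_recent_purchase_date, DATE(s.account_signup), DAY) >= 30) / COUNT(*) AS retention_30_days
FROM signups AS s
LEFT JOIN most_recent_purchase AS p
  ON s.user_id = p.user_id
GROUP BY signupYYYYMM
ORDER BY signupYYYYMM
```

If retention should instead be measured only among purchasers, dividing by `purchasers` rather than `users` would be the corresponding change.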