使用BigQuery和Standard SQL,我试图计算一个时期内看到的用户与之后时期内看到的用户的保留率。我想每天计算一次,使用相同的周期偏移量,随着时间的流逝,该偏移量会滑动。
使用的数据是Google Analytics(分析)数据,其中包括以下字段: https://support.google.com/analytics/answer/3437719?hl=en
我有查询来计算保留时间,并且我知道如何将其设置为每天运行,但是我想创建一些历史记录作为开始,然后我需要模拟该查询已经每天运行了一段时间时间。
因此,我的问题是,假设第一个时期为60-31天前,接下来的时间段是30-1天前,都是相对于每天。
我具有的查询
WITH
users_seen_on_start AS (
SELECT DISTINCT
fullVisitorId AS users
FROM `project.view.ga_sessions_*`, UNNEST(hits) as hits
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY))
),
num_users AS (
SELECT
count(*) AS num_users_in_cohort
FROM users_seen_on_start
),
engaged_user_by_day AS (
SELECT
COUNT (DISTINCT fullVisitorId) as num_engaged_users
FROM `project.view.ga_sessions_*`, UNNEST(hits) as hits INNER JOIN users_seen_on_start ON users = fullVisitorId
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)
SELECT
num_engaged_users,
num_users_in_cohort ,
ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
FROM engaged_user_by_day CROSS JOIN num_users
查询输出:
num_engaged_users num_users_in_cohort retention_rate
100871 130632 0.772
所需的输出:
date num_engaged_users num_users_in_cohort retention_rate
20190101 100871 130632 0.772
20190102 102356 128044 0.799
样本数据: users_seen_on_start:
CREATE TABLE users_seen_on_start(
users INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO users_seen_on_start(users) VALUES (6854940999573646134);
INSERT INTO users_seen_on_start(users) VALUES (9215697890064860396);
INSERT INTO users_seen_on_start(users) VALUES (5595285367064974856);
INSERT INTO users_seen_on_start(users) VALUES (2054889847396937366);
INSERT INTO users_seen_on_start(users) VALUES (2159837518531156200);
INSERT INTO users_seen_on_start(users) VALUES (2297077047785095499);
INSERT INTO users_seen_on_start(users) VALUES (15934479773952228986);
INSERT INTO users_seen_on_start(users) VALUES (18388188973174323198);
INSERT INTO users_seen_on_start(users) VALUES (13527051077114159514);
INSERT INTO users_seen_on_start(users) VALUES (10527965347657532651);
INSERT INTO users_seen_on_start(users) VALUES (10056509199904199853);
INSERT INTO users_seen_on_start(users) VALUES (721447367663373337);
INSERT INTO users_seen_on_start(users) VALUES (7418392997259835212);
INSERT INTO users_seen_on_start(users) VALUES (1739158781654194388);
INSERT INTO users_seen_on_start(users) VALUES (13485010633919602577);
INSERT INTO users_seen_on_start(users) VALUES (11647513515368913077);
INSERT INTO users_seen_on_start(users) VALUES (14723171573825482124);
INSERT INTO users_seen_on_start(users) VALUES (316809625899342248);
INSERT INTO users_seen_on_start(users) VALUES (736697877724685769);
INSERT INTO users_seen_on_start(users) VALUES (1069762618672583190);
INSERT INTO users_seen_on_start(users) VALUES (6216571959193109764);
INSERT INTO users_seen_on_start(users) VALUES (8276320148745358024);
INSERT INTO users_seen_on_start(users) VALUES (4390033140354437765);
INSERT INTO users_seen_on_start(users) VALUES (4691956767605638049);
INSERT INTO users_seen_on_start(users) VALUES (8853050929187030210);
INSERT INTO users_seen_on_start(users) VALUES (4866380534293592106);
INSERT INTO users_seen_on_start(users) VALUES (9336123194114580988);
INSERT INTO users_seen_on_start(users) VALUES (9102157575556710064);
INSERT INTO users_seen_on_start(users) VALUES (5656668438554927436);
INSERT INTO users_seen_on_start(users) VALUES (1488391481235428518);
INSERT INTO users_seen_on_start(users) VALUES (2840931994944989396);
INSERT INTO users_seen_on_start(users) VALUES (2881922818148829205);
INSERT INTO users_seen_on_start(users) VALUES (15266979732129081227);
INSERT INTO users_seen_on_start(users) VALUES (17452034639473427980);
INSERT INTO users_seen_on_start(users) VALUES (16885946609916150102);
INSERT INTO users_seen_on_start(users) VALUES (11414196691107747488);
INSERT INTO users_seen_on_start(users) VALUES (11428367061145620067);
INSERT INTO users_seen_on_start(users) VALUES (11589939716097939663);
INSERT INTO users_seen_on_start(users) VALUES (9471966568512958356);
INSERT INTO users_seen_on_start(users) VALUES (10302973548806993195);
INSERT INTO users_seen_on_start(users) VALUES (11655856328192191298);
INSERT INTO users_seen_on_start(users) VALUES (13935768668138194306);
INSERT INTO users_seen_on_start(users) VALUES (12094331062677811830);
INSERT INTO users_seen_on_start(users) VALUES (10077917656361210181);
INSERT INTO users_seen_on_start(users) VALUES (12524832796889539656);
INSERT INTO users_seen_on_start(users) VALUES (12545063140779927439);
INSERT INTO users_seen_on_start(users) VALUES (12842029924433102779);
INSERT INTO users_seen_on_start(users) VALUES (642899976804427792);
INSERT INTO users_seen_on_start(users) VALUES (6445850403127955479);
INSERT INTO users_seen_on_start(users) VALUES (6564816699382875533);
INSERT INTO users_seen_on_start(users) VALUES (4991902095596735494);
INSERT INTO users_seen_on_start(users) VALUES (9240481697386039624);
INSERT INTO users_seen_on_start(users) VALUES (7462479064488338261);
INSERT INTO users_seen_on_start(users) VALUES (7954751513116206324);
INSERT INTO users_seen_on_start(users) VALUES (7916442053878140133);
INSERT INTO users_seen_on_start(users) VALUES (5943783673017806941);
INSERT INTO users_seen_on_start(users) VALUES (8019839094524452470);
INSERT INTO users_seen_on_start(users) VALUES (1325025305488677572);
INSERT INTO users_seen_on_start(users) VALUES (2757934917480873578);
INSERT INTO users_seen_on_start(users) VALUES (2953784252203011629);
INSERT INTO users_seen_on_start(users) VALUES (15663806630564163334);
INSERT INTO users_seen_on_start(users) VALUES (15822234287772625947);
INSERT INTO users_seen_on_start(users) VALUES (16086946171320332009);
INSERT INTO users_seen_on_start(users) VALUES (18326627563023885086);
INSERT INTO users_seen_on_start(users) VALUES (10177146105583960910);
INSERT INTO users_seen_on_start(users) VALUES (11536897925313534298);
INSERT INTO users_seen_on_start(users) VALUES (9521017502744452252);
INSERT INTO users_seen_on_start(users) VALUES (13846940074652198876);
INSERT INTO users_seen_on_start(users) VALUES (12072437921316824606);
INSERT INTO users_seen_on_start(users) VALUES (12130911952369094625);
INSERT INTO users_seen_on_start(users) VALUES (9944467373343765193);
INSERT INTO users_seen_on_start(users) VALUES (10130381296105130198);
INSERT INTO users_seen_on_start(users) VALUES (14503197259750222085);
INSERT INTO users_seen_on_start(users) VALUES (14514330935424697592);
INSERT INTO users_seen_on_start(users) VALUES (14700594671656063689);
INSERT INTO users_seen_on_start(users) VALUES (14735848995356133105);
INSERT INTO users_seen_on_start(users) VALUES (14880164794899465972);
INSERT INTO users_seen_on_start(users) VALUES (12844072088150040888);
INSERT INTO users_seen_on_start(users) VALUES (13244940156815171079);
INSERT INTO users_seen_on_start(users) VALUES (260647987138229776);
INSERT INTO users_seen_on_start(users) VALUES (4223941834652936903);
INSERT INTO users_seen_on_start(users) VALUES (7094513577121476923);
INSERT INTO users_seen_on_start(users) VALUES (9277783379267650134);
INSERT INTO users_seen_on_start(users) VALUES (5996874826262341487);
INSERT INTO users_seen_on_start(users) VALUES (6070125918500272373);
INSERT INTO users_seen_on_start(users) VALUES (1530161613058767114);
INSERT INTO users_seen_on_start(users) VALUES (3564084216977083409);
INSERT INTO users_seen_on_start(users) VALUES (2096791516261012274);
INSERT INTO users_seen_on_start(users) VALUES (17168509252900308876);
INSERT INTO users_seen_on_start(users) VALUES (17616873481220648376);
INSERT INTO users_seen_on_start(users) VALUES (17998058763193684336);
INSERT INTO users_seen_on_start(users) VALUES (16355698068697852664);
INSERT INTO users_seen_on_start(users) VALUES (18429432702234588790);
INSERT INTO users_seen_on_start(users) VALUES (11376613349708048591);
INSERT INTO users_seen_on_start(users) VALUES (11409024415220391296);
INSERT INTO users_seen_on_start(users) VALUES (11497048558563286896);
INSERT INTO users_seen_on_start(users) VALUES (11461240236069178124);
INSERT INTO users_seen_on_start(users) VALUES (10315048118076394592);
INSERT INTO users_seen_on_start(users) VALUES (10534194857330671443);
INSERT INTO users_seen_on_start(users) VALUES (13087206783728054302);
engaged_user_by_day的样本数据:
CREATE TABLE engaged_user_by_day(
users INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO engaged_user_by_day(users) VALUES (6854940999573646134);
INSERT INTO engaged_user_by_day(users) VALUES (9215697890064860396);
INSERT INTO engaged_user_by_day(users) VALUES (5595285367064974856);
INSERT INTO engaged_user_by_day(users) VALUES (2054889847396937366);
INSERT INTO engaged_user_by_day(users) VALUES (2159837518531156200);
INSERT INTO engaged_user_by_day(users) VALUES (2297077047785095499);
INSERT INTO engaged_user_by_day(users) VALUES (15934479773952228986);
INSERT INTO engaged_user_by_day(users) VALUES (18388188973174323198);
INSERT INTO engaged_user_by_day(users) VALUES (13527051077114159514);
INSERT INTO engaged_user_by_day(users) VALUES (10527965347657532651);
INSERT INTO engaged_user_by_day(users) VALUES (10056509199904199853);
INSERT INTO engaged_user_by_day(users) VALUES (721447367663373337);
INSERT INTO engaged_user_by_day(users) VALUES (7418392997259835212);
INSERT INTO engaged_user_by_day(users) VALUES (1739158781654194388);
INSERT INTO engaged_user_by_day(users) VALUES (13485010633919602577);
INSERT INTO engaged_user_by_day(users) VALUES (11647513515368913077);
INSERT INTO engaged_user_by_day(users) VALUES (14723171573825482124);
INSERT INTO engaged_user_by_day(users) VALUES (316809625899342248);
INSERT INTO engaged_user_by_day(users) VALUES (736697877724685769);
INSERT INTO engaged_user_by_day(users) VALUES (1069762618672583190);
INSERT INTO engaged_user_by_day(users) VALUES (6216571959193109764);
INSERT INTO engaged_user_by_day(users) VALUES (8276320148745358024);
INSERT INTO engaged_user_by_day(users) VALUES (4390033140354437765);
INSERT INTO engaged_user_by_day(users) VALUES (4691956767605638049);
INSERT INTO engaged_user_by_day(users) VALUES (8853050929187030210);
INSERT INTO engaged_user_by_day(users) VALUES (4866380534293592106);
INSERT INTO engaged_user_by_day(users) VALUES (9336123194114580988);
INSERT INTO engaged_user_by_day(users) VALUES (9102157575556710064);
INSERT INTO engaged_user_by_day(users) VALUES (5656668438554927436);
INSERT INTO engaged_user_by_day(users) VALUES (1488391481235428518);
INSERT INTO engaged_user_by_day(users) VALUES (2840931994944989396);
INSERT INTO engaged_user_by_day(users) VALUES (2881922818148829205);
INSERT INTO engaged_user_by_day(users) VALUES (15266979732129081227);
INSERT INTO engaged_user_by_day(users) VALUES (17452034639473427980);
INSERT INTO engaged_user_by_day(users) VALUES (16885946609916150102);
INSERT INTO engaged_user_by_day(users) VALUES (11414196691107747488);
INSERT INTO engaged_user_by_day(users) VALUES (11428367061145620067);
INSERT INTO engaged_user_by_day(users) VALUES (11589939716097939663);
INSERT INTO engaged_user_by_day(users) VALUES (9471966568512958356);
INSERT INTO engaged_user_by_day(users) VALUES (10302973548806993195);
INSERT INTO engaged_user_by_day(users) VALUES (11655856328192191298);
INSERT INTO engaged_user_by_day(users) VALUES (13935768668138194306);
INSERT INTO engaged_user_by_day(users) VALUES (12094331062677811830);
INSERT INTO engaged_user_by_day(users) VALUES (10077917656361210181);
INSERT INTO engaged_user_by_day(users) VALUES (12524832796889539656);
INSERT INTO engaged_user_by_day(users) VALUES (12545063140779927439);
INSERT INTO engaged_user_by_day(users) VALUES (12842029924433102779);
INSERT INTO engaged_user_by_day(users) VALUES (642899976804427792);
INSERT INTO engaged_user_by_day(users) VALUES (6445850403127955479);
INSERT INTO engaged_user_by_day(users) VALUES (6564816699382875533);
INSERT INTO engaged_user_by_day(users) VALUES (4991902095596735494);
INSERT INTO engaged_user_by_day(users) VALUES (9240481697386039624);
INSERT INTO engaged_user_by_day(users) VALUES (7462479064488338261);
答案 0 :(得分:0)
SQL中的滚动数据可以使用自联接来实现。 请在下面查看此示例,该示例查找过去30天内相对于每天的数据:
from atable a1
left join atable a2 on a1.custid=a2.custid
and datediff(day,a1.dt,a2.dt)<=30 and datediff(day,a1.dt,a2.dt)>0
答案 1 :(得分:0)
有些链接可以解决一个时间窗口:
BigQuery LAG function。
Retention,使用BigQuery和 简单数据模型
但是,用例听起来更像Custom Retention Cohorts Using BigQuery and Google Analytics。即使该链接描述了Firebase的用例,此处说明的查询也可以解决您的主要查询。例如,以下查询可解决每三天的保留期:
WITH analytics_data AS (
SELECT user_pseudo_id, event_timestamp, event_name, device.mobile_model_name,
UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00")) AS start_day,
3600*1000*1000*24*3 AS one_week_micros -- Note the *3 at the end here
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _table_suffix BETWEEN '20180731' AND '20180829'
)