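Firebase provides split-testing functionality through Firebase Remote Config, but it lacks the ability to filter retention in the cohorts section by user properties (or, in fact, by any property).
To solve this problem, I'm looking at BigQuery, since Firebase Analytics provides a usable way to export data to that service.
But I'm stuck on many questions, and Google has no answers or examples that might point me in the right direction.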
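General questions:
As a first step, I need to aggregate data that represents the same data Firebase cohorts do, so I can be sure my calculation is right.
The next step should then be just applying constraints to the queries so they match custom user properties.
Here is what I've got so far:
The main problem is a big discrepancy in the user counts: sometimes the difference is about 100 users, but sometimes it is close to 1000.
This is the approach I use: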
# 1
# Count users whose `user_dim.first_open_timestamp_micros`
# falls in the specified period (w0, the first week);
# this is how Firebase groups users into cohorts
# (users who first opened the app on the same day or during the same week)
# https://support.google.com/firebase/answer/6317510
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-20'),
TIMESTAMP('2016-11-26')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'
# 2
# For each following period, count the users whose
# first_open_timestamp falls in the same week 0.
# Here is an example for one of the weeks:
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-27'),
TIMESTAMP('2016-12-03')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'
# 3
# Now we have user counts for each week w1, w2, ... w5.
# Calculate retention for each of them relative to the cohort size w0:
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w0 * 100
# ...
# rw5 = w5 / w0 * 100
# 4
# Shift week 0 forward by one week and repeat from step 1
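The four steps above can be sketched in plain Python to check the arithmetic outside BigQuery. This is a minimal illustration, not Firebase's own cohort logic: it assumes each event is an (app_instance_id, first_open_micros, event_micros) tuple with timestamps in microseconds since the Unix epoch (UTC), and the helper names `micros_to_date` and `weekly_retention` are hypothetical. Each week's retention is computed relative to the cohort size w0.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: *_micros fields hold microseconds since the Unix epoch, in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_date(ts_micros: int) -> date:
    """UTC calendar date of a microsecond timestamp, like
    STRFTIME_UTC_USEC(ts, '%Y-%m-%d') in legacy SQL."""
    return (EPOCH + timedelta(microseconds=ts_micros)).date()


def weekly_retention(events, week0_start: date, num_weeks: int):
    """events: list of (app_instance_id, first_open_micros, event_micros).

    Returns (w0, [rw1, ..., rwN]) where w0 is the cohort size and
    rwN = wN / w0 * 100, rounded to two decimals.
    """
    week0_end = week0_start + timedelta(days=6)
    # Step 1: the cohort is every user whose first open falls in week 0.
    cohort = {
        uid
        for uid, first_open, _ in events
        if week0_start <= micros_to_date(first_open) <= week0_end
    }
    # Step 2: for each later week, collect cohort members with an event in it.
    active = [set() for _ in range(num_weeks + 1)]
    for uid, _, event_micros in events:
        if uid not in cohort:
            continue
        week = (micros_to_date(event_micros) - week0_start).days // 7
        if 0 <= week <= num_weeks:
            active[week].add(uid)
    # Step 3: retention of week n relative to the cohort size w0.
    w0 = len(cohort)
    if w0 == 0:
        return 0, [0.0] * num_weeks
    return w0, [round(len(active[n]) / w0 * 100, 2) for n in range(1, num_weeks + 1)]
```

Step 4 (shifting week 0) is then just a loop calling `weekly_retention` with `week0_start` advanced by seven days each time.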
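Request for BigQuery querying tips:
Any tips and pointers on constructing complex queries that could aggregate and calculate all the data required for this task in one step are very much appreciated.
Here is the BigQuery Export schema, if needed.
Side questions: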
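Why are user_dim.device_info.device_id and user_dim.device_info.resettable_device_id always null?
What about user_dim.app_info.app_id? (In case someone from Firebase support reads this question.)
How should event_dim.timestamp_micros and event_dim.previous_timestamp_micros be used? I can't figure out their purpose.
PS
It would be good if someone from the Firebase team answered this question. Five months ago there was one mention of extending the cohorts functionality with filtering, or of showing BigQuery examples, but nothing has happened since. Firebase Analytics is the way to go, they said; Google Analytics is deprecated, they said. Now I've spent a second day learning BigQuery to build my own solution on top of the existing analytics tools. I know Stack Overflow is not the place for comments like this, but what are you thinking? Split testing may dramatically affect my app's retention. My app doesn't sell anything, so funnels and events are not valuable metrics in many cases.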
Answer (score: 12)
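Quoting the question: "Any tips and pointers on constructing complex queries that could aggregate and calculate all the data required for this task in one step are very much appreciated."
Yes, generic BigQuery works just fine.
Below is not the most generic version, but it can give you an idea. In this example I am using the Stack Overflow data available in Google BigQuery Public Datasets.
The first sub-select, activities, is in most cases the only one you need to rewrite to reflect the specifics of your data.
What it does is: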
a. It defines the period you want to use for the analysis.
In the example below it is a month: FORMAT_DATE('%Y-%m', ...
But you can use a year, week, day, or anything else, respectively:
• By year: FORMAT_DATE('%Y', DATE(answers.creation_date)) AS period
• By week: FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS period
• By day: FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS period
• ...
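To see what each format string produces, here is a small Python sketch. `period_key` and `PERIOD_FORMATS` are hypothetical names, and the sketch assumes Python's `strftime` codes match those of BigQuery's `FORMAT_DATE` for %Y, %m, %d and %W (both define %W as the week of the year with Monday as the first day).

```python
from datetime import date

# Same format strings as the FORMAT_DATE examples above.
PERIOD_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "week": "%Y-%W",  # Monday-based week number, 00-53
    "day": "%Y-%m-%d",
}


def period_key(d: date, granularity: str = "month") -> str:
    """Bucket a date into the period label used by the activities sub-select."""
    return d.strftime(PERIOD_FORMATS[granularity])
```

Because the keys are zero-padded and year-first, they sort chronologically as plain strings, which is what lets `MIN(period)` pick a user's first period. One caveat with %W: a week that straddles New Year gets two keys, for example 2016-12-31 maps to '2016-52' and 2017-01-01 to '2017-00'.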
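b. It also "filters" only the type of events/activity you need to analyze.
For example, `WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%'` looks for answers to questions tagged google-bigquery.
The rest of the sub-queries are more or less generic and can mostly be used as is.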
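The '|' padding in that filter is what restricts it to whole tags. A minimal Python equivalent, assuming tags is a '|'-separated string as the LIKE pattern implies (`has_tag` is a hypothetical helper):

```python
def has_tag(tags: str, tag: str) -> bool:
    """True if `tag` is one of the '|'-separated tags, i.e. the Python
    counterpart of CONCAT('|', tags, '|') LIKE '%|tag|%'."""
    # Padding both strings with '|' anchors the match to whole tags,
    # including the first and the last tag in the list.
    return f"|{tag}|" in f"|{tags}|"
```

A plain substring test such as `'google-bigquery' in tags` would also accept a longer tag that merely starts with the same text, for example the made-up tag `google-bigquery-hypothetical`.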
#standardSQL
WITH activities AS (
  -- One row per user per period with at least one matching activity.
  -- In most cases, this is the only sub-select you need to adapt.
  SELECT answers.owner_user_id AS id,
    FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
  FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
  JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
    ON questions.id = answers.parent_id
  WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%'
  GROUP BY id, period
), cohorts AS (
  -- Each user's cohort is the first period in which they were active.
  SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
), periods AS (
  -- Sequential number for each cohort period, used to compute lags.
  SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
  FROM (SELECT DISTINCT cohort AS period FROM cohorts)
), cohorts_size AS (
  -- Number of users in each cohort.
  SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids
  FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
  JOIN periods ON periods.period = cohorts.cohort
  GROUP BY cohort, num
), retention AS (
  -- Number of users from each cohort who were active in each period.
  SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
  FROM periods JOIN activities ON activities.period = periods.period
  JOIN cohorts ON cohorts.id = activities.id
  GROUP BY cohort, period, num
)
SELECT
  CONCAT(cohorts_size.cohort, ' - ', FORMAT("%'d", cohorts_size.ids), ' users') AS cohort,
  retention.num - cohorts_size.num AS period_lag,
  retention.period AS period_label,
  ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention,
  retention.ids AS rids
FROM retention
JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
-- Keep only cohorts that start in January 2015 or later.
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
ORDER BY cohort, period_lag, period_label
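To make the chain of sub-queries easier to follow, here is an in-memory Python sketch that mirrors each step on a list of (user_id, period) pairs. It is an illustration under that assumption, not a replacement for the query; `cohort_retention` and its `min_cohort` parameter are hypothetical names, the latter standing in for the WHERE filter on the cohort.

```python
from collections import defaultdict


def cohort_retention(activities, min_cohort=None):
    """In-memory mirror of the query above.

    activities: iterable of (user_id, period) pairs, where period is a
    sortable label such as '2016-11'. Returns rows shaped like the final
    SELECT: (cohort_label, period_lag, period_label, retention, rids).
    """
    # activities: one row per user and period (GROUP BY id, period)
    acts = set(activities)
    # cohorts: each user's first active period (MIN(period))
    cohort_of = {}
    for uid, period in acts:
        if uid not in cohort_of or period < cohort_of[uid]:
            cohort_of[uid] = period
    # periods: number the distinct cohort periods 1, 2, 3, ...
    num = {p: i for i, p in enumerate(sorted(set(cohort_of.values())), start=1)}
    # cohorts_size: users per cohort (each user is active in their cohort period)
    size = defaultdict(int)
    for cohort in cohort_of.values():
        size[cohort] += 1
    # retention: distinct cohort members active in each period; like the
    # inner JOIN with periods, periods in which no cohort starts are dropped
    active = defaultdict(set)
    for uid, period in acts:
        if period in num:
            active[(cohort_of[uid], period)].add(uid)
    rows = []
    for (cohort, period), users in active.items():
        if min_cohort is not None and cohort < min_cohort:
            continue  # WHERE cohorts_size.cohort >= ...
        label = f"{cohort} - {size[cohort]:,} users"
        rows.append((label, num[period] - num[cohort], period,
                     round(len(users) / size[cohort] * 100, 2), len(users)))
    rows.sort(key=lambda r: (r[0], r[1], r[2]))  # ORDER BY cohort, period_lag, period_label
    return rows
```

As in the SQL, retention is always relative to the cohort's size, and period_lag counts the periods elapsed since the cohort's first period.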
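You can visualize the results of the query above with the tool of your choice.
Note: you can use either period_lag or period_label; the two charts below show the difference.
[Chart: retention with period_lag]
[Chart: retention with period_label]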
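A tool-agnostic way to see the difference is to pivot the result rows into a cohort grid. In this minimal Python sketch, rows are assumed to be (cohort, period_lag, period_label, retention) tuples, and `pivot` and `render` are hypothetical helper names. Keying the columns by period_lag lines every cohort up at lag 0, while keying them by period_label puts each cohort under its calendar period, which produces a staircase.

```python
from collections import defaultdict


def pivot(rows, by="lag"):
    """rows: (cohort, period_lag, period_label, retention) tuples.
    Returns {cohort: {column: retention}} with columns keyed either by
    period_lag (by='lag') or by period_label (by='label')."""
    table = defaultdict(dict)
    for cohort, lag, label, retention in rows:
        table[cohort][lag if by == "lag" else label] = retention
    return dict(table)


def render(table):
    """Tab-separated text grid: one line per cohort, missing cells left blank."""
    columns = sorted({col for cells in table.values() for col in cells})
    lines = ["cohort\t" + "\t".join(str(c) for c in columns)]
    for cohort in sorted(table):
        cells = table[cohort]
        lines.append(cohort + "\t" + "\t".join(str(cells.get(c, "")) for c in columns))
    return "\n".join(lines)
```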