Firebase export to BigQuery: retention cohorts query

Time: 2017-01-06 16:01:53

Tags: firebase google-bigquery firebase-analytics

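Firebase offers split testing through Firebase Remote Config, but the cohorts section lacks the ability to filter retention by user properties (or by any properties, actually).

To solve this problem I am looking at BigQuery, since Firebase Analytics provides a usable way to export data to this service.

But I am stuck on many questions, and Google has no answers or examples that might point me in the right direction.

General questions:

As a first step I need to aggregate data that represents the same data the Firebase cohorts report shows, so I can be sure that my calculations are right: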

firebase cohorts

The next step should be just applying constraints to the queries so that they match custom user properties.

Here is what I have got so far:

[image]

The main problem is a huge difference in the user counts. Sometimes it is about 100 users, but sometimes close to 1000.

This is the approach I use:

# 1

# Count users with `user_dim.first_open_timestamp_micros` 
# in specified period (w0 – week 1)
# this is the way firebase group users to cohorts 
# (who started app on the same day or during the same week) 
# https://support.google.com/firebase/answer/6317510

SELECT
  COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
  (
   TABLE_DATE_RANGE
    (
     [admob-app-id-xx:xx_IOS.app_events_], 
     TIMESTAMP('2016-11-20'), 
     TIMESTAMP('2016-11-26')
    )
  )
WHERE
  STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
  BETWEEN '2016-11-20' AND '2016-11-26'

# 2

# For each next period count events with 
# same first_open_timestamp
# Here is example for one of the weeks. 
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03

SELECT
  COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
  (
   TABLE_DATE_RANGE
    (
     [admob-app-id-xx:xx_IOS.app_events_], 
     TIMESTAMP('2016-11-27'), 
     TIMESTAMP('2016-12-03')
    )
  )
WHERE
  STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
  BETWEEN '2016-11-20' AND '2016-11-26'

# 3

# Now we have users for each week w1, w2, ... w5
# Calculate retention for each of them
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w0 * 100
# ...
# rw5 = w5 / w0 * 100

# 4 

# Shift week 0 by one and repeat from step 1
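
The arithmetic in steps 1 to 3 can be sketched in a few lines of Python. This is only an illustration, not how Firebase computes its report: it assumes the distinct-user counts w0..w5 have already been returned by queries like #1 and #2, that every later week is measured against the week 0 cohort size, and the helper names (`first_open_day`, `in_cohort`, `retention_percent`) and sample counts are hypothetical. It also shows the microsecond-to-UTC-day conversion that the `WHERE` clause relies on.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def first_open_day(first_open_timestamp_micros):
    """Format a microsecond Unix timestamp as a UTC 'YYYY-MM-DD' string."""
    moment = EPOCH + timedelta(microseconds=first_open_timestamp_micros)
    return moment.strftime("%Y-%m-%d")


def in_cohort(first_open_timestamp_micros, start_day, end_day):
    """Inclusive range check on the formatted day, like BETWEEN in the queries."""
    return start_day <= first_open_day(first_open_timestamp_micros) <= end_day


def retention_percent(weekly_users):
    """weekly_users[0] is the cohort size (w0); later entries are w1, w2, ...

    Returns the retention of each later week relative to w0, in percent.
    """
    cohort_size = weekly_users[0]
    if cohort_size == 0:
        return [0.0 for _ in weekly_users[1:]]
    return [round(users / cohort_size * 100, 2) for users in weekly_users[1:]]


# Hypothetical counts, not real data from the question.
weekly_users = [1000, 250, 180, 150, 120, 100]
print(retention_percent(weekly_users))

# 2016-11-20 00:00:00 UTC expressed in microseconds.
sample_micros = 1479600000 * 1_000_000
print(first_open_day(sample_micros),
      in_cohort(sample_micros, "2016-11-20", "2016-11-26"))
```

Comparing the formatted day as a string works because the 'YYYY-MM-DD' layout sorts in the same order as the dates themselves.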

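Request for BigQuery query tips

Any tips and directions on building a complex query that could aggregate and calculate all the data required for this task in a single step are much appreciated.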

Here is BigQuery Export schema if needed

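Side questions:

  • Why are user_dim.device_info.device_id and user_dim.device_info.resettable_device_id always null?
  • user_dim.app_info.app_id is missing from the documentation (in case a Firebase support teammate reads this question).
  • How should event_dim.timestamp_micros and event_dim.previous_timestamp_micros be used? I cannot work out their purpose.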

PS

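It would be great if someone from the Firebase team answered this question. Five months ago there was one mention of extending the cohorts functionality with filtering, or of showing BigQuery examples, but things are not moving. Firebase Analytics is the way to go, they said; Google Analytics is deprecated, they said. Now I am spending a second day learning BigQuery to build my own solution on top of the existing analytics tools. I know Stack Overflow is not the place for comments like this, but what are you guys thinking? Split testing may dramatically affect the retention of my app. My app does not sell anything, so funnels and events are not valuable metrics in many cases.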

1 answer:

Answer 0: (score: 12)

  

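Any tips and directions on building a complex query that could aggregate and calculate all the data required for this task in a single step are much appreciated.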

     

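Yes, generic BigQuery will work just fine.

Below is not the most generic version, but it can give you an idea. In this example I am using the Stack Overflow Data available in the Google BigQuery Public Datasets.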

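The first sub-select, activities, is in most cases the only one that needs to be rewritten to reflect the specifics of your data.
What it does:
a. It defines the period to be used for the analysis. In the example below it is a month: FORMAT_DATE('%Y-%m', ...
But you can use year, week, day or anything else respectively:
 • By year: FORMAT_DATE('%Y', DATE(answers.creation_date)) AS period
 • By week: FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS period
 • By day: FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS period
 • ...
b. It also keeps only the type of events/activity you need to analyze. For example, WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' looks for answers to questions tagged google-bigquery.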
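
To see what label each period format produces, here is a small Python sketch. It uses Python's `strftime` as a stand-in for BigQuery's `FORMAT_DATE`, on the assumption that the `%Y`, `%m`, `%W` and `%d` directives behave the same way in both; the `PERIOD_FORMATS` mapping and the `period_for` and `has_tag` helpers are hypothetical names introduced only for this illustration.

```python
from datetime import date

# Hypothetical mapping from granularity to a strftime pattern.
PERIOD_FORMATS = {
    "year": "%Y",
    "month": "%Y-%m",
    "week": "%Y-%W",  # week of the year, Monday as the first day of the week
    "day": "%Y-%m-%d",
}


def period_for(day, granularity):
    """Return the period label that a date falls into."""
    return day.strftime(PERIOD_FORMATS[granularity])


def has_tag(tags, tag):
    """Match a whole tag in a '|'-separated string, like the LIKE filter."""
    return f"|{tag}|" in f"|{tags}|"


created = date(2016, 11, 21)
for granularity in PERIOD_FORMATS:
    print(granularity, period_for(created, granularity))

# Wrapping both sides in '|' prevents partial matches on longer tag names.
print(has_tag("sql|google-bigquery", "google-bigquery"))
print(has_tag("google-bigquery-storage", "google-bigquery"))
```

Because every label starts with the year and uses zero-padded numbers, the labels sort chronologically as plain strings, which is what MIN(period) and ORDER BY period in the query depend on.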

The rest of the sub-queries are more generic and can mostly be used as is.

#standardSQL
WITH activities AS (
  SELECT answers.owner_user_id AS id,
    FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
  FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
  JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
  ON questions.id = answers.parent_id
  WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' 
  GROUP BY id, period
), cohorts AS (
  SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
), periods AS (
  SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
  FROM (SELECT DISTINCT cohort AS period FROM cohorts)
), cohorts_size AS (
  SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids 
  FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
  JOIN periods ON periods.period = cohorts.cohort
  GROUP BY cohort, num
), retention AS (
  SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
  FROM periods JOIN activities ON activities.period = periods.period
  JOIN cohorts ON cohorts.id = activities.id 
  GROUP BY cohort, period, num 
)
SELECT 
  CONCAT(cohorts_size.cohort, ' - ',  FORMAT("%'d", cohorts_size.ids), ' users') AS cohort, 
  retention.num - cohorts_size.num AS period_lag, 
  retention.period as period_label,
  ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention , retention.ids AS rids
FROM retention
JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
ORDER BY cohort, period_lag, period_label  

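You can visualize the result of the above query with the tool of your choice. Note: you can use either period_lag or period_label.
See the difference between them in the examples below.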

with period_lag

[image]

with period_label

[image]
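
For readers who want to trace what each sub-query contributes, the same steps can be replayed on a handful of in-memory rows. This Python sketch is an approximation written for illustration, not a translation that BigQuery would produce: the `cohort_retention` function and the sample user ids are hypothetical, and the input is assumed to be (user id, period) pairs like those returned by the activities sub-select.

```python
from collections import defaultdict


def cohort_retention(activities):
    """activities: iterable of (user_id, period) pairs with sortable periods.

    Returns rows (cohort, period_lag, period_label, retention_percent, users),
    mirroring the columns of the final SELECT.
    """
    activities = set(activities)  # one row per user and period

    # cohorts: the first period in which each user was active.
    cohorts = {}
    for user_id, period in activities:
        if user_id not in cohorts or period < cohorts[user_id]:
            cohorts[user_id] = period

    # periods: number the distinct cohort periods in order.
    ordered = sorted(set(cohorts.values()))
    period_num = {period: num for num, period in enumerate(ordered, start=1)}

    # cohorts_size: number of users in each cohort.
    cohort_size = defaultdict(int)
    for cohort in cohorts.values():
        cohort_size[cohort] += 1

    # retention: users of each cohort that were active in each numbered period.
    # Periods in which no cohort starts are skipped, as in the join on periods.
    active = defaultdict(set)
    for user_id, period in activities:
        if period in period_num:
            active[(cohorts[user_id], period)].add(user_id)

    rows = []
    for (cohort, period), users in active.items():
        lag = period_num[period] - period_num[cohort]
        share = round(len(users) / cohort_size[cohort] * 100, 2)
        rows.append((cohort, lag, period, share, len(users)))
    return sorted(rows)


# Hypothetical sample data.
sample = [
    ("u1", "2015-01"), ("u1", "2015-02"),
    ("u2", "2015-01"), ("u2", "2015-03"),
    ("u3", "2015-02"), ("u3", "2015-03"),
    ("u4", "2015-03"),
]
for row in cohort_retention(sample):
    print(row)
```

In the output, period_lag counts how many periods have passed since the cohort started, so cohorts line up by age, while period_label keeps the calendar period, so cohorts line up by date.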