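This is my second post about looking at distributions in Firebase Analytics data (a follow-up to my first post). This time I want to build a user distribution table in BigQuery based on Firebase session data. The output should look like this:
I managed to put together the following script, which counts app_instance_id values: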
#standardSQL
SELECT
  COUNT(DISTINCT(CASE WHEN sess_id = 0 THEN app_instance_id END)) AS sess_count_0,
  COUNT(DISTINCT(CASE WHEN sess_id = 1 THEN app_instance_id END)) AS sess_count_1,
  COUNT(DISTINCT(CASE WHEN sess_id > 1 AND sess_id <= 5 THEN app_instance_id END)) AS sess_count_2BETWEEN5,
  COUNT(DISTINCT(CASE WHEN sess_id > 5 AND sess_id <= 10 THEN app_instance_id END)) AS sess_count_6BETWEEN10,
  COUNT(DISTINCT(CASE WHEN sess_id > 10 AND sess_id <= 30 THEN app_instance_id END)) AS sess_count_11BETWEEN30,
  COUNT(DISTINCT(CASE WHEN sess_id > 30 THEN app_instance_id END)) AS sess_count_PLUS31
FROM (
  SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
  FROM (
    SELECT *, IF(previous IS null OR (min_time-previous)>(20*60*1000*1000),1, 0) session_start
    FROM (
      SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
      FROM (
        SELECT user_dim.app_info.app_instance_id,
          user_dim.device_info.mobile_model_name,
          user_dim.device_info.platform_version,
          (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time,
          (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
        FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`
        WHERE (_TABLE_SUFFIX BETWEEN '20170701' AND '20170731')
      )
    )
  )
)
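To make the session-numbering step easier to follow, here is a minimal sketch on hand-made data rather than the Firebase demo dataset. The `bundles` rows, the user IDs `a` and `b`, and the timestamps are all made up for illustration. It applies the same 20-minute-gap rule and running sum as the query above, written with `CASE` instead of `IF`.

```sql
#standardSQL
-- Toy data: each row stands in for one event bundle (hypothetical values, times in microseconds).
WITH bundles AS (
  SELECT 'a' AS app_instance_id, 0 AS min_time, 60000000 AS max_time UNION ALL
  SELECT 'a', 120000000, 180000000 UNION ALL      -- 1 minute after the previous bundle: same session
  SELECT 'a', 3600000000, 3660000000 UNION ALL    -- 57 minutes after the previous bundle: new session
  SELECT 'b', 0, 60000000
),
flagged AS (
  -- End time of the previous bundle for the same app instance (NULL for the first one).
  SELECT *,
    LAG(max_time) OVER (PARTITION BY app_instance_id ORDER BY max_time) AS previous
  FROM bundles
)
SELECT
  app_instance_id,
  min_time,
  -- Running count of session starts: a start is a first bundle or a gap of more than 20 minutes.
  SUM(CASE WHEN previous IS NULL OR min_time - previous > 20 * 60 * 1000 * 1000 THEN 1 ELSE 0 END)
    OVER (PARTITION BY app_instance_id ORDER BY min_time) AS sess_id
FROM flagged
ORDER BY app_instance_id, min_time
```

On this toy input, user `a` gets rows with `sess_id` 1, 1 and 2, and user `b` gets a single row with `sess_id` 1. The same user therefore appears under several `sess_id` values, one per session seen so far.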
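Questions:
Since I am interested in users (not sessions), I would like to be 100% sure: is it still correct to count app instances (and not session IDs)?
Any ideas on how to optimize this query? Is there a more efficient way to aggregate all the distribution ranges in a single query?
Finally, I wanted to compare the overall total from the query above with the number of distinct users who triggered a session_start event in the same period. I expected the two to roughly line up, but they don't. Why is there such a big difference: 7688 vs 16310 (488 + 7343 + 4967 + 1956 + 1165 + 391)? Where does my logic go wrong?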
#standardSQL
SELECT COUNT(DISTINCT user_dim.app_info.app_instance_id) AS users
FROM `firebase-public-project.com_firebase_demo_IOS.app_events_*`,
  UNNEST(event_dim) AS event
WHERE (_TABLE_SUFFIX BETWEEN '20170701' AND '20170731')
  AND event.name = "session_start"
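One possible source of the gap, offered as a sketch on made-up data rather than a confirmed diagnosis: if a user has rows for several `sess_id` values, `COUNT(DISTINCT ...)` per range counts that user once in every range they pass through, so the bucket totals add up to more than the number of distinct users. The `numbered` rows and the `per_user` step below are hypothetical. The idea is to collapse to one row per user (their highest `sess_id`) before bucketing.

```sql
#standardSQL
-- Toy input: one row per (user, sess_id) pair, hypothetical values.
WITH numbered AS (
  SELECT 'a' AS app_instance_id, 1 AS sess_id UNION ALL
  SELECT 'a', 2 UNION ALL
  SELECT 'a', 3 UNION ALL
  SELECT 'b', 1
),
per_user AS (
  -- Collapse to one row per user: the highest sess_id is taken as that user's session count.
  SELECT app_instance_id, MAX(sess_id) AS sessions
  FROM numbered
  GROUP BY app_instance_id
)
SELECT
  COUNT(CASE WHEN sessions = 1 THEN 1 END) AS sess_count_1,
  COUNT(CASE WHEN sessions > 1 AND sessions <= 5 THEN 1 END) AS sess_count_2BETWEEN5,
  COUNT(*) AS users
FROM per_user
```

Here each user lands in exactly one bucket, so the buckets sum to `users` (1 + 1 = 2). Counting distinct users per `sess_id` range directly on `numbered` would instead put user `a` in both buckets, giving 2 + 1 = 3 for the same two users.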