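I'm trying to use BigQuery to determine the average time between two events in Firebase Analytics. The table looks like this:
I want to collect the timestamp_micros of the LOGIN_CALL and LOGIN_CALL_OK events, subtract LOGIN_CALL from LOGIN_CALL_OK, and compute the average of that difference over all rows.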
#standardSQL
SELECT AVG(
  (SELECT event.timestamp_micros
   FROM `table`, UNNEST(event_dim) AS event
   WHERE event.name = "LOGIN_CALL_OK") -
  (SELECT event.timestamp_micros
   FROM `table`, UNNEST(event_dim) AS event
   WHERE event.name = "LOGIN_CALL"))
FROM `table`
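I've managed to list either the low numbers or the high numbers, but whenever I try to do any math on them I run into errors, and I'm struggling to pull things apart. The approach above seems like it should work, but I get the following error:
Error: Scalar subquery produced more than one element
I read that this error means each subquery returns one value per matching event (effectively an array) instead of the single value a scalar subquery must produce, which is what makes AVG barf. I tried unnesting only once and naming the values "low" and "hi", but I couldn't get the filtering on event_dim.name to work correctly.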
Answer (score: 4)
I can't fully test this, but maybe this will work for you:
WITH data AS (
  SELECT STRUCT('1' AS user_id) user_dim, ARRAY<STRUCT<date STRING, name STRING, timestamp_micros INT64>>[
    ('20170610', 'EVENT1', 1497088800000000), ('20170610', 'LOGIN_CALL', 1498088800000000), ('20170610', 'LOGIN_CALL_OK', 1498888800000000),
    ('20170610', 'EVENT2', 159788800000000), ('20170610', 'LOGIN_CALL', 1599088800000000), ('20170610', 'LOGIN_CALL_OK', 1608888800000000)] event_dim UNION ALL
  SELECT STRUCT('2' AS user_id) user_dim, ARRAY<STRUCT<date STRING, name STRING, timestamp_micros INT64>>[
    ('20170610', 'EVENT1', 1497688500400000), ('20170610', 'LOGIN_CALL', 1497788800000000)] event_dim UNION ALL
  SELECT STRUCT('3' AS user_id) user_dim, ARRAY<STRUCT<date STRING, name STRING, timestamp_micros INT64>>[
    ('20170610', 'EVENT1', 1487688500400000), ('20170610', 'LOGIN_CALL', 1487788845000000), ('20170610', 'LOGIN_CALL_OK', 1498888807700000)] event_dim
)
SELECT
  AVG(time_diff) avg_time_diff
FROM (
  SELECT
    CASE
      WHEN e.name = 'LOGIN_CALL'
        AND LEAD(e.name, 1) OVER (PARTITION BY user_dim.user_id ORDER BY e.timestamp_micros ASC) = 'LOGIN_CALL_OK'
      THEN TIMESTAMP_DIFF(
        TIMESTAMP_MICROS(LEAD(e.timestamp_micros, 1) OVER (PARTITION BY user_dim.user_id ORDER BY e.timestamp_micros ASC)),
        TIMESTAMP_MICROS(e.timestamp_micros), DAY)
    END time_diff
  FROM data, UNNEST(event_dim) e
  WHERE e.name IN ('LOGIN_CALL', 'LOGIN_CALL_OK')
)
I simulated 3 users with the same schema that Firebase uses.
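Basically, I first apply the UNNEST operation so that each value of event_dim.name gets its own row. Then I apply a filter to keep only the events you're interested in, i.e. "LOGIN_CALL" and "LOGIN_CALL_OK".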
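To make the shape of that intermediate result concrete, here is a minimal Python sketch that mimics `FROM data, UNNEST(event_dim) e WHERE e.name IN (...)` on the three simulated users. It runs outside BigQuery and is only an illustration; the names `USERS`, `WANTED` and `unnest_and_filter` are hypothetical.

```python
# Hypothetical in-memory copy of the three simulated users from the answer's
# WITH clause; each event is a (date, name, timestamp_micros) tuple.
USERS = [
    {"user_id": "1", "event_dim": [
        ("20170610", "EVENT1", 1497088800000000),
        ("20170610", "LOGIN_CALL", 1498088800000000),
        ("20170610", "LOGIN_CALL_OK", 1498888800000000),
        ("20170610", "EVENT2", 159788800000000),
        ("20170610", "LOGIN_CALL", 1599088800000000),
        ("20170610", "LOGIN_CALL_OK", 1608888800000000),
    ]},
    {"user_id": "2", "event_dim": [
        ("20170610", "EVENT1", 1497688500400000),
        ("20170610", "LOGIN_CALL", 1497788800000000),
    ]},
    {"user_id": "3", "event_dim": [
        ("20170610", "EVENT1", 1487688500400000),
        ("20170610", "LOGIN_CALL", 1487788845000000),
        ("20170610", "LOGIN_CALL_OK", 1498888807700000),
    ]},
]

# Only these two event names survive the WHERE filter.
WANTED = {"LOGIN_CALL", "LOGIN_CALL_OK"}


def unnest_and_filter(users):
    """Emit one (user_id, name, timestamp_micros) row per array element,
    pairing each element with its parent user (the implicit cross join of
    UNNEST) and keeping only the two login events."""
    rows = []
    for user in users:
        for _date, name, ts in user["event_dim"]:
            if name in WANTED:
                rows.append((user["user_id"], name, ts))
    return rows


rows = unnest_and_filter(USERS)
```

Each surviving row still carries its user_id, which is what the window function needs in the next step; user 2 contributes a single LOGIN_CALL row with no matching LOGIN_CALL_OK.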
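As Mosha commented above, you need something that identifies these rows, otherwise you can't tell which LOGIN_CALL_OK follows which LOGIN_CALL. That's why the analytic function's PARTITION BY also takes user_dim.user_id as input.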
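After that, it's just TIMESTAMP operations to get the difference when appropriate: when the current event is "LOGIN_CALL" and the next (LEAD) event is "LOGIN_CALL_OK", take the difference. This is expressed in the CASE expression.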
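If you want to sanity-check the LEAD/CASE logic locally before running it in BigQuery, the same query shape can be run in SQLite through Python's standard `sqlite3` module (window functions need SQLite 3.25 or newer). This is a sketch, not BigQuery: the table `events`, its column `ts`, and the names `ROWS`, `MICROS_PER_DAY`, `QUERY` and `average_login_days` are hypothetical, and integer division by the number of microseconds in a day stands in for `TIMESTAMP_DIFF(..., DAY)` on non-negative gaps.

```python
import sqlite3

# Flattened login events for the three simulated users, i.e. what
# UNNEST + the WHERE filter produce: (user_id, name, timestamp_micros).
ROWS = [
    ("1", "LOGIN_CALL", 1498088800000000),
    ("1", "LOGIN_CALL_OK", 1498888800000000),
    ("1", "LOGIN_CALL", 1599088800000000),
    ("1", "LOGIN_CALL_OK", 1608888800000000),
    ("2", "LOGIN_CALL", 1497788800000000),
    ("3", "LOGIN_CALL", 1487788845000000),
    ("3", "LOGIN_CALL_OK", 1498888807700000),
]

MICROS_PER_DAY = 86_400 * 1_000_000

# LEAD(...) OVER (PARTITION BY user_id ORDER BY ts) looks at the next event
# of the same user; rows that are not a LOGIN_CALL followed by a
# LOGIN_CALL_OK get NULL, which AVG ignores.
QUERY = """
SELECT AVG(time_diff) FROM (
  SELECT CASE
    WHEN name = 'LOGIN_CALL'
     AND LEAD(name) OVER (PARTITION BY user_id ORDER BY ts) = 'LOGIN_CALL_OK'
    THEN (LEAD(ts) OVER (PARTITION BY user_id ORDER BY ts) - ts) / :micros_per_day
  END AS time_diff
  FROM events
)
"""


def average_login_days(rows):
    """Average whole days between each LOGIN_CALL and the LOGIN_CALL_OK
    that immediately follows it for the same user (None if no pairs)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id TEXT, name TEXT, ts INTEGER)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        (avg,) = conn.execute(QUERY, {"micros_per_day": MICROS_PER_DAY}).fetchone()
        return avg
    finally:
        conn.close()
```

On the sample data the three matched pairs are 9, 113 and 128 whole days apart, so the average is 250 / 3; user 2's lone LOGIN_CALL yields NULL and is skipped by AVG, just as in the BigQuery query.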
In the TIMESTAMP_DIFF function you can choose which date part the difference is measured in, e.g. SECOND, MINUTE, DAY, etc.
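The choice of date part matters because TIMESTAMP_DIFF returns whole intervals: with DAY, a real login round trip of a second or two comes out as 0, so for latency something like SECOND or MILLISECOND is probably the more useful unit. The Python sketch below approximates that truncation for non-negative gaps between microsecond timestamps; `MICROS_PER_UNIT` and `diff_in_unit` are hypothetical helpers, and only fixed-length units are covered (TIMESTAMP_DIFF treats DAY as 24 hours).

```python
# Microseconds per unit for a few fixed-length TIMESTAMP_DIFF date parts.
MICROS_PER_UNIT = {
    "MICROSECOND": 1,
    "MILLISECOND": 1_000,
    "SECOND": 1_000_000,
    "MINUTE": 60 * 1_000_000,
    "HOUR": 3_600 * 1_000_000,
    "DAY": 86_400 * 1_000_000,
}


def diff_in_unit(later_us, earlier_us, unit):
    """Whole `unit` intervals between two microsecond timestamps.

    Assumes later_us >= earlier_us; floor division then drops the partial
    interval, which is how a whole-interval count behaves for positive gaps."""
    if later_us < earlier_us:
        raise ValueError("this sketch only handles non-negative gaps")
    return (later_us - earlier_us) // MICROS_PER_UNIT[unit]
```

For user 1's first pair, `diff_in_unit(1498888800000000, 1498088800000000, "DAY")` returns 9, while `"SECOND"` returns 800000. For a 1.5-second gap, `diff_in_unit(1_500_000, 0, "DAY")` returns 0 but `"MILLISECOND"` returns 1500.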