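Here is the result I want to get.
This is the BigQuery query written for the old schema; it runs correctly against the old data source table. It is meant to give the distribution of sessions by session duration, that is, the number of sessions in each duration bracket (in seconds), as in the image.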
SELECT
  CASE
    WHEN engagement_time1 > 0 AND engagement_time1 <= 30 THEN "0-30"
    WHEN engagement_time1 > 30 AND engagement_time1 <= 60 THEN "31-60"
    WHEN engagement_time1 > 60 AND engagement_time1 <= 180 THEN "61-180"
    WHEN engagement_time1 > 180 AND engagement_time1 <= 300 THEN "181-300"
    WHEN engagement_time1 > 300 AND engagement_time1 <= 600 THEN "301-600"
    WHEN engagement_time1 > 600 AND engagement_time1 <= 1800 THEN "601-1800"
    WHEN engagement_time1 > 1800 THEN "1800+"
    ELSE "0"
  END AS engagement_bracket_in_seconds,
  COUNT(*) total_sessions,
  SUM(engagement_time1) total_engagement_time
FROM (
  SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records,
         MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
         (ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds,
         SUM(CASE WHEN name = "user_engagement" THEN engagement_time ELSE 0 END)/1000 engagement_time1
  FROM (
    SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
    FROM (
      SELECT *, IF(
        previous IS NULL
        OR (min_time-previous)>(30*60*1000*1000), # sessions broken by this inactivity
        1, 0) session_start
      # https://blog.modeanalytics.com/finding-user-sessions-sql/
      FROM (
        SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
        FROM (
          SELECT user_dim.app_info.app_instance_id,
                 (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time,
                 (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                 event.name,
                 params.value.int_value engagement_time
          FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`,
               UNNEST(event_dim) AS event,
               UNNEST(event.params) AS params,
               UNNEST(user_dim.user_properties) AS user_params
          WHERE (event.name = "user_engagement" AND params.key = "engagement_time_msec")
            AND (user_params.key = "access" AND user_params.value.value.string_value = "true")
            AND PARSE_DATE('%Y%m%d', event.date) >= DATE_SUB("{{upto_date (yyyy-mm-dd)}}", INTERVAL {{last n days}} DAY)
            AND PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"
        )
      )
    )
  )
  GROUP BY 1, 2
)
WHERE sess_id > 0
GROUP BY 1
ORDER BY (total_engagement_time/total_sessions)
The results of the old query are correct, as shown in the image below.
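For readers unfamiliar with the sessionization step used in the old query (the LAG plus running SUM pattern from the linked Mode Analytics post), here is a minimal sketch on made-up inline data. The user id `u1` and the four timestamps are hypothetical and exist only for illustration; the 30-minute threshold is the one used in the query.

```sql
-- Toy data (hypothetical): one user, four events, timestamps in microseconds.
WITH events AS (
  SELECT 'u1' AS app_instance_id, ts AS event_time
  FROM UNNEST([0, 60000000, 4000000000, 4030000000]) AS ts
),
flagged AS (
  SELECT
    *,
    -- A row opens a new session if it is the user's first row
    -- or if it follows a gap of more than 30 minutes.
    IF(prev_time IS NULL OR (event_time - prev_time) > (30*60*1000*1000), 1, 0) AS session_start
  FROM (
    SELECT *, LAG(event_time, 1) OVER (PARTITION BY app_instance_id ORDER BY event_time) AS prev_time
    FROM events
  )
)
SELECT
  app_instance_id,
  event_time,
  -- The running total of session starts numbers the sessions per user.
  SUM(session_start) OVER (PARTITION BY app_instance_id ORDER BY event_time) AS sess_id
FROM flagged
ORDER BY event_time
```

With this toy input the first two rows fall into session 1 and the last two into session 2, because only the gap between the second and third event exceeds 30 minutes.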
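Since the BigQuery schema has changed, I tried to replicate the old query against the new schema. I also changed the data source table (I made sure that all events are recorded in that table).
The lines I added (written for the new schema) replace part of the old query, and when I run the query as a whole it also runs without errors.
* Added part (written for the new schema) *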
SELECT L.user_pseudo_id AS app_instance_id,
       R.min_time min_time,
       R.max_time max_time,
       L.event_name,
       L.eng_time engagement_time
FROM (
  SELECT user_pseudo_id,
         event_name,
         params.value.int_value eng_time
  FROM `new_abc_datasource`,
       UNNEST(event_params) AS params,
       UNNEST(user_properties) AS user_params
  WHERE (event_name = "user_engagement" AND params.key = "engagement_time_msec")
    AND (user_params.key = "access" AND user_params.value.string_value = "true")
    AND PARSE_DATE('%Y%m%d', event_date) >= DATE_SUB("{{upto_date (yyyy-mm-dd)}}", INTERVAL {{last n days}} DAY)
    AND PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
) AS L
LEFT JOIN (
  SELECT user_pseudo_id,
         MIN(event_timestamp) AS min_time,
         MAX(event_timestamp) AS max_time
  FROM `analytics_151475732.events_*`,
       UNNEST(event_params) AS params,
       UNNEST(user_properties) AS user_params
  WHERE (event_name = "user_engagement" AND params.key = "engagement_time_msec")
    AND (user_params.key = "access" AND user_params.value.string_value = "true")
    AND PARSE_DATE('%Y%m%d', event_date) >= DATE_SUB("{{upto_date (yyyy-mm-dd)}}", INTERVAL {{last n days}} DAY)
    AND PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
  GROUP BY user_pseudo_id
) AS R
ON L.user_pseudo_id = R.user_pseudo_id
* Part of the old query that was replaced *
SELECT user_dim.app_info.app_instance_id,
       (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time,
       (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
       event.name,
       params.value.int_value engagement_time
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`,
     UNNEST(event_dim) AS event,
     UNNEST(event.params) AS params,
     UNNEST(user_dim.user_properties) AS user_params
WHERE (event.name = "user_engagement" AND params.key = "engagement_time_msec")
  AND (user_params.key = "access" AND user_params.value.value.string_value = "true")
  AND PARSE_DATE('%Y%m%d', event.date) >= DATE_SUB("{{upto_date (yyyy-mm-dd)}}", INTERVAL {{last n days}} DAY)
  AND PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"
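The problem with the new results is that total_sessions has dropped drastically. Can someone tell me whether this is the right way to replicate the old query? The results I get now differ from what the old query gave in the past.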
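One possible cause worth checking, offered as a guess and not a confirmed diagnosis: in the old schema, min_time and max_time were computed per row (per event bundle), whereas the added part computes MIN(event_timestamp) and MAX(event_timestamp) once per user over the whole date range and joins that pair onto every event row. If every row of a user carries the same min_time and max_time, the 30-minute gap test can never fire after the first row, so each user would end up with a single session. The sketch below reproduces that effect on made-up inline data; the user id, the timestamps, and the CTE names `per_user` and `per_event` are hypothetical.

```sql
-- Toy data (hypothetical): one user, four events, timestamps in microseconds.
WITH events AS (
  SELECT 'u1' AS user_pseudo_id, ts AS event_timestamp
  FROM UNNEST([0, 60000000, 4000000000, 4030000000]) AS ts
),
-- Shape produced by joining every event row to a per-user MIN/MAX.
per_user AS (
  SELECT e.user_pseudo_id, r.min_time, r.max_time
  FROM events AS e
  LEFT JOIN (
    SELECT user_pseudo_id,
           MIN(event_timestamp) AS min_time,
           MAX(event_timestamp) AS max_time
    FROM events
    GROUP BY user_pseudo_id
  ) AS r
  ON e.user_pseudo_id = r.user_pseudo_id
),
-- Alternative shape: each row keeps its own event timestamp.
per_event AS (
  SELECT user_pseudo_id, event_timestamp AS min_time, event_timestamp AS max_time
  FROM events
)
-- Count session starts with the same gap rule as the old query.
SELECT 'per_user_min_max' AS variant, SUM(session_start) AS sessions
FROM (
  SELECT IF(previous IS NULL OR (min_time - previous) > (30*60*1000*1000), 1, 0) AS session_start
  FROM (
    SELECT *, LAG(max_time, 1) OVER (PARTITION BY user_pseudo_id ORDER BY max_time) AS previous
    FROM per_user
  )
)
UNION ALL
SELECT 'per_event_timestamp' AS variant, SUM(session_start) AS sessions
FROM (
  SELECT IF(previous IS NULL OR (min_time - previous) > (30*60*1000*1000), 1, 0) AS session_start
  FROM (
    SELECT *, LAG(max_time, 1) OVER (PARTITION BY user_pseudo_id ORDER BY max_time) AS previous
    FROM per_event
  )
)
```

On this toy input the per_user_min_max variant yields 1 session and the per_event_timestamp variant yields 2. If the same thing is happening in the real data, one option to try would be selecting event_timestamp directly as both min_time and max_time in the innermost subquery, without the join on a per-user aggregate.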