How to calculate sessions and session duration in Firebase Analytics raw data?

Posted: 2017-03-02 04:21:59

Tags: sql firebase google-bigquery tableau firebase-analytics

How do I calculate session duration from Firebase Analytics raw data linked to BigQuery?

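I used the blog post below to count users, applying the FLATTEN command to the events nested in each record, but I'd like to know how to go on and calculate sessions and session duration by country and by time.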

(I have many apps configured, but if you could help me with an SQL query that calculates sessions and session duration, that would be a great help.)

Google Blog on using Firebase and big query

4 answers:

Answer 0 (score: 14)

First you need to define a session. In the query below, I break a session whenever the user has been inactive for more than 20 minutes.

Now, to find all sessions with SQL, you can use the trick described at https://blog.modeanalytics.com/finding-user-sessions-sql/.
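A minimal Python sketch of that trick, under simplifying assumptions: events are plain `(user_id, timestamp_micros)` tuples with one timestamp per row (the query below works on per-bundle `min_time`/`max_time` instead), and `assign_sessions` and `GAP_MICROS` are hypothetical names. Each user's events are walked in time order; a new session is flagged when there is no previous event or the gap exceeds 20 minutes, and a running total of those flags becomes the session id. That is the job `LAG`, the `IF(...)` flag and `SUM(...) OVER` do in SQL.

```python
from itertools import groupby

# Inactivity threshold: 20 minutes, in microseconds
# (Firebase timestamps are microseconds since the epoch).
GAP_MICROS = 20 * 60 * 1000 * 1000


def assign_sessions(events, gap=GAP_MICROS):
    """Return (user_id, timestamp, sess_id) tuples for (user_id, timestamp) events."""
    result = []
    # Sorting the tuples orders by user_id, then timestamp (like PARTITION BY ... ORDER BY).
    for user_id, rows in groupby(sorted(events), key=lambda e: e[0]):
        previous = None  # plays the role of LAG(timestamp, 1)
        sess_id = 0      # running SUM(session_start)
        for _, ts in rows:
            session_start = 1 if previous is None or ts - previous > gap else 0
            sess_id += session_start
            result.append((user_id, ts, sess_id))
            previous = ts
    return result


events = [("a", 0), ("a", 60_000_000), ("a", 60_000_000 + GAP_MICROS + 1), ("b", 5)]
print(assign_sessions(events))
```

A gap of exactly 20 minutes stays in the same session because the comparison is strict, matching the `>` in the query.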

The following query finds all sessions and their lengths:

#standardSQL

SELECT app_instance_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, MAX(sess_id) OVER(PARTITION BY app_instance_id) total_sessions,
   (ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
FROM (
  SELECT *, SUM(session_start) OVER(PARTITION BY app_instance_id ORDER BY min_time) sess_id
  FROM (
    SELECT *, IF(
                previous IS null 
                OR (min_time-previous)>(20*60*1000*1000),  # sessions broken by this inactivity 
                1, 0) session_start 
                #https://blog.modeanalytics.com/finding-user-sessions-sql/
    FROM (
      SELECT *, LAG(max_time, 1) OVER(PARTITION BY app_instance_id ORDER BY max_time) previous
      FROM (
        SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time
        FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160601`
      )
    )
  )
)
GROUP BY 1, 2
ORDER BY 1, 2
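The outer `GROUP BY 1, 2` step can be sketched in Python as follows, assuming rows that already carry a `sess_id` (as produced by the inner subqueries); `summarize_sessions` is a hypothetical name. It reproduces the query's columns: first and last timestamp, row count, the per-user highest session number (`MAX(sess_id) OVER (PARTITION BY ...)`) and the length in seconds.

```python
from collections import defaultdict


def summarize_sessions(rows):
    """rows: (user_id, sess_id, min_time, max_time) tuples, times in microseconds."""
    groups = defaultdict(list)
    for user_id, sess_id, min_time, max_time in rows:
        groups[(user_id, sess_id)].append((min_time, max_time))

    # MAX(sess_id) OVER (PARTITION BY user): the highest session number per user.
    total_sessions = defaultdict(int)
    for user_id, sess_id in groups:
        total_sessions[user_id] = max(total_sessions[user_id], sess_id)

    summary = {}
    for (user_id, sess_id), spans in sorted(groups.items()):
        start = min(s for s, _ in spans)
        end = max(e for _, e in spans)
        summary[(user_id, sess_id)] = {
            "sess_start": start,
            "sess_end": end,
            "records": len(spans),
            "total_sessions": total_sessions[user_id],
            # Python's round() rounds exact halves to even; BigQuery's ROUND
            # rounds them away from zero, so exact ties may differ.
            "sess_length_seconds": round((end - start) / 1_000_000, 1),
        }
    return summary


print(summarize_sessions([("a", 1, 0, 2_000_000), ("a", 1, 3_000_000, 4_500_000)]))
```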


Answer 1 (score: 0)

As you may know, Google changed the schema of the Firebase data exported to BigQuery: https://support.google.com/analytics/answer/7029846
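To illustrate the shape change, here is a small Python sketch that flattens old-schema rows into one row per event. It assumes each row is a dict shaped like the fields used in the first answer's query (`user_dim.app_info.app_instance_id`, `event_dim[].timestamp_micros`); `flatten_old_rows` is a hypothetical helper. The new export already stores one event per row with its own `event_timestamp`, so queries against it read that column directly instead of unnesting `event_dim`.

```python
def flatten_old_rows(old_rows):
    """Turn old-schema rows (events nested in event_dim) into one
    (app_instance_id, timestamp_micros) tuple per event."""
    flat = []
    for row in old_rows:
        instance_id = row["user_dim"]["app_info"]["app_instance_id"]
        for event in row["event_dim"]:  # like UNNEST(event_dim)
            flat.append((instance_id, event["timestamp_micros"]))
    return flat


old = [{"user_dim": {"app_info": {"app_instance_id": "x"}},
        "event_dim": [{"timestamp_micros": 1}, {"timestamp_micros": 5}]}]
print(flatten_old_rows(old))
```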

Building on @Felipe's answer, for the new format the query changes as follows:

# Each row of the subquery is one session, so COUNT(*) is the number of sessions
# (total_sessions is repeated on every session row of a user)
SELECT COUNT(*) AS Total_Sessions, AVG(sess_length_seconds) AS Average_Session_Duration
FROM (
  SELECT user_pseudo_id, sess_id, MIN(min_time) sess_start, MAX(max_time) sess_end, COUNT(*) records, 
    MAX(sess_id) OVER(PARTITION BY user_pseudo_id) total_sessions,
    (ROUND((MAX(max_time)-MIN(min_time))/(1000*1000),1)) sess_length_seconds
  FROM (
    SELECT *, SUM(session_start) OVER(PARTITION BY user_pseudo_id ORDER BY min_time) sess_id
    FROM (
      SELECT *, IF(previous IS null OR (min_time-previous) > (20*60*1000*1000), 1, 0) session_start 
      FROM (
        SELECT *, LAG(max_time, 1) OVER(PARTITION BY user_pseudo_id ORDER BY max_time) previous
        # In the new schema each row is a single event,
        # so min_time and max_time are both that event's timestamp
        FROM (SELECT user_pseudo_id, event_timestamp AS min_time, event_timestamp AS max_time
          FROM `dataset_name.table_name`)
      )
    )
  )
  GROUP BY 1, 2
  ORDER BY 1, 2
)

Note: change dataset_name and table_name to match your project.

Sample result: [screenshot]

Answer 2 (score: 0)

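With the new Firebase schema in BigQuery, I found that @Maziar's answer did not work for me, though I'm not sure why. Instead, I used the following definition: a session is a period in which the user engages with your app for at least 10 seconds, and it ends once the user has not interacted with the app for 30 minutes. The query returns the total number of sessions and the average session length (in minutes), and is based on this query: https://modeanalytics.com/modeanalytics/reports/5e7d902f82de/queries/2cf4af47dba4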
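A Python sketch of the same rules, assuming `(user_pseudo_id, event_timestamp)` tuples in microseconds; `session_stats` and the constants are hypothetical names. A gap of 30 minutes or more (`>=`, as in the `CASE` expression) starts a new session, lengths are measured in minutes, and sessions shorter than 10 seconds are dropped before counting and averaging. Processing one user at a time stands in for `global_session_id`, whose only purpose is to give every session a key that is unique across users.

```python
from itertools import groupby

SESSION_TIMEOUT = 30 * 60 * 1000 * 1000  # 30 minutes in microseconds
MIN_LENGTH_MINUTES = 10 / 60             # 10 seconds, expressed in minutes
MICROS_PER_MINUTE = 60 * 1000 * 1000


def session_stats(events):
    """Return (number_of_sessions, average_length_minutes) for sessions >= 10 s."""
    lengths = []
    for _, rows in groupby(sorted(events), key=lambda e: e[0]):
        timestamps = [ts for _, ts in rows]
        start = last = timestamps[0]
        for ts in timestamps[1:]:
            if ts - last >= SESSION_TIMEOUT:  # 30+ minutes idle: close the session
                lengths.append((last - start) / MICROS_PER_MINUTE)
                start = ts
            last = ts
        lengths.append((last - start) / MICROS_PER_MINUTE)  # close the final session
    kept = [m for m in lengths if m >= MIN_LENGTH_MINUTES]  # WHERE length >= (10/60)
    average = sum(kept) / len(kept) if kept else None
    return len(kept), average
```

For example, a user with events 30 seconds apart yields one 0.5-minute session, while a user whose events span only 6 seconds is filtered out.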

SELECT COUNT(*) AS sessions,
       AVG(length) AS average_session_length
FROM (
  SELECT global_session_id,
         (MAX(event_timestamp) - MIN(event_timestamp)) / (60 * 1000 * 1000) AS length
  FROM (
    SELECT user_pseudo_id,
           event_timestamp,
           SUM(is_new_session) OVER (ORDER BY user_pseudo_id, event_timestamp) AS global_session_id,
           SUM(is_new_session) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS user_session_id
    FROM (
      SELECT *,
             CASE WHEN event_timestamp - last_event >= (30*60*1000*1000)
                    OR last_event IS NULL
                  THEN 1 ELSE 0 END AS is_new_session
      FROM (
        SELECT user_pseudo_id,
               event_timestamp,
               LAG(event_timestamp, 1) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS last_event
        FROM `dataset.events_2019*`
      ) last
    ) final
  ) session
  GROUP BY 1
) agg
WHERE length >= (10/60)

Answer 3 (score: 0)

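With recent changes, every event row in the BigQuery export carries a ga_session_id event parameter, which makes it much easier to calculate the number of sessions and the average session length.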

The value of ga_session_id stays the same for the whole session, so you don't need to define sessions yourself.
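The scalar subquery that reads `ga_session_id` out of the repeated `event_params` field behaves roughly like this Python lookup. The dict layout is an assumption modelled on the `key` / `value.int_value` fields the query uses, and `get_int_param` is a hypothetical name.

```python
def get_int_param(event_params, key="ga_session_id"):
    """Rough analogue of
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "ga_session_id").

    Returns None when the key is missing, like the subquery returning NULL.
    If the key appeared twice, BigQuery's scalar subquery would raise an
    error, whereas this returns the first match.
    """
    for param in event_params:
        if param["key"] == key:
            return param["value"].get("int_value")
    return None


params = [{"key": "page_title", "value": {"string_value": "Home"}},
          {"key": "ga_session_id", "value": {"int_value": 1600000000}}]
print(get_int_param(params))
```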

Take the MIN and MAX of the event_timestamp column, grouping the results by user_pseudo_id, ga_session_id and event_date, so that you get the duration of each session for any user on any given date.

WITH
UserSessions AS (
  SELECT
    user_pseudo_id,
    event_timestamp,
    event_date,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = "ga_session_id") AS session_id,
    event_name
  FROM `projectname.dataset_name.events_*`
),
SessionDuration AS (
  SELECT
    user_pseudo_id,
    session_id,
    COUNT(*) AS events,
    TIMESTAMP_DIFF(MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND) AS session_duration,
    event_date
  FROM UserSessions
  WHERE session_id IS NOT NULL
  GROUP BY
    user_pseudo_id,
    session_id,
    event_date
)
SELECT COUNT(session_id) AS NumofSessions, AVG(session_duration) AS AverageSessionLength
FROM SessionDuration

Finally, count session_id to get the total number of sessions, and average session_duration to get the average session length.
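The whole pipeline (group by user, session and date, then count and average) can be sketched in Python as follows, assuming tuples of `(user_pseudo_id, session_id, event_date, event_timestamp)` with timestamps in microseconds; `session_summary` is a hypothetical name. Durations are truncated to whole seconds, which is how I understand `TIMESTAMP_DIFF(..., SECOND)` to behave.

```python
def session_summary(events):
    """Return (NumofSessions, AverageSessionLength in seconds)."""
    bounds = {}
    for user, session_id, date, ts in events:
        if session_id is None:  # WHERE session_id IS NOT NULL
            continue
        key = (user, session_id, date)  # GROUP BY user_pseudo_id, session_id, event_date
        lo, hi = bounds.get(key, (ts, ts))
        bounds[key] = (min(lo, ts), max(hi, ts))  # MIN / MAX(event_timestamp)
    # Whole seconds between first and last event; integer division truncates
    # non-negative differences.
    durations = [(hi - lo) // 1_000_000 for lo, hi in bounds.values()]
    if not durations:
        return 0, None
    return len(durations), sum(durations) / len(durations)
```

Because `event_date` is part of the key, a session that crosses midnight is counted once per date, just as in the SQL.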