Cohort/retention query in BigQuery using data exported from Google Analytics

Date: 2017-08-01 15:16:32

Tags: sql google-analytics google-bigquery retention

I need help putting together a cohort/retention query.

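I'm trying to build a query that finds the visitors who performed Action X on their first visit (within the time range) and then shows how many days later they performed Action X again.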
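To pin down the calculation being asked for, here is a minimal plain-Python sketch (not BigQuery). It assumes the hits have already been filtered down to hypothetical `(visitor_id, action_date)` pairs for Action X; each visitor's cohort is the date of their first action, and the result gives, per cohort, the percentage of visitors who performed the action again exactly N days later. The function name and sample data are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def retention_table(actions, max_days=3):
    """Return {cohort_date: {n: pct}}, where pct is the share (in %) of the cohort
    that performed the action again exactly n days after its first action.

    `actions` is a hypothetical iterable of (visitor_id, action_date) pairs,
    already filtered to Action X.
    """
    # Distinct action dates per visitor
    dates_by_visitor = defaultdict(set)
    for visitor_id, action_date in actions:
        dates_by_visitor[visitor_id].add(action_date)

    cohort_sizes = defaultdict(int)                  # cohort date -> number of visitors
    returns = defaultdict(lambda: defaultdict(int))  # cohort date -> n -> visitors
    for dates in dates_by_visitor.values():
        first = min(dates)                           # cohort = date of first action
        cohort_sizes[first] += 1
        for d in dates:
            n = (d - first).days
            if 1 <= n <= max_days:
                returns[first][n] += 1

    return {
        cohort: {n: 100 * returns[cohort][n] / size for n in range(1, max_days + 1)}
        for cohort, size in sorted(cohort_sizes.items())
    }


# Made-up sample: visitors a and b start on 1 July, c starts on 2 July
actions = [
    ("a", date(2017, 7, 1)), ("a", date(2017, 7, 2)),
    ("b", date(2017, 7, 1)), ("b", date(2017, 7, 3)),
    ("c", date(2017, 7, 2)),
]
print(retention_table(actions))
```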

The output I (ultimately) need looks something like this...

[Screenshot: desired output table]

The tables I'm working with are the Google Analytics export to BigQuery.

Can anyone help me with this, or has anyone written a similar query that I could adapt?

Thanks

4 answers:

Answer 0 (score: 2)

Just to give you a rough idea/direction:

The following is BigQuery Standard SQL:

   
#standardSQL
SELECT 
  Date_of_action_first_taken,
  ROUND(100 * later_1_day / Visits) AS later_1_day,
  ROUND(100 * later_2_days / Visits) AS later_2_days,
  ROUND(100 * later_3_days / Visits) AS later_3_days
FROM `OutputFromQuery`  

You can test it with the following dummy data from your question:

#standardSQL
WITH `OutputFromQuery` AS (
  SELECT '01.07.17' AS Date_of_action_first_taken, 1000 AS Visits, 800 AS later_1_day, 400 AS later_2_days, 300 AS later_3_days UNION ALL
  SELECT '02.07.17', 1000, 860, 780, 860 UNION ALL
  SELECT '29.07.17', 1000, 780, 120, 0 UNION ALL
  SELECT '30.07.17', 1000, 710, 0, 0
)
SELECT 
  Date_of_action_first_taken,
  ROUND(100 * later_1_day / Visits) AS later_1_day,
  ROUND(100 * later_2_days / Visits) AS later_2_days,
  ROUND(100 * later_3_days / Visits) AS later_3_days
FROM `OutputFromQuery`  

The OutputFromQuery data looks like this:

Date_of_action_first_taken  Visits  later_1_day later_2_days    later_3_days  
01.07.17                    1000    800         400             300  
02.07.17                    1000    860         780             860  
29.07.17                    1000    780         120             0    
30.07.17                    1000    710         0               0    

and the final output is:

Date_of_action_first_taken  later_1_day later_2_days    later_3_days
01.07.17                    80.0        40.0            30.0
02.07.17                    86.0        78.0            86.0
29.07.17                    78.0        12.0            0.0
30.07.17                    71.0        0.0             0.0
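To sanity-check the `ROUND(100 * later_N / Visits)` arithmetic outside BigQuery, the same per-row calculation can be sketched in Python on the dummy rows from the `WITH` clause above. One assumption worth flagging: BigQuery's `ROUND` rounds halfway cases away from zero, while Python's built-in `round` rounds them to even, so the sketch uses a small hypothetical helper, `bq_round`, instead.

```python
import math


def bq_round(x):
    # BigQuery's ROUND(x) rounds halfway cases away from zero, whereas Python's
    # built-in round() rounds them to even (ignores floating-point edge cases).
    return math.copysign(math.floor(abs(x) + 0.5), x)


def to_percentages(row):
    # Same as ROUND(100 * later_N / Visits) for each later_N column
    return {
        "Date_of_action_first_taken": row["Date_of_action_first_taken"],
        **{
            col: bq_round(100 * row[col] / row["Visits"])
            for col in ("later_1_day", "later_2_days", "later_3_days")
        },
    }


# Dummy rows from the WITH clause of the test query
output_from_query = [
    {"Date_of_action_first_taken": "01.07.17", "Visits": 1000,
     "later_1_day": 800, "later_2_days": 400, "later_3_days": 300},
    {"Date_of_action_first_taken": "02.07.17", "Visits": 1000,
     "later_1_day": 860, "later_2_days": 780, "later_3_days": 860},
    {"Date_of_action_first_taken": "29.07.17", "Visits": 1000,
     "later_1_day": 780, "later_2_days": 120, "later_3_days": 0},
    {"Date_of_action_first_taken": "30.07.17", "Visits": 1000,
     "later_1_day": 710, "later_2_days": 0, "later_3_days": 0},
]

for row in output_from_query:
    print(to_percentages(row))
```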

Answer 1 (score: 1)

So I think I may have cracked it... I then need to manipulate this output (with a pivot table) so that it looks like the desired output.

Could anyone review this for me and let me know what you think?
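The query below derives each hit's time as `visitStartTime*1000000 + h.time*1000`. In the GA export schema, `visitStartTime` is POSIX time in seconds and `hits.time` is milliseconds since the session started, so the sum is in microseconds, which is what `TIMESTAMP_MICROS` expects. A small Python sketch of the same conversion (the function names are hypothetical) shows that hits from one session can fall on different UTC days:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def hit_timestamp_micros(visit_start_time, hit_time_ms):
    # visitStartTime: seconds since the Unix epoch; hits.time: milliseconds
    # since the session started. Convert both to microseconds and add them.
    return visit_start_time * 1_000_000 + hit_time_ms * 1_000


def hit_day_utc(visit_start_time, hit_time_ms):
    # Equivalent of DATE(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(...), DAY)) in UTC,
    # BigQuery's default time zone. Integer timedelta avoids float rounding.
    micros = hit_timestamp_micros(visit_start_time, hit_time_ms)
    return (EPOCH + timedelta(microseconds=micros)).date()


# A session starting at 2017-07-31 23:59:00 UTC (epoch second 1501545540)
print(hit_day_utc(1501545540, 0))       # prints 2017-07-31
print(hit_day_utc(1501545540, 90_000))  # prints 2017-08-01 (hit 90 s later)
```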

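#standardSQL
-- Placeholders: replace `your_project.your_dataset.ga_sessions_*` with your GA export
-- tables and h.eventInfo.eventAction = 'ACTION TAKEN' with your own "Action X" condition.
WITH
cohort_items AS (
  -- Cohort = the day on which each visitor first performed the action in the date range
  SELECT
    MIN(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(visitStartTime * 1000000 + h.time * 1000), DAY)) AS cohort_day,
    fullVisitorId
  FROM `your_project.your_dataset.ga_sessions_*` AS U,
    UNNEST(U.hits) AS h
  WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
    AND h.eventInfo.eventAction = 'ACTION TAKEN'
  GROUP BY 2
),

user_activities AS (
  -- One row per (visitor, days since cohort_day) on which the action was performed
  SELECT
    A.fullVisitorId,
    DATE_DIFF(
      DATE(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(A.visitStartTime * 1000000 + h.time * 1000), DAY)),
      DATE(C.cohort_day),
      DAY) AS day_number
  FROM `your_project.your_dataset.ga_sessions_*` AS A
  CROSS JOIN UNNEST(A.hits) AS h
  JOIN cohort_items AS C
    ON A.fullVisitorId = C.fullVisitorId
  WHERE A._TABLE_SUFFIX BETWEEN '20170701' AND '20170731'
    AND h.eventInfo.eventAction = 'ACTION TAKEN'
  GROUP BY 1, 2
),

cohort_size AS (
  -- Number of visitors in each daily cohort
  SELECT
    cohort_day,
    COUNT(1) AS number_of_users
  FROM cohort_items
  GROUP BY 1
),

retention_table AS (
  -- Number of visitors from each cohort who performed the action again N days later
  SELECT
    C.cohort_day,
    A.day_number,
    COUNT(1) AS number_of_users
  FROM user_activities AS A
  JOIN cohort_items AS C
    ON A.fullVisitorId = C.fullVisitorId
  GROUP BY 1, 2
)

SELECT
  B.cohort_day,
  S.number_of_users AS total_users,
  B.day_number,
  B.number_of_users / S.number_of_users AS percentage  -- fraction of the cohort (0 to 1)
FROM retention_table AS B
LEFT JOIN cohort_size AS S
  ON B.cohort_day = S.cohort_day
WHERE B.cohort_day IS NOT NULL
ORDER BY 1, 3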
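The pivot step mentioned above (turning the long `cohort_day` / `day_number` / `percentage` rows into one row per cohort) could also be done client-side. A minimal Python sketch, assuming the query results have been fetched as a list of tuples; the function name and sample rows are hypothetical:

```python
from collections import defaultdict


def pivot_retention(rows, max_day):
    """Pivot long (cohort_day, day_number, percentage) rows into one list per
    cohort: [cohort_day, pct_day_1, ..., pct_day_max_day].

    Days with no returning visitors have no row in the query output, so they
    are filled with 0.0 here. Day 0 (the cohort day itself) is left out.
    """
    by_cohort = defaultdict(dict)
    for cohort_day, day_number, percentage in rows:
        by_cohort[cohort_day][day_number] = percentage
    return [
        [cohort_day] + [by_cohort[cohort_day].get(n, 0.0) for n in range(1, max_day + 1)]
        for cohort_day in sorted(by_cohort)
    ]


# Made-up rows in the shape of the query's output (percentage is a 0-1 fraction)
rows = [
    ("2017-07-01", 0, 1.0), ("2017-07-01", 1, 0.8), ("2017-07-01", 2, 0.4),
    ("2017-07-02", 0, 1.0), ("2017-07-02", 1, 0.5),
]
for line in pivot_retention(rows, max_day=3):
    print(line)
```

Inside BigQuery the same pivot can be written with conditional aggregation, for example `MAX(IF(day_number = 1, percentage, NULL)) AS day_1` (one such column per day) grouped by `cohort_day`.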

Thanks in advance!

Answer 2 (score: 1)

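If you take advantage of some of the techniques BigQuery offers (such as arrays and UNNEST), you can solve this kind of problem in a way that is both cost-effective and fast. For example: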

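SELECT
  init_date,
  ARRAY(
    SELECT AS STRUCT
      days,
      freq,
      ROUND(freq * 100 / MAX(freq) OVER (), 2) AS pct  -- share of the cohort's day-0 count
    FROM UNNEST(data)
    ORDER BY days
  ) AS data
FROM (
  -- One row per cohort date, holding (days since first action, visitor count) pairs
  SELECT
    init_date,
    ARRAY_AGG(STRUCT(days, freq)) AS data
  FROM (
    -- Number of visitors per (cohort date, days since first action)
    SELECT
      init_date,
      day_offset AS days,
      COUNT(day_offset) AS freq
    FROM (
      -- Per visitor: days elapsed between the first action date and each action date
      SELECT
        init_date,
        ARRAY(
          SELECT DATE_DIFF(PARSE_DATE('%Y%m%d', dt), PARSE_DATE('%Y%m%d', init_date), DAY)
          FROM UNNEST(dts) AS dt
        ) AS day_offsets
      FROM (
        -- Per visitor: first date and all distinct dates with the action
        SELECT
          MIN(date) AS init_date,
          ARRAY_AGG(DISTINCT date) AS dts
        FROM `your_project.your_dataset.ga_sessions_*`  -- placeholder for your GA export tables (Table123)
        WHERE EXISTS(SELECT 1 FROM UNNEST(hits) WHERE eventInfo.eventCategory = 'recommendation')  -- This is your 'ACTION TAKEN' filter
          AND _TABLE_SUFFIX BETWEEN '20170724' AND '20170731'
        GROUP BY fullVisitorId
      )
    ), UNNEST(day_offsets) AS day_offset
    GROUP BY init_date, days
  )
  GROUP BY init_date
)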
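For readers who want to see what the nested arrays in this query compute, here is a rough plain-Python equivalent. It assumes the input is a hypothetical list of `(fullVisitorId, date)` pairs for sessions that already passed the `EXISTS` filter. Like the query, it keeps each visitor's distinct dates, takes the earliest as `init_date`, counts visitors per `(init_date, days)`, and divides each count by the cohort's largest count, which is the day-0 count because every visitor appears on their own `init_date`. Note that Python's `round()` rounds exact ties to even, unlike BigQuery's `ROUND`.

```python
from collections import Counter, defaultdict
from datetime import datetime


def parse_ga_date(d):
    # The GA export `date` column is a 'YYYYMMDD' string, as in PARSE_DATE("%Y%m%d", ...)
    return datetime.strptime(d, "%Y%m%d").date()


def array_style_retention(sessions):
    """sessions: hypothetical (fullVisitorId, 'YYYYMMDD') pairs for sessions that
    passed the 'ACTION TAKEN' filter. Returns {init_date: [(days, freq, pct), ...]}."""
    dts = defaultdict(set)  # ARRAY_AGG(DISTINCT date) ... GROUP BY fullVisitorId
    for visitor, d in sessions:
        dts[visitor].add(d)

    freq = Counter()  # COUNT(...) GROUP BY init_date, days
    for dates in dts.values():
        init_date = min(dates)  # MIN(date): 'YYYYMMDD' strings sort chronologically
        for d in dates:
            days = (parse_ga_date(d) - parse_ga_date(init_date)).days
            freq[(init_date, days)] += 1

    grouped = defaultdict(list)  # ARRAY_AGG(STRUCT(days, freq)) GROUP BY init_date
    for (init_date, days), f in sorted(freq.items()):
        grouped[init_date].append((days, f))

    # freq * 100 / MAX(freq) OVER (): the maximum is the day-0 count
    return {
        init_date: [(days, f, round(f * 100 / max(n for _, n in pairs), 2)) for days, f in pairs]
        for init_date, pairs in grouped.items()
    }


# Made-up sessions; v1 has two sessions on 20170729, which DISTINCT collapses
sessions = [
    ("v1", "20170728"), ("v1", "20170729"), ("v1", "20170729"),
    ("v2", "20170728"),
    ("v3", "20170728"), ("v3", "20170730"),
    ("v4", "20170729"),
]
print(array_style_retention(sessions))
```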

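I tested this query against our GA data, selecting customers who interacted with our recommendation system (as you can see in the WHERE EXISTS... filter). Sample results (absolute freq values omitted for privacy reasons):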

[Screenshot: sample results]

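As you can see, for example, 8% of the customers in the cohort that started on the 28th came back 1 day later and interacted with the system again.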

I suggest you try this query and see whether it works for you. It is simpler, cheaper, faster and, hopefully, easier to maintain.

Answer 3 (score: 1)

I found this query in the talk "Turn Your App Data into Answers with Firebase and BigQuery" (Google I/O '19).

It should work :)

#standardSQL

###################################################
# Part 1: Cohort of New Users Starting on DEC 24
###################################################
WITH 
new_user_cohort AS (
  SELECT DISTINCT
    user_pseudo_id as new_user_id
  FROM
    `[your_project].[your_firebase_table].events_*`
  WHERE
    event_name = '[chosen_event]' AND
    # Set the date on which the cohort starts.
    # Note: the POSIX-style zone name "Etc/GMT+1" means UTC-1, not UTC+1.
    FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+1")) = '20191224' AND
    _TABLE_SUFFIX BETWEEN '20191224' AND '20191230'
),

num_new_users AS (
  SELECT count(*) as num_users_in_cohort FROM new_user_cohort
),

#############################################
# Part 2: Engaged users from Dec 24 cohort
#############################################
engaged_users_by_day AS (
  SELECT
    FORMAT_TIMESTAMP("%Y%m%d", TIMESTAMP_TRUNC(TIMESTAMP_MICROS(event_timestamp), DAY, "Etc/GMT+1")) as event_day,
    COUNT(DISTINCT user_pseudo_id) as num_engaged_users
  FROM
    `[your_project].[your_firebase_table].events_*`
  INNER JOIN
    new_user_cohort ON new_user_id = user_pseudo_id
  WHERE
    event_name = 'user_engagement' AND
    _TABLE_SUFFIX BETWEEN '20191224' AND '20191230'
  GROUP BY
    event_day
)


####################################################################
# Part 3: Daily Retention = [Engaged Users / Total Users]
####################################################################
SELECT
  event_day,
  num_engaged_users,
  num_users_in_cohort,
  ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
FROM
  engaged_users_by_day
CROSS JOIN
  num_new_users
ORDER BY
  event_day
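Part 3 is a simple ratio. A plain-Python sketch of it with made-up counts (not real Firebase data) may help check the expected shape of the output; the function name is hypothetical:

```python
def daily_retention(engaged_by_day, num_users_in_cohort):
    """Mirror Part 3: one row per event_day with
    retention_rate = ROUND(num_engaged_users / num_users_in_cohort, 3).

    engaged_by_day is a hypothetical {event_day: num_engaged_users} mapping,
    i.e. the output of Part 2.
    """
    return [
        (event_day, engaged, num_users_in_cohort, round(engaged / num_users_in_cohort, 3))
        for event_day, engaged in sorted(engaged_by_day.items())  # ORDER BY event_day
    ]


# Made-up counts for a cohort of 200 users
for row in daily_retention({"20191225": 57, "20191224": 200, "20191226": 41}, 200):
    print(row)
```

The `CROSS JOIN` in the query plays the role of the constant second argument here: `num_new_users` has exactly one row, so every day is divided by the same cohort size.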