VisitIds between (different) date ranges for each user

Asked: 2017-06-04 09:42:22

Tags: google-analytics google-bigquery

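For each fullvisitorId, I am trying to get all the visitIds between date_1 and date_2. These dates are, of course, different for each user.

Can anyone give me some pointers on how to do this?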

For example:

  • user_1: I want all visitIds between the 1st and 20th of June
  • user_2: I want all visitIds between the 12th and 27th of June ... and so on

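date_1 and date_2 correspond to important actions the user performed on the website (matched on Event): downloading a trial and making a purchase.

Thanks in advance for any leads.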

1 answer:

Answer 0: (score: 1)

One possible way to solve this is to use analytic functions. For example:

#standardSQL
WITH data AS(
  select '1' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL
  select '1' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL
  select '1' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event2' as eventCategory) as eventInfo)] hits UNION ALL
  select '1' as user, '4' as visitid, '20170523' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL

  select '2' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL
  select '2' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event2' as eventCategory) as eventInfo)] hits UNION ALL
  select '2' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL

  select '3' as user, '1' as visitid, '20170520' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('event1' as eventCategory) as eventInfo)] hits UNION ALL
  select '3' as user, '2' as visitid, '20170521' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits UNION ALL
  select '3' as user, '3' as visitid, '20170522' as date, ARRAY<STRUCT<hitNumber INT64, eventInfo STRUCT<eventCategory STRING> >> [STRUCT(1 as hitNumber, STRUCT('' as eventCategory) as eventInfo)] hits
)

SELECT
  user,
  visitid,
  date
FROM(
  SELECT 
    user,
    visitid,
    date,
    MIN(CASE WHEN hits.eventInfo.eventCategory = 'event1' THEN date END) OVER(PARTITION BY user) min_date,
    MAX(CASE WHEN hits.eventInfo.eventCategory = 'event2' THEN date END) OVER(PARTITION BY user) max_date
  FROM data,
  UNNEST(hits) hits
)
WHERE date BETWEEN min_date AND max_date

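Here, data simulates your ga_sessions data (I renamed 'fullvisitorid' to 'user').

This assumes that a given user can have distinct events for date 1 and date 2 (so it takes the MIN and the MAX respectively), and that you are storing the events in the eventCategory field. Given that your "download" and "purchase" events are defined at the session level, I'd recommend using the customDimensions field instead of the hits.eventInfo.eventCategory field.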
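As a rough illustration of the customDimensions suggestion, the same window-function idea could be sketched as follows. This assumes the session-level customDimensions field has the shape ARRAY<STRUCT<index, value>>; the index 1 and the values 'download' and 'purchase' are hypothetical placeholders, not something taken from the question, so adjust them to however your dimensions are actually configured.

```sql
#standardSQL
-- Mock sessions. Index 1 and the values 'download' / 'purchase' are
-- hypothetical; replace them with your own custom dimension setup.
WITH data AS (
  SELECT '1' AS user, '1' AS visitid, '20170520' AS date,
         [STRUCT(1 AS index, 'download' AS value)] AS customDimensions UNION ALL
  SELECT '1', '2', '20170521', [STRUCT(1 AS index, '' AS value)] UNION ALL
  SELECT '1', '3', '20170522', [STRUCT(1 AS index, 'purchase' AS value)] UNION ALL
  SELECT '1', '4', '20170523', [STRUCT(1 AS index, '' AS value)]
),
-- One row per session, with a flag for each of the two marker actions.
flagged AS (
  SELECT
    user,
    visitid,
    date,
    EXISTS(SELECT 1 FROM UNNEST(customDimensions) cd
           WHERE cd.index = 1 AND cd.value = 'download') AS is_download,
    EXISTS(SELECT 1 FROM UNNEST(customDimensions) cd
           WHERE cd.index = 1 AND cd.value = 'purchase') AS is_purchase
  FROM data
)
SELECT user, visitid, date
FROM (
  SELECT
    user,
    visitid,
    date,
    -- Earliest download date and latest purchase date per user.
    MIN(IF(is_download, date, NULL)) OVER (PARTITION BY user) AS min_date,
    MAX(IF(is_purchase, date, NULL)) OVER (PARTITION BY user) AS max_date
  FROM flagged
)
WHERE date BETWEEN min_date AND max_date
```

With this mock data, visits 1 to 3 fall inside the range and visit 4 falls outside it. Because date is a 'YYYYMMDD' string, BETWEEN compares it lexicographically, which matches chronological order for that format.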

Besides analytic functions, you can also use ARRAYs and STRUCTs in Standard SQL:

SELECT
  user,
  ARRAY(SELECT AS STRUCT visitid, date FROM UNNEST(user_data) WHERE date BETWEEN min_date AND max_date) user_data
FROM(
  SELECT 
    user,
    ARRAY_AGG((SELECT AS STRUCT visitid, date)) user_data,
    MIN(CASE WHEN EXISTS(SELECT 1 FROM UNNEST(hits) hits WHERE hits.eventInfo.eventCategory = 'event1') THEN date END) min_date,
    MAX(CASE WHEN EXISTS(SELECT 1 FROM UNNEST(hits) hits WHERE hits.eventInfo.eventCategory = 'event2') THEN date END) max_date
  FROM data
  GROUP BY user
)
WHERE ARRAY_LENGTH(ARRAY(SELECT AS STRUCT visitid, date FROM UNNEST(user_data) WHERE date BETWEEN min_date AND max_date)) > 0

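If the assumptions I made don't match your data, you can still adapt these techniques to query what you need. You can also use the simulated data for testing (and change it so it better resembles your dataset).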
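As one possible adaptation, here is a sketch of the analytic-function approach pointed at an exported sessions table instead of mock data. The table name `my-project.my_dataset.ga_sessions_*`, the _TABLE_SUFFIX range, and the event names 'event1' and 'event2' are all placeholders. It checks each session's hits with EXISTS rather than joining on UNNEST(hits), so a session with several hits still produces a single row.

```sql
#standardSQL
-- Placeholder table name, date range and event names: adjust all of them.
WITH flagged AS (
  SELECT
    fullVisitorId,
    visitId,
    date,
    -- TRUE when any hit in the session carries the marker event.
    EXISTS(SELECT 1 FROM UNNEST(hits) h
           WHERE h.eventInfo.eventCategory = 'event1') AS has_start_event,
    EXISTS(SELECT 1 FROM UNNEST(hits) h
           WHERE h.eventInfo.eventCategory = 'event2') AS has_end_event
  FROM `my-project.my_dataset.ga_sessions_*`
  WHERE _TABLE_SUFFIX BETWEEN '20170501' AND '20170630'
)
SELECT fullVisitorId, visitId, date
FROM (
  SELECT
    fullVisitorId,
    visitId,
    date,
    MIN(IF(has_start_event, date, NULL)) OVER (PARTITION BY fullVisitorId) AS min_date,
    MAX(IF(has_end_event, date, NULL)) OVER (PARTITION BY fullVisitorId) AS max_date
  FROM flagged
)
WHERE date BETWEEN min_date AND max_date
```

Users who are missing either event end up with a NULL bound, so the BETWEEN filter drops all of their sessions, which is the same behaviour as for user 3 in the mock data.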