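I'm using BigQuery to find out how many users completed a specific page path (at any point in their session). Let's say the page path is Page 1 -> Page 2 -> Page 3. The pages must be visited in that order. I was able to build the page path in BQ, but this method only identifies users who reached those pages somewhere in the session. For example: Page 1 -> Page 456 -> Page 2.
Any ideas?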
(SELECT [date]
, CASE WHEN pages like '/Page1' then fullVisitorId end as [users]
, CASE WHEN pages like '/Page1>>/Page2' then fullVisitorId end as [path_users_2]
, CASE WHEN pages like '/Page1>>/Page2>>/Page3' then fullVisitorId end as [path_users_3]
, [path_type]
, [path]
, [product]
, [device.deviceCategory]
FROM
( SELECT [date]
, [fullVisitorId]
, [visitId]
, [visitNumber]
, group_concat(hits.page.pagePath,'>>') as [pages]
, 'New Pages' as [path_type]
, 'Upgrade' as [path]
, 'Professional' as [product]
FROM
(
TABLE_DATE_RANGE
( [XXXXXX.ga_sessions_]
, TIMESTAMP('2014-06-01')
, TIMESTAMP('2014-06-05') )
)
where
(REGEXP_MATCH(hits.page.pagePath,r'^/Page1($|/$|\?|/\?|%3F)'))
or (REGEXP_MATCH(hits.page.pagePath,r'^/Page2($|/$|\?|/\?|%3F)'))
or ( (REGEXP_MATCH(hits.page.pagePath,r'^/Page3($|/$|\?|/\?|%3F)'))
and hits.transaction.transactionId is not null
and hits.item.productSku is not null
and hits.item.itemRevenue is not null )
group each by [date]
, [fullVisitorId]
, [visitId]
, [visitNumber]
, [path_type]
, [path]
, [product]
, [device.deviceCategory]
)
group each by
[date]
, [path_type]
, [path]
, [product]
, [users]
, [path_users_2]
, [path_users_3]
, [device.deviceCategory]
)
Answer 0 (score: 4)
For your specific use case, I'm fairly sure you can do this faster by avoiding the JOIN and GROUP BY.
Consider:
SELECT
[date], fullVisitorId, visitId, visitNumber,
GROUP_CONCAT(REGEXP_EXTRACT(hits.page.pagePath, '^(/[^/?]*)'), ">>")
WITHIN RECORD AS Sequence,
FROM
(TABLE_DATE_RANGE
( [XXXXXX.ga_sessions_]
, TIMESTAMP('2014-06-01')
, TIMESTAMP('2014-06-05') )
)
WHERE REGEXP_MATCH(hits.page.pagePath, r'^/Page[123]')
HAVING
Sequence CONTAINS "/Page1>>/Page2>>/Page3";
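This takes advantage of scoped aggregation at the RECORD level to avoid a GROUP BY over individual sessions.
In addition, individual records are atomic in BigQuery, and their repeated fields are processed in the order in which they were supplied at import time. So for GA session logs, because everything is done WITHIN RECORD, the hits sub-records are concatenated in order. Flattening the hits, then joining them back together by comparing timestamps, really just redoes that work.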
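To make the logic of that query easier to follow, here is a minimal Python sketch of what it does to one session. It is only a local model, not BigQuery code: the function names are hypothetical, and it assumes the hits of a session arrive as a list of page paths already in hit order.

```python
import re

def first_segment(page_path):
    """Model of REGEXP_EXTRACT(pagePath, '^(/[^/?]*)'): keep the first path segment."""
    match = re.match(r'^(/[^/?]*)', page_path)
    return match.group(1) if match else None

def session_sequence(page_paths):
    """Model of the WHERE filter plus GROUP_CONCAT(..., '>>') WITHIN RECORD."""
    kept = [first_segment(p) for p in page_paths if re.match(r'^/Page[123]', p)]
    return ">>".join(kept)

def completed_path(page_paths, target="/Page1>>/Page2>>/Page3"):
    """Model of HAVING Sequence CONTAINS target."""
    return target in session_sequence(page_paths)
```

Note that in this model hits on other pages are dropped before concatenation, so a session such as /Page1, /Page456, /Page2, /Page3 still matches. If the three pages must follow each other with nothing in between, the filter has to be removed or the check done differently.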
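Answer 1 (score: 2)
You need to build a series of queries and use hits.time as the chronological order to arrive at the full path step by step. Take the Streak blog post as an example: Using Google BigQuery for Event Tracking
We can create a subquery that identifies the visitHomepage events: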
(SELECT sessionId as sessionId1,
timestamp as timestamp1
FROM [events.log]
WHERE name = "visitHomepage") AS step1
Then do the same for step2 and step3.
You can then combine them to get steps1_2:
(SELECT sessionId1,
timestamp1,
IF(timestamp1 < timestamp2, timestamp2, NULL) as timestamp2
FROM
(SELECT sessionId1,
timestamp1,
timestamp2
FROM step1
LEFT JOIN step2
ON sessionId1 = sessionId2)
) AS steps1_2
And finally the subquery we want:
(SELECT sessionId1 as sessionId,
timestamp1 as visitHomepageTimestamp,
timestamp2 as installExtensionTimestamp,
IF(timestamp2 < timestamp3, timestamp3, NULL) as signInTimestamp
FROM
(SELECT sessionId1,
timestamp1,
timestamp2,
timestamp3
FROM steps1_2
LEFT JOIN step3
ON sessionId1 = sessionId3)
) AS steps1_2_3
Read the blog post linked above for details on how to build the query, and also have a look at the BigQuery Cookbook.
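The step-by-step idea can be sketched locally in Python. This is a simplified model, not a translation of the SQL above: it picks the earliest qualifying timestamp for each step, whereas the joins produce one row per matching pair. The function name and the event names other than visitHomepage are assumptions made for illustration.

```python
def funnel_timestamps(events, steps):
    """events: list of (timestamp, name) for one session.
    steps: ordered list of event names.
    Returns one timestamp per step, or None from the first step that
    has no event later than the previous step's timestamp."""
    result = []
    previous = None
    broken = False
    for step in steps:
        if broken:
            result.append(None)
            continue
        # Only events of this step that happened after the previous step count.
        times = [t for t, name in events
                 if name == step and (previous is None or t > previous)]
        if times:
            previous = min(times)
            result.append(previous)
        else:
            broken = True
            result.append(None)
    return result

# Hypothetical step names, inferred from the column aliases in the query.
STEPS = ["visitHomepage", "installExtension", "signIn"]
```

A session completed the funnel when the last entry of the returned list is not None, which mirrors checking signInTimestamp for NULL in the final subquery.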
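Alternatively, you can order your query by hits.time to define the sequence of pages a user visited, add a sequence number using ROW_NUMBER or POSITION, and then use the result for further segmentation.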
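As a rough illustration of the sequence-number idea, the following Python sketch orders the hits of one session by time, numbers them from 1, and then looks for the path at consecutive sequence numbers. It is a local model under stated assumptions: hits are given as (time, page) pairs, and both function names are hypothetical.

```python
def number_hits(hits):
    """hits: list of (time, page). Returns (seq, page) pairs ordered by time,
    with seq starting at 1, similar in spirit to ROW_NUMBER over hits.time."""
    ordered = sorted(hits, key=lambda hit: hit[0])
    return [(seq, page) for seq, (_, page) in enumerate(ordered, start=1)]

def has_consecutive_path(hits, path):
    """True if the pages in path occur at consecutive sequence numbers."""
    by_seq = dict(number_hits(hits))
    return any(
        all(by_seq.get(start + offset) == page for offset, page in enumerate(path))
        for start in by_seq
    )
```

Unlike a filter-then-concatenate approach, this check is strict: a visit to another page between two steps breaks the match.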