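I am working on a large BigQuery dataset that is populated automatically from Google Analytics. For this work, I am trying to calculate the exit rate from that data. As a reference, I am looking closely at the documentation that describes the BigQuery export schema in sufficient detail.
As described in this post from Google regarding the exit rate, the exit rate can be defined as "For all pageviews to the page, Exit Rate is the percentage that were the last in the session." To calculate this, I surmise that for each unique visit I need to examine every page in the hits.page.pagePath column; if a URL matching the error regular expression appears after another URL that does not indicate an error, then leaving the site after that error path can be counted as an exit.
This looks like a very clear case of path analysis, and I am not yet sure whether BigQuery can handle it easily or efficiently. The URLs I am interested in, the ones on which exits occur, are typically matched with: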
REGEXP_MATCH(hits.page.pagePath, r'/[^/]+error\.aspx')
For example, I started by using it in the following query as a trial:
SELECT hits.page.pagePath AS Page_Path
FROM [XXXXXXXX.ga_sessions_20140711]
WHERE REGEXP_MATCH(hits.page.pagePath, r'/[^/]+error\.aspx') OR REGEXP_MATCH(hits.page.pagePath, r'/[^/]+genericerror\.aspx')
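As a quick sanity check on the pattern above, here is a minimal Python sketch. It assumes that `re.search` behaves like the unanchored (partial) match that REGEXP_MATCH performs for this simple pattern; the sample paths are invented for illustration and do not come from the dataset.

```python
import re

# Pattern copied from the question. re.search is used on the assumption that
# REGEXP_MATCH does an unanchored (partial) match.
ERROR_PAGE = re.compile(r'/[^/]+error\.aspx')

# Hypothetical page paths, invented for illustration only.
sample_paths = [
    '/XXX/en/genericerror.aspx',
    '/XXX/en/error.aspx',
    '/XXX/en/default.aspx',
]

# Map each path to whether the pattern is found anywhere in it.
matches = {path: bool(ERROR_PAGE.search(path)) for path in sample_paths}
for path, hit in matches.items():
    print(path, hit)
```

Under that assumption, the second REGEXP_MATCH in the WHERE clause looks redundant, because any path ending in genericerror.aspx already matches the first pattern. Note also that a path segment named exactly error.aspx matches neither pattern, since `[^/]+` needs at least one character between the slash and "error".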
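I would greatly appreciate any suggestions or examples showing how someone has successfully used BigQuery to calculate the exit rate.
Further details (July 14, 2014):
The dataset here is XXXXXXXX. Some further details are added below. The full query to be executed will create a table with the following output:
Date, Page (from the CASE statement; there are multiple cases per day, so there will be 7 or 8 rows per case, respectively), Metric 1, Metric 2 (bounces), Metric 3 (exits)
The query has the following overall specification:
Description: pageviews and unique pageviews of ^/XXX/[^/]+error.aspx or ^/XXX/[^/]+genericerror.aspx
Metrics: pageviews, unique pageviews, sessions, users, bounce rate, exit rate
CASE statement values (Page dimension values): Homepage 2.0, Homepage 1.0, Inbound Search 2.0, Inbound Search 1.0, Outbound Search 2.0, Outbound Search 1.0, Review Itinerary 2.0, Review Itinerary 1.0, Traveler info 2.0, Traveler info 1.0, Seat selector 2.0, Seat selector 1.0, Payment info 2.0, Payment info 1.0, digital 2.0 other, digital 1.0 other
CASE statement: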
Case
  when previous page = "^/XXX/[^/]+/default\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Homepage 2.0"
  when previous page = "^/web/[^/]+/default\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Homepage 1.0"
  when previous page = "^XXX/[^/]+/apps/booking?flight/(searchresult1|search(rt|ow|md))\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Inbound Search 2.0"
  when previous page = "^web/[^/]+/apps/booking?flight/(searchresult1|search(rt|ow|md))\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Inbound Search 1.0"
  when previous page = "^/XXX/[^/]+/apps/booking/flight/searchResult2\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Outbound Search 2.0"
  when previous page = "^/web/[^/]+/apps/booking/flight/searchResult2\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Outbound Search 1.0"
  when previous page = "^/XXX/[^/]+/apps/booking/flight/reviewRevenue\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Review Itinerary 2.0"
  when previous page = "^/web/[^/]+/apps/booking/flight/reviewRevenue\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Review Itinerary 1.0"
  when previous page = "^/XXX/[^/]+/apps/booking/flight/traveler\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Traveler info 2.0"
  when previous page = "^/web/[^/]+/apps/booking/flight/traveler\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Traveler info 1.0"
  when previous page = "^/XXX/[^/]+/apps/booking/flight/seatSelector\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Seat selector 2.0"
  when previous page = "^/web/[^/]+/apps/booking/flight/seatSelector\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Seat selector 1.0"
  when previous page = "^/XXX/[^/]+/apps/booking/flight/billingRevenue\.aspx" and landing page = "^/XXX/[^/]+/default\.aspx" then "Payment info 2.0"
  when previous page = "^/web/[^/]+/apps/booking/flight/billingRevenue\.aspx" and landing page = "^/web/[^/]+/default\.aspx" then "Payment info 1.0"
  when landing page = "^/XXX/[^/]+/default\.aspx" then "digital 2.0 other"
  else "digital 1.0 other"
end as Page
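Dimensions: date, page, device category, browser, custom variable (value 04)
Filter: REGEX: page = ^/XXX/[^/]+/error.aspx or ^/XXX/[^/]+/genericerror.aspx
Group by / sort: date, page
The final query must be a single query that can be run ad hoc over a specified number of days and that produces the simple table described above (sums of the various values). The exit rate appears to require a manual calculation. As a first step, I use the following query to produce the various matching URLs: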
SELECT date, CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces,
hits.page.pagePath, hits.page.pageTitle, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;
I am now having trouble using the LAG function to look at the previous hits.page.pagePath for each hit, and then calculating the exit rate from it.
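To make the intended LAG step concrete, here is a plain-Python sketch of what a per-session "previous page" column looks like. The hit tuples and the helper name `with_previous_page` are hypothetical; in BigQuery the equivalent would be a window function partitioned by the visit id and ordered by hit time.

```python
from itertools import groupby
from operator import itemgetter

def with_previous_page(hits):
    """Attach the previous page path within each session (a LAG-style column)."""
    ordered = sorted(hits, key=itemgetter(0, 1))  # sort by (visit id, hit time)
    rows = []
    for visit_id, session in groupby(ordered, key=itemgetter(0)):
        previous = None  # the first hit of a session has no previous page
        for _, hit_time, page in session:
            rows.append((visit_id, hit_time, page, previous))
            previous = page
    return rows

# Hypothetical (unique_visit_id, hits.time, hits.page.pagePath) rows.
hits = [
    ('v1', 0, '/XXX/en/default.aspx'),
    ('v1', 5000, '/XXX/en/genericerror.aspx'),
    ('v2', 0, '/XXX/en/default.aspx'),
]

rows = with_previous_page(hits)
for row in rows:
    print(row)
```

Each output row carries the page and the page seen immediately before it in the same session, which is the pair the CASE statement above compares.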
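Answer 0 (score: 2)
I have put these queries together to give you an extra column stating whether a page is the last hit in the session. It is important to check the hit type here.
Add a column stating whether the page is the last hit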
SELECT
unique_visit_id,
page,
hit_number,
hit_type,
max_hit,
IF(hit_number = max_hit, 'yes', 'no') as last_page
FROM (SELECT CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, hits.hitNumber AS hit_number, hits.type AS hit_type, hits.page.pagePath AS page, MAX(hit_number) OVER (PARTITION BY unique_visit_id) AS max_hit
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE hits.type = 'PAGE'
GROUP BY unique_visit_id, hit_number, hit_type, page
ORDER BY unique_visit_id, hit_number)
Get pageviews, exits, and exit rate
This will give you the actual calculation
SELECT
page,
COUNT(page) as pageviews,
SUM(IF(hit_number = max_hit, 1, 0)) as exits,
(SUM(IF(hit_number = max_hit, 1, 0))/COUNT(page)) * 100 AS exit_rate
FROM (SELECT CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, hits.hitNumber AS hit_number, hits.type AS hit_type, hits.page.pagePath AS page, MAX(hit_number) OVER (PARTITION BY unique_visit_id) AS max_hit
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE hits.type = 'PAGE'
GROUP BY unique_visit_id, hit_number, hit_type, page
ORDER BY unique_visit_id, hit_number)
GROUP BY page
ORDER BY pageviews DESC
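The same logic can be traced by hand on a tiny dataset. The Python sketch below mirrors the query above: find the highest hit number per session, then count a pageview as an exit when its hit number equals that maximum. The sample rows and the function name `exit_rates` are made up for illustration.

```python
from collections import defaultdict

def exit_rates(page_hits):
    """Return {page: (pageviews, exits, exit_rate_percent)} from PAGE hits."""
    # Highest hit number per session, like MAX(hit_number) OVER (PARTITION BY visit).
    max_hit = defaultdict(int)
    for visit_id, hit_number, _ in page_hits:
        max_hit[visit_id] = max(max_hit[visit_id], hit_number)

    pageviews = defaultdict(int)
    exits = defaultdict(int)
    for visit_id, hit_number, page in page_hits:
        pageviews[page] += 1
        if hit_number == max_hit[visit_id]:  # last PAGE hit of the session
            exits[page] += 1

    return {p: (pageviews[p], exits[p], 100.0 * exits[p] / pageviews[p])
            for p in pageviews}

# Hypothetical (unique_visit_id, hit_number, page) rows, PAGE hits only.
page_hits = [
    ('v1', 1, '/home'), ('v1', 2, '/search'), ('v1', 3, '/error'),
    ('v2', 1, '/home'), ('v2', 2, '/error'),
    ('v3', 1, '/home'),
]

result = exit_rates(page_hits)
for page, stats in result.items():
    print(page, stats)
```

In this sample, /error has 2 pageviews and 2 exits (an exit rate of 100%), /search has 1 pageview and no exits, and /home has 3 pageviews and 1 exit.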
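Answer 1 (score: 0)
2021 update
The original question is TL;DR.
My colleague came here looking for a way to calculate the exit rate in BigQuery, but @tfayyaz's solution did not work for him.
For anyone searching for a GA exit rate solution in BigQuery, working code is below: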
SELECT
page,
COUNT(hit_number) AS hit_number,
SUM(IF(hit_number = max_hit, 1, 0)) as exit_count,
SUM(IF(hit_number = max_hit, 1, 0)) / COUNT(hit_number) AS ex_rate
FROM (
SELECT CONCAT(fullVisitorId, CAST(visitId AS STRING)) AS unique_visit_id,
hits.page.pagePath AS page,
hits.hitNumber AS hit_number,
MAX(hits.hitNumber) OVER(PARTITION BY CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS max_hit
FROM `[YOUR_PROJECT].[YOUR_VIEW_ID].ga_sessions_[YOUR_DATE]`, UNNEST (hits) AS hits
WHERE hits.type = 'PAGE')
GROUP BY page
ORDER BY hit_number DESC
The logic here is as follows:
SELECT
page, -- 8. get all the pages
COUNT(hit_number) AS pageviews, -- 10. sum the pageviews (optional)
SUM(IF(hit_number = max_hit, 1, 0)) as exit_count, -- 11. sum the exits (optional)
SUM(IF(hit_number = max_hit, 1, 0)) / COUNT(hit_number) AS ex_rate -- 12. calculate exit rate
FROM ( -- 7. from this subquery
-- subquery
SELECT CONCAT(fullVisitorId, CAST(visitId AS STRING)) AS unique_visit_id, -- 3. get unique user+session id
hits.page.pagePath AS page, -- 4. get every page visited during these sessions
hits.hitNumber AS hit_number, -- 5. get hit count for them
MAX(hits.hitNumber) OVER(PARTITION BY CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS max_hit -- 6. then append the last pageview number for every user+session id respectively
FROM `[YOUR_PROJECT].[YOUR_VIEW_ID].ga_sessions_[YOUR_DATE]`, UNNEST (hits) AS hits -- 1. from your GA view data
WHERE hits.type = 'PAGE') -- 2. for pageview hits only
--
GROUP BY page -- 9. group the results by them
ORDER BY pageviews DESC -- 13. sort the result like in GA Exit Pages report