How to calculate "Exit Rate" using BigQuery SQL in the browser console

Asked: 2014-07-13 08:26:19

Tags: google-analytics google-bigquery

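I am working with a large BigQuery dataset that is populated automatically from Google Analytics. For this work, I am trying to calculate the exit rate from that data. For reference, I am looking closely at the documentation of the BigQuery export schema, which describes the schema in sufficient detail.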

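As described in this post from Google regarding the exit rate, the exit rate can be defined as: "For all pageviews to the page, Exit Rate is the percentage that were the last in the session." To calculate this, I presume that for each unique visit I need to examine every page in the hits.page.pagePath column. If a URL matching the error regex is preceded by another URL that does not represent an error, and the session ends on the error path, that can be counted as an exit.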
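To make the quoted definition concrete, here is a minimal Python sketch that applies it to a few made-up sessions. The sample paths and the `exit_rates` helper are hypothetical and only illustrate the arithmetic (exits divided by pageviews, per page); they are not part of the BigQuery export.

```python
from collections import Counter

# Hypothetical sample data: each session is the ordered list of page paths viewed.
sessions = [
    ["/home", "/search", "/error.aspx"],
    ["/home", "/error.aspx", "/search"],
    ["/error.aspx"],
]

def exit_rates(sessions):
    """Return {page: exits / pageviews}, where an exit is the last pageview of a session."""
    pageviews = Counter()
    exits = Counter()
    for pages in sessions:
        if not pages:
            continue  # a session without pageviews contributes nothing
        pageviews.update(pages)   # every view counts as a pageview
        exits[pages[-1]] += 1     # only the final view counts as an exit
    return {page: exits[page] / views for page, views in pageviews.items()}

rates = exit_rates(sessions)
```

In this sample, `/error.aspx` is viewed three times and is the last page in two sessions, so its exit rate is 2/3, while `/home` never ends a session and gets 0.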

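This looks like a very clear case of path analysis. I am not yet sure whether BigQuery can handle it easily or efficiently. In general, the URLs I am interested in, that is, the ones on which the exits occur, are matched using: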

REGEXP_MATCH(hits.page.pagePath, r'/[^/]+error\.aspx')

For example, I started by using it in the following context as a trial:

SELECT hits.page.pagePath AS Page_Path
FROM [XXXXXXXX.ga_sessions_20140711]
WHERE REGEXP_MATCH(hits.page.pagePath, r'/[^/]+error\.aspx') OR REGEXP_MATCH(hits.page.pagePath, r'/[^/]+genericerror\.aspx')
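As a quick sanity check on what this pattern selects, the sketch below tries it against a few hypothetical paths using Python's `re.search`. It assumes that REGEXP_MATCH performs a partial (unanchored) match, as `re.search` does; the sample paths are invented for illustration.

```python
import re

# The pattern from the WHERE clause above, copied verbatim.
ERROR_PATTERN = re.compile(r"/[^/]+error\.aspx")

def is_error_page(page_path):
    """True if the path contains '/<one or more non-slash chars>error.aspx'."""
    return ERROR_PATTERN.search(page_path) is not None

# Hypothetical page paths.
matches_generic = is_error_page("/XXX/en/genericerror.aspx")
matches_plain = is_error_page("/XXX/en/error.aspx")
matches_default = is_error_page("/XXX/en/default.aspx")
```

Under that assumption, a path ending in `genericerror.aspx` is already caught by the first pattern, so the second REGEXP_MATCH adds nothing. A path ending in `/error.aspx` is not matched, because `[^/]+` needs at least one character between the slash and `error`. If `/error.aspx` pages should be counted, the form `[^/]+/error\.aspx` used in the filter specification further down may be what is intended.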

I would greatly appreciate any suggestions or examples showing how someone has successfully used BigQuery to calculate the exit rate.

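More details (14 July 2014):

The dataset here is XXXXXXXX. Some further details follow. The full query to be executed will create a table with the following output:

Date, Page (from the case statement; there are multiple cases per day, so there will be 7 or 8 rows per day, one for each case), Metric 1, Metric 2 (bounces), Metric 3 (exits)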

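The query has the following overall specification:

Description: pageviews and unique pageviews for ^/XXX/[^/]+error.aspx or ^/XXX/[^/]+genericerror.aspx

Metrics: pageviews, unique pageviews, sessions, users, bounce rate, exit rate

Case statement values (Page dimension values): Homepage 2.0, Homepage 1.0, Inbound Search 2.0, Inbound Search 1.0, Outbound Search 2.0, Outbound Search 1.0, Review Itinerary 2.0, Review Itinerary 1.0, Traveler info 2.0, Traveler info 1.0, Seat selector 2.0, Seat selector 1.0, Payment info 2.0, Payment info 1.0, digital 2.0 other, digital 1.0 other

Case statement: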

CASE
  WHEN previous page = "^/XXX/[^/]+/default\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Homepage 2.0"
  WHEN previous page = "^/web/[^/]+/default\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Homepage 1.0"
  WHEN previous page = "^XXX/[^/]+/apps/booking?flight/(searchresult1|search(rt|ow|md))\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Inbound Search 2.0"
  WHEN previous page = "^web/[^/]+/apps/booking?flight/(searchresult1|search(rt|ow|md))\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Inbound Search 1.0"
  WHEN previous page = "^/XXX/[^/]+/apps/booking/flight/searchResult2\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Outbound Search 2.0"
  WHEN previous page = "^/web/[^/]+/apps/booking/flight/searchResult2\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Outbound Search 1.0"
  WHEN previous page = "^/XXX/[^/]+/apps/booking/flight/reviewRevenue\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Review Itinerary 2.0"
  WHEN previous page = "^/web/[^/]+/apps/booking/flight/reviewRevenue\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Review Itinerary 1.0"
  WHEN previous page = "^/XXX/[^/]+/apps/booking/flight/traveler\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Traveler info 2.0"
  WHEN previous page = "^/web/[^/]+/apps/booking/flight/traveler\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Traveler info 1.0"
  WHEN previous page = "^/XXX/[^/]+/apps/booking/flight/seatSelector\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Seat selector 2.0"
  WHEN previous page = "^/web/[^/]+/apps/booking/flight/seatSelector\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Seat selector 1.0"
  WHEN previous page = "^/XXX/[^/]+/apps/booking/flight/billingRevenue\.aspx" AND landing page = "^/XXX/[^/]+/default\.aspx" THEN "Payment info 2.0"
  WHEN previous page = "^/web/[^/]+/apps/booking/flight/billingRevenue\.aspx" AND landing page = "^/web/[^/]+/default\.aspx" THEN "Payment info 1.0"
  WHEN landing page = "^/XXX/[^/]+/default\.aspx" THEN "digital 2.0 other"
  ELSE "digital 1.0 other"
END AS Page
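The case statement is essentially an ordered rule table: the first (previous page, landing page) pattern pair that matches decides the label, with two fallbacks at the end. The Python sketch below shows that first-match-wins structure with only three of the rules copied from above. The `RULES` table, the `classify` helper and the sample paths are hypothetical, and `XXX` is kept as the literal placeholder used in the question.

```python
import re

# (previous-page pattern, landing-page pattern, label); order matters, as in CASE.
# Only a subset of the rules from the case statement is reproduced here.
RULES = [
    (r"^/XXX/[^/]+/default\.aspx", r"^/XXX/[^/]+/default\.aspx", "Homepage 2.0"),
    (r"^/web/[^/]+/default\.aspx", r"^/web/[^/]+/default\.aspx", "Homepage 1.0"),
    (r"^/XXX/[^/]+/apps/booking/flight/traveler\.aspx",
     r"^/XXX/[^/]+/default\.aspx", "Traveler info 2.0"),
]

def classify(previous_page, landing_page):
    """Return the Page label for one (previous page, landing page) pair."""
    for prev_pattern, landing_pattern, label in RULES:
        if re.search(prev_pattern, previous_page) and re.search(landing_pattern, landing_page):
            return label
    # Fallbacks, mirroring the last WHEN and the ELSE branch.
    if re.search(r"^/XXX/[^/]+/default\.aspx", landing_page):
        return "digital 2.0 other"
    return "digital 1.0 other"

label = classify("/XXX/en/apps/booking/flight/traveler.aspx", "/XXX/en/default.aspx")
```

Keeping the rules in a list makes the evaluation order explicit. That order matters here because the "digital 2.0 other" fallback would also match every 2.0 landing page if it were checked first.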

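Dimensions: date, page, device category, browser, custom variable (value 04)

Filter: REGEX: page = ^/XXX/[^/]+/error.aspx or ^/XXX/[^/]+/genericerror.aspx

Group by / order by: date, page

The final query here must be a single ad hoc query, executed over a specified number of days, that produces the simple table described above (sums of the various values). The exit rate looks like it has to be calculated manually. To start with, I used the following query to generate the various matching URLs: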

SELECT date, CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, visitId, visitNumber, fullVisitorId, totals.pageviews, totals.bounces, 
hits.page.pagePath, hits.page.pageTitle, device.deviceCategory, device.browser, device.browserVersion, hits.customVariables.index,
hits.customVariables.customVarName, hits.customVariables.customVarValue, hits.time
FROM (FLATTEN([XXXXXXXX.ga_sessions_20140711], hits.time))
WHERE hits.customVariables.index = 4
ORDER BY unique_visit_id DESC, hits.time ASC
LIMIT 1000;

I am now having trouble using a lag function to look at the previous hits.page.pagePath for this purpose and then calculating the exit rate.
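For readers unfamiliar with what a lag function does in this context, the Python sketch below shows the intended result: each hit is paired with the page path of the preceding hit in the same visit, ordered by hit time. This is roughly the shape that a window such as LAG(hits.page.pagePath) partitioned by visit and ordered by hits.time would be expected to produce. The sample rows and the `with_previous_page` helper are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical flattened hit rows: (unique_visit_id, hit_time_ms, page_path).
hits = [
    ("A1", 5000, "/XXX/en/error.aspx"),
    ("A1", 0, "/XXX/en/default.aspx"),
    ("B2", 0, "/XXX/en/error.aspx"),
    ("B2", 900, "/XXX/en/default.aspx"),
]

def with_previous_page(hits):
    """Append the previous page path within each visit (None for the first hit)."""
    rows = []
    ordered = sorted(hits, key=itemgetter(0, 1))  # by visit id, then by hit time
    for visit_id, group in groupby(ordered, key=itemgetter(0)):
        previous = None  # reset at every visit boundary, like PARTITION BY
        for _, hit_time, page in group:
            rows.append((visit_id, hit_time, page, previous))
            previous = page
    return rows

rows = with_previous_page(hits)
```

The first hit of each visit has no previous page, which corresponds to the NULL a lag window returns at the start of a partition.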

2 Answers:

Answer 0: (score: 2)

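I have put these together to give you an extra column that states whether a page is the last hit in the session. It is important to check the hit type here.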

Add a column stating whether the page was the last hit

SELECT 
  unique_visit_id,
  page,
  hit_number,
  hit_type,
  max_hit,
  IF(hit_number = max_hit, 'yes', 'no') as last_page
FROM (SELECT CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, hits.hitNumber AS hit_number, hits.type AS hit_type, hits.page.pagePath AS page, MAX(hit_number) OVER (PARTITION BY unique_visit_id) AS max_hit
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE hits.type = 'PAGE'
GROUP BY unique_visit_id, hit_number, hit_type, page
ORDER BY unique_visit_id, hit_number)

Get pageviews, exits and exit rate

This will give you the actual calculation

SELECT 
  page,
  COUNT(page) as pageviews,
  SUM(IF(hit_number = max_hit, 1, 0)) as exits,
  (SUM(IF(hit_number = max_hit, 1, 0))/COUNT(page)) * 100 AS exit_rate
FROM (SELECT CONCAT(fullVisitorId, STRING(visitId)) AS unique_visit_id, hits.hitNumber AS hit_number, hits.type AS hit_type, hits.page.pagePath AS page, MAX(hit_number) OVER (PARTITION BY unique_visit_id) AS max_hit
FROM [google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE hits.type = 'PAGE'
GROUP BY unique_visit_id, hit_number, hit_type, page
ORDER BY unique_visit_id, hit_number)
GROUP BY page
ORDER BY pageviews DESC

Answer 1: (score: 0)

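2021 update

The original question is TL;DR.
My colleague came here looking for a way to calculate the exit rate in BigQuery, but @tfayyaz's solution did not work for him.
For those searching for a GA exit rate solution in BigQuery, working code is below: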

SELECT 
  page,
  COUNT(hit_number) AS pageviews,
  SUM(IF(hit_number = max_hit, 1, 0)) AS exit_count,
  SUM(IF(hit_number = max_hit, 1, 0)) / COUNT(hit_number) AS ex_rate
FROM (
    SELECT CONCAT(fullVisitorId, CAST(visitId AS STRING)) AS unique_visit_id, 
        hits.page.pagePath AS page,
        hits.hitNumber AS hit_number, 
        MAX(hits.hitNumber) OVER(PARTITION BY CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS max_hit
    FROM `[YOUR_PROJECT].[YOUR_VIEW_ID].ga_sessions_[YOUR_DATE]`, UNNEST (hits) AS hits
    WHERE hits.type = 'PAGE')
GROUP BY page
ORDER BY pageviews DESC

The logic here is as follows:

SELECT 
  page, -- 8. get all the pages
  COUNT(hit_number) AS pageviews, -- 10. sum the pageviews (optional)
  SUM(IF(hit_number = max_hit, 1, 0)) as exit_count, -- 11. sum the exits (optional)
  SUM(IF(hit_number = max_hit, 1, 0)) / COUNT(hit_number) AS ex_rate -- 12. calculate exit rate
FROM ( -- 7. from this subquery

-- subquery

    SELECT CONCAT(fullVisitorId, CAST(visitId AS STRING)) AS unique_visit_id, -- 3. get unique user+session id
        hits.page.pagePath AS page, -- 4. get every page visited during these sessions
        hits.hitNumber AS hit_number, -- 5. get hit count for them
        MAX(hits.hitNumber) OVER(PARTITION BY CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS max_hit -- 6. then append the last pageview number for every user+session id respectively
    FROM `[YOUR_PROJECT].[YOUR_VIEW_ID].ga_sessions_[YOUR_DATE]`, UNNEST (hits) AS hits -- 1. from your GA view data
    WHERE hits.type = 'PAGE') -- 2. for pageview hits only

--

GROUP BY page -- 9. group the results by them
ORDER BY pageviews DESC -- 13. sort the result like in GA Exit Pages report
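The numbered steps can also be traced on a handful of rows outside BigQuery. The Python sketch below follows the same order: keep only PAGE hits, find the highest hit number per user+session id, then count pageviews and exits per page. The sample rows and the `exit_report` helper are hypothetical. The sample includes EVENT hits after a pageview to show why the filter on hit type has to happen before the maximum is taken.

```python
from collections import defaultdict

# Hypothetical unnested hit rows: (fullVisitorId, visitId, hitNumber, type, pagePath).
rows = [
    ("u1", 100, 1, "PAGE", "/home"),
    ("u1", 100, 2, "EVENT", "/home"),
    ("u1", 100, 3, "PAGE", "/checkout"),
    ("u1", 100, 4, "EVENT", "/checkout"),
    ("u2", 200, 1, "PAGE", "/home"),
]

def exit_report(rows):
    """Return (page, pageviews, exit_count, ex_rate) tuples, most viewed page first."""
    page_hits = [r for r in rows if r[3] == "PAGE"]       # steps 1-2
    max_hit = defaultdict(int)
    for visitor, visit, hit_number, _, _ in page_hits:    # steps 3 and 6
        key = visitor + str(visit)
        max_hit[key] = max(max_hit[key], hit_number)
    pageviews = defaultdict(int)
    exits = defaultdict(int)
    for visitor, visit, hit_number, _, page in page_hits:  # steps 8-11
        pageviews[page] += 1
        if hit_number == max_hit[visitor + str(visit)]:
            exits[page] += 1
    report = [(page, views, exits[page], exits[page] / views)  # step 12
              for page, views in pageviews.items()]
    return sorted(report, key=lambda r: r[1], reverse=True)     # step 13

report = exit_report(rows)
```

If the EVENT hits were not filtered out first, hit 4 would become the maximum for session u1 and `/checkout` would never be counted as an exit page.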