Combinations of pages in Google Analytics data in BigQuery

Time: 2017-01-04 12:01:29

Tags: sql google-analytics google-bigquery

Using Google Analytics data in Google BigQuery, I can get the number of sessions that include a hit on the /confirm/ page:

#standardSQL
SELECT date AS Date, COUNT(Date) AS Sessions
FROM (
  SELECT date
  FROM `123456789.ga_sessions_20161202`
  CROSS JOIN UNNEST(hits) as hit
  WHERE hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$')
  GROUP BY VisitId, fullVisitorId, date
)
GROUP BY Date
ORDER BY Date ASC, Sessions ASC;

How about if I wanted to show the number of sessions that had hits on both the /confirm/ page and the /payment/ page? What should my SQL look like?

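4 Answers:

Answer 0 (score: 2)

The most efficient way to do this would be to use a single subquery in the WHERE clause that checks for both types of hits. For example,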

#standardSQL
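SELECT date, COUNT(*) AS Sessions
FROM `123456789.ga_sessions_20161202`
-- Keep a session only if its PAGE hits include both paths.
-- LIKE has no '$' anchor; a pattern with no trailing '%' already matches the end of the path.
WHERE (SELECT COUNTIF(hit.page.pagePath LIKE '%/confirm/') > 0 AND
              COUNTIF(hit.page.pagePath LIKE '%/payment/') > 0
       FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE')
GROUP BY date
ORDER BY date ASC, Sessions ASC;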
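
To try the single-subquery pattern without access to a GA export, here is a minimal sketch that runs against inline data. The `sessions` CTE, the visitor IDs and the page paths are made up for illustration; only the shape (`hits` as an array of structs with `type` and `page.pagePath`) is meant to mirror the export table.

```sql
#standardSQL
-- Hypothetical inline data standing in for a ga_sessions_* table.
WITH sessions AS (
  -- Session A: PAGE hits on both /payment/ and /confirm/
  SELECT '20161202' AS date, 'A' AS fullVisitorId, 1 AS visitId,
    [STRUCT('PAGE' AS type, STRUCT('/cart/payment/' AS pagePath) AS page),
     STRUCT('PAGE' AS type, STRUCT('/cart/confirm/' AS pagePath) AS page)] AS hits
  UNION ALL
  -- Session B: only a /payment/ page view
  SELECT '20161202', 'B', 1,
    [STRUCT('PAGE' AS type, STRUCT('/cart/payment/' AS pagePath) AS page)]
  UNION ALL
  -- Session C: /confirm/ page view, but /payment/ only appears on an EVENT hit
  SELECT '20161202', 'C', 2,
    [STRUCT('PAGE' AS type, STRUCT('/cart/confirm/' AS pagePath) AS page),
     STRUCT('EVENT' AS type, STRUCT('/cart/payment/' AS pagePath) AS page)]
)
SELECT date, COUNT(*) AS Sessions
FROM sessions
WHERE (SELECT COUNTIF(hit.page.pagePath LIKE '%/confirm/') > 0 AND
              COUNTIF(hit.page.pagePath LIKE '%/payment/') > 0
       FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE')
GROUP BY date;
-- Only session A qualifies, so the result is one row with Sessions = 1.
```

Because the subquery is evaluated once per session row, no GROUP BY on fullVisitorId and visitId is needed to get back to session level.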

Answer 1 (score: 1)

Try the following; it should give you the idea:

#standardSQL
SELECT DATE, COUNT(1) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE (SELECT COUNT(1) FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE' 
                 AND REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') ) > 0
AND (SELECT COUNT(1) FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE' 
                 AND REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') ) > 0
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;

The above can be further optimized as follows, scanning the hits array once instead of twice:

#standardSQL
SELECT DATE, COUNT(1) AS Sessions
FROM `123456789.ga_sessions_20161202`
WHERE (SELECT COUNTIF(REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$')) *
          COUNTIF(REGEXP_CONTAINS(hit.page.pagePath, '/payment/$'))
        FROM UNNEST(hits) AS hit WHERE hit.type = 'PAGE') > 0
GROUP BY DATE
ORDER BY DATE ASC, Sessions ASC;
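
The multiplication works as a stand-in for AND because COUNTIF never returns a negative number, so the product of two counts is positive exactly when both counts are positive. A small sketch with made-up counts (the column names `c` and `p` are illustrative only) compares the two tests side by side:

```sql
#standardSQL
-- c and p are hypothetical per-session counts of /confirm/ and /payment/ hits.
SELECT
  c,
  p,
  c * p > 0 AS product_test,
  c > 0 AND p > 0 AS and_test
FROM UNNEST(ARRAY<STRUCT<c INT64, p INT64>>[(0, 0), (0, 3), (2, 0), (2, 3)]);
-- product_test and and_test agree on every row; both are TRUE only for (2, 3).
```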

Answer 2 (score: 0)

Try something like this, assuming a single regular expression:

 WHERE hit.type = 'PAGE' AND REGEXP_CONTAINS(hit.page.pagePath, '(/confirm/$)|(/payment/$)') 
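
Note that an alternation matches a hit on either page, so on its own this filter selects sessions that touched /confirm/ or /payment/, not necessarily both; a per-session check is still needed on top of it. A quick sketch of what the pattern matches, using made-up paths:

```sql
#standardSQL
-- The paths below are illustrative, not taken from a real property.
SELECT
  pagePath,
  REGEXP_CONTAINS(pagePath, r'(/confirm/$)|(/payment/$)') AS matches_either
FROM UNNEST(['/cart/confirm/', '/cart/payment/', '/cart/confirm/step2', '/home/']) AS pagePath;
-- matches_either is TRUE for the first two paths and FALSE for the last two.
```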

Answer 3 (score: 0)

I tested this query against our GA dataset and it may work for you:

#standardSQL
SELECT
  date,
  -- visitId alone is not unique across visitors, so combine it with fullVisitorId
  COUNT(DISTINCT CONCAT(fv, CAST(v AS STRING))) AS sessions
FROM (
  SELECT
    date,
    fullVisitorId AS fv,
    visitId AS v,
    -- TRUE only when the session has at least one hit on each page; NULL otherwise
    CASE WHEN (
      MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') THEN TRUE END)
      AND MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') THEN TRUE END)
    ) THEN TRUE END AS flag
  FROM `dafiti-analytics.40663402.ga_sessions_20170102`,
    UNNEST(hits) AS hit
  -- Pre-filter to PAGE hits on either page before aggregating
  WHERE hit.type = 'PAGE'
    AND REGEXP_CONTAINS(hit.page.pagePath, r'/confirm/$|/payment/$')
  GROUP BY fv, v, date
  HAVING flag IS NOT NULL
)
GROUP BY date

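First, I selected all users and their sessions that had either confirm or payment in the hits.page.pagePath field.

After that, I used a MAX operation, grouped by visitor and session, to find where both confirm and payment occurred, like so:

 CASE WHEN (MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/confirm/$') THEN TRUE END) AND MAX(CASE WHEN REGEXP_CONTAINS(hit.page.pagePath, '/payment/$') THEN TRUE END)) THEN TRUE END flag

flag is TRUE when a given visitor in a given session had both confirm and payment in their navigation.

Then I just COUNT DISTINCT the concatenation of visitors and their sessions to get the total number of sessions (this is needed because visitId is not unique across visitors).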
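
To illustrate why the concatenation matters, here is a small sketch with made-up rows in which two different visitors happen to share a visitId. The visitor IDs and visitId values are hypothetical.

```sql
#standardSQL
-- Hypothetical sessions: visitors A and B share the same visitId.
WITH sessions AS (
  SELECT 'A' AS fullVisitorId, 1483351200 AS visitId UNION ALL
  SELECT 'B', 1483351200 UNION ALL
  SELECT 'B', 1483359999
)
SELECT
  COUNT(DISTINCT visitId) AS distinct_visit_ids,
  COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS distinct_sessions
FROM sessions;
-- distinct_visit_ids = 2, distinct_sessions = 3
```

Counting visitId alone would undercount here, while the visitor-plus-visit key gives one entry per session.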