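Joining on a landing-page query doubles the sessions per source

Date: 2016-11-21 17:31:25

Tags: google-analytics google-bigquery

I am trying to query the number of visits per source from a BigQuery table of Google Analytics data, but I need to filter out some sessions at the landing-page level. To do that, I pre-query the visitID together with the landing page and join the result back to the session data, like this: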

#StandardSQL
WITH landingpages AS (
  SELECT
    visitID,
    h.page.pagePath AS LandingPage
  FROM
    `project.dataset.ga_sessions_*`, UNNEST(hits) AS h
  WHERE 
    hitNumber = 1
  AND
    _TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
  # filters to be added here
)

SELECT
  sessions.trafficSource.source,
  SUM(sessions.totals.visits) AS visits
FROM `project.dataset.ga_sessions_*` AS sessions

JOIN 
  landingpages
ON
  landingpages.visitID = sessions.visitID
WHERE
  _TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
GROUP BY
  trafficSource.source
ORDER BY
  visits DESC

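This returns roughly twice the number of sessions per source that GA reports.

Can anyone point out what I am doing wrong? (I suspect it is something very obvious.)

I have inspected the output of the first query and could not find anything wrong with it, apart from a small proportion of duplicated visit IDs. I have also tried various JOIN types, none of which fixed the problem.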
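One way to quantify those duplicated visit IDs is to compare three counts over the same day. This is only a sketch against the same wildcard table as above; the output column names are made up for illustration.

```sql
#StandardSQL
# Sketch: compare the number of session rows with the number of distinct keys.
SELECT
  COUNT(*) AS session_rows,
  COUNT(DISTINCT visitID) AS distinct_visit_ids,
  COUNT(DISTINCT CONCAT(fullVisitorId, '-', CAST(visitID AS STRING))) AS distinct_session_keys
FROM
  `project.dataset.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
```

If distinct_visit_ids is lower than distinct_session_keys, some visitID values are shared by sessions of different visitors.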

1 Answer:

Answer 0 (score: 1)

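When querying GA data in BigQuery, you have to know and remember that visitID is only unique per visitor: a session is uniquely identified by the combination of fullVisitorId and visitID. Joining on visitID alone also matches sessions of other visitors that happen to share the same visitID, which duplicates rows and inflates the sums. Only a join on both columns returns a meaningful data set.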
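The effect can be reproduced on a toy example. The sketch below uses inline, made-up data (two hypothetical visitors, visitor_a and visitor_b, whose sessions share one visitID) instead of the real export tables, and compares the two join conditions side by side.

```sql
#StandardSQL
# Toy data: two different visitors whose sessions share the same visitID.
WITH sessions AS (
  SELECT 'visitor_a' AS fullVisitorId, 1443225600 AS visitID, 'google' AS source, 1 AS visits
  UNION ALL
  SELECT 'visitor_b', 1443225600, 'bing', 1
),
landingpages AS (
  SELECT 'visitor_a' AS fullVisitorId, 1443225600 AS visitID, '/home' AS LandingPage
  UNION ALL
  SELECT 'visitor_b', 1443225600, '/pricing'
)

SELECT
  # Join on visitID alone: each session matches both landing-page rows.
  (SELECT SUM(s.visits)
   FROM sessions AS s
   JOIN landingpages AS l
   ON l.visitID = s.visitID) AS visits_joined_on_visitID,
  # Join on both keys: each session matches exactly one landing-page row.
  (SELECT SUM(s.visits)
   FROM sessions AS s
   JOIN landingpages AS l
   ON l.visitID = s.visitID
   AND l.fullVisitorId = s.fullVisitorId) AS visits_joined_on_both_keys
```

With these two sessions, the first column comes out as 4 and the second as 2: the single-key join counts every session once per visitor sharing its visitID.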

Here is what I should have written:

#StandardSQL
WITH landingpages AS (
  # First hit of each session; its page path is treated as the landing page.
  SELECT
    fullVisitorId,
    visitID,
    h.page.pagePath AS LandingPage
  FROM
    `project.dataset.ga_sessions_*`, UNNEST(hits) AS h
  WHERE
    h.hitNumber = 1
  AND
    _TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
  # filters to be added here
),
session_data AS (
  # Visits per session, keyed by fullVisitorId and visitID.
  SELECT
    date AS ga_date,
    trafficSource.source AS source,
    fullVisitorId,
    visitID,
    SUM(totals.visits) AS visits
  FROM
    `project.dataset.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
  AND
    totals.visits > 0
  GROUP BY ga_date, source, fullVisitorId, visitID
)

SELECT
  ga_date, source, SUM(visits) AS Sessions
FROM
  landingpages
JOIN
  session_data
ON
  landingpages.visitID = session_data.visitID
AND
  landingpages.fullVisitorId = session_data.fullVisitorId
GROUP BY
  ga_date, source
ORDER BY
  Sessions DESC
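Since the hits are stored in the same row as the session, the join can probably be avoided altogether by testing the first hit with a correlated subquery. The following is a sketch under that assumption, not a verified replacement; the pagePath condition is a hypothetical placeholder for the real landing-page filter.

```sql
#StandardSQL
SELECT
  trafficSource.source AS source,
  SUM(totals.visits) AS Sessions
FROM
  `project.dataset.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20150926' AND '20150926'
AND
  EXISTS (
    SELECT 1
    FROM UNNEST(hits) AS h
    WHERE h.hitNumber = 1
    # Hypothetical landing-page filter; replace with the real condition.
    AND h.page.pagePath NOT LIKE '/internal/%'
  )
GROUP BY
  source
ORDER BY
  Sessions DESC
```

Because each session row is read exactly once and never joined to anything, there is no key to get wrong and no row duplication to guard against.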