GA BigQuery export: COUNT(DISTINCT(fullVisitorId)) overcounts when grouped by source/medium

Time: 2019-04-29 18:35:47

Tags: google-analytics google-bigquery

I'm having trouble counting unique users in my GA BigQuery export. I reproduced the same problem using the public sample data:

SELECT sum(users) as users, sum(sessions) as sessions FROM (
  SELECT
    h.page.pagePath as page_path,
    trafficSource.source,
    trafficSource.medium,
    COUNT(DISTINCT(fullVisitorId)) AS users,
    COUNT(*) as sessions
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
  WHERE h.page.pagePath = "/home"
  GROUP BY page_path, source, medium
)
UNION ALL
SELECT sum(users) as users, sum(sessions) as sessions FROM (
  SELECT
    h.page.pagePath as page_path,
    COUNT(DISTINCT(fullVisitorId)) AS users,
    COUNT(*) as sessions
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
  WHERE h.page.pagePath = "/home"
  GROUP BY page_path
)

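When I include the source and medium columns, the distinct fullVisitorId count is 10 higher than without them. How can including these columns increase the number of fullVisitorIds? It doesn't make sense to me.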

What is causing this, and how do I get an accurate count?

1 answer:

Answer 0 (score: 1)


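"How can including these columns increase the number of fullVisitorIds? It doesn't make sense to me."

You can see why by running the inner query for a single visitor, like this: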

SELECT
  MAX(fullVisitorId) AS fullVisitorId,
  h.page.pagePath as page_path,
  trafficSource.source,
  trafficSource.medium,
  COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
  COUNT(*) as sessions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
  AND fullVisitorId = '9902321252073939460'
GROUP BY page_path, source, medium

This returns the following result:

[Screenshot of the query results]

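As you can see, this user arrived from 2 different source/medium combinations. The query therefore counts the same user twice, once in each group, and summing the per-group counts inflates the user total.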
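The arithmetic behind the inflated total can be sketched in Python. The rows below are hypothetical (fullVisitorId, source, medium) tuples standing in for hits on /home, not the real sample data. The sketch shows that summing per-group distinct counts gives the true distinct count plus one extra for every additional source/medium group a visitor appears in. That excess is what produces a gap like the "10 higher" in the question.

```python
from collections import defaultdict

# Hypothetical (fullVisitorId, source, medium) rows for hits on /home.
rows = [
    ("v1", "google", "organic"),
    ("v1", "(direct)", "(none)"),   # v1 arrived via two source/medium combos
    ("v2", "google", "organic"),
    ("v2", "google", "organic"),    # repeat hit in the same group
    ("v3", "(direct)", "(none)"),
    ("v4", "google", "cpc"),
    ("v4", "google", "organic"),
    ("v4", "(direct)", "(none)"),   # v4 appears in three groups
]

def users_summed_over_groups(rows):
    """Mimic SUM(users) where users = COUNT(DISTINCT fullVisitorId) GROUP BY source, medium."""
    visitors_per_group = defaultdict(set)
    for visitor, source, medium in rows:
        visitors_per_group[(source, medium)].add(visitor)
    return sum(len(visitors) for visitors in visitors_per_group.values())

def users_overall(rows):
    """Mimic COUNT(DISTINCT fullVisitorId) without source/medium in the GROUP BY."""
    return len({visitor for visitor, _, _ in rows})

def excess_per_visitor(rows):
    """For each visitor seen in more than one group, how many extra times it is counted."""
    groups_per_visitor = defaultdict(set)
    for visitor, source, medium in rows:
        groups_per_visitor[visitor].add((source, medium))
    return {v: len(g) - 1 for v, g in groups_per_visitor.items() if len(g) > 1}

summed = users_summed_over_groups(rows)
overall = users_overall(rows)
excess = excess_per_visitor(rows)
print(summed, overall, excess)  # prints: 7 4 {'v1': 1, 'v4': 2}
```

The summed total always equals the overall distinct count plus the total excess (here 7 = 4 + 3). It can never be lower, only equal or higher.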

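One way to fix this is to wrap source and medium in an aggregate function and remove them from the GROUP BY. Each page_path row then reports a single source and medium (the greatest value, via MAX), so visitors are no longer split across groups: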


    SELECT sum(users) as users, sum(sessions) as sessions FROM (
      SELECT
        h.page.pagePath as page_path,
        MAX(trafficSource.source) as source,
        MAX(trafficSource.medium) as medium,
        COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
        COUNT(*) as sessions
      FROM
        `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
      WHERE h.page.pagePath = "/home"
      GROUP BY page_path
    )
    UNION ALL
    SELECT sum(users) as users, sum(sessions) as sessions FROM (
      SELECT
        h.page.pagePath as page_path,
        COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
        COUNT(*) as sessions
      FROM
        `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
      WHERE h.page.pagePath = "/home"
      GROUP BY page_path
    )

Now both queries return the same user count:

[Screenshot of the query results]
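The effect of the answer's fix can be mimicked in Python. Grouping by page_path alone means each visitor is counted once per page, and MAX() collapses source and medium to a single value per row. The rows and the fixed_query helper are hypothetical illustrations of the query's logic, not BigQuery's implementation.

```python
from collections import defaultdict

# Hypothetical (fullVisitorId, page_path, source, medium) rows; illustrative only.
rows = [
    ("v1", "/home", "google", "organic"),
    ("v1", "/home", "(direct)", "(none)"),
    ("v2", "/home", "google", "organic"),
    ("v3", "/home", "(direct)", "(none)"),
    ("v4", "/home", "google", "cpc"),
    ("v4", "/home", "(direct)", "(none)"),
]

def fixed_query(rows):
    """Mimic the fixed inner query: GROUP BY page_path only, MAX() over source and
    medium, COUNT(DISTINCT fullVisitorId) and COUNT(*) per page_path."""
    groups = defaultdict(list)
    for visitor, page_path, source, medium in rows:
        groups[page_path].append((visitor, source, medium))
    result = {}
    for page_path, items in groups.items():
        result[page_path] = {
            "source": max(s for _, s, _ in items),  # lexicographically greatest, like MAX()
            "medium": max(m for _, _, m in items),
            "users": len({v for v, _, _ in items}),  # each visitor counted once
            "sessions": len(items),                 # like COUNT(*): one per row
        }
    return result

result = fixed_query(rows)
users_without_source_medium = len({v for v, page_path, _, _ in rows if page_path == "/home"})
print(result["/home"], users_without_source_medium)
# prints: {'source': 'google', 'medium': 'organic', 'users': 4, 'sessions': 6} 4
```

The user count now matches the query without source/medium. The trade-off is that the reported source/medium is just one value per page: v3 reached /home only via (direct)/(none), yet the row shows google/organic. If a per-source/medium breakdown is needed, each group's distinct count is correct on its own, but those counts cannot be summed into a total.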