I'm having trouble counting unique users in my GA BigQuery export. I reproduced the same problem using the sample data.
SELECT sum(users) as users, sum(sessions) as sessions FROM (
SELECT
h.page.pagePath as page_path,
trafficSource.source,
trafficSource.medium,
COUNT(DISTINCT(fullVisitorId)) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
GROUP BY page_path, source, medium
)
UNION ALL
SELECT sum(users) as users, sum(sessions) as sessions FROM (
SELECT
h.page.pagePath as page_path,
COUNT(DISTINCT(fullVisitorId)) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
GROUP BY page_path
)
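When I include the source and medium columns, the distinct fullVisitorId count is 10 higher than it is without them. How can including these columns increase the number of fullVisitorId values? This doesn't make sense to me.
What is causing this, and how do I get an accurate count?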
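Answer 0 (score: 1)
"How can including these columns increase the number of fullVisitorId values? This doesn't make sense to me."
You can see why by running the inner query for a single visitor, like this: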
SELECT
MAX(fullVisitorId) AS fullVisitorId,
h.page.pagePath as page_path,
trafficSource.source,
trafficSource.medium,
COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
and fullVisitorId = '9902321252073939460'
GROUP BY page_path, source, medium
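For this visitor the query returns two rows, one for each source/medium combination.
Because the user arrived from 2 different source/medium combinations, they appear in two groups. COUNT(DISTINCT ...) is evaluated separately within each group, so the outer SUM adds the same user twice, which inflates the user total.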
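The effect can be reproduced on a few made-up rows. The following is a minimal sketch, assuming a hypothetical `visits` table built inline (the visitor IDs and source/medium values are invented for illustration and are not taken from the sample dataset):

```sql
-- Hypothetical data: visitor 'A' arrives via two source/medium pairs,
-- visitor 'B' via one.
WITH visits AS (
  SELECT 'A' AS fullVisitorId, 'google' AS source, 'organic' AS medium UNION ALL
  SELECT 'A', '(direct)', '(none)' UNION ALL
  SELECT 'B', 'google', 'organic'
),
per_group AS (
  -- Distinct visitors are counted separately inside each group.
  SELECT source, medium, COUNT(DISTINCT fullVisitorId) AS users
  FROM visits
  GROUP BY source, medium
)
SELECT
  -- 3: 'A' is counted once in each of its two groups
  (SELECT SUM(users) FROM per_group) AS summed_group_users,
  -- 2: 'A' and 'B' are each counted once
  (SELECT COUNT(DISTINCT fullVisitorId) FROM visits) AS true_users
```

Summing per-group distinct counts only matches the true distinct count when no visitor falls into more than one group.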
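One way to fix this is to apply an aggregate function to source and medium and remove them from the GROUP BY, like this. Note that MAX then reports only a single source and medium value per page.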
SELECT sum(users) as users, sum(sessions) as sessions FROM (
SELECT
h.page.pagePath as page_path,
MAX(trafficSource.source) as source,
MAX(trafficSource.medium) as medium,
COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
GROUP BY page_path
)
UNION ALL
SELECT sum(users) as users, sum(sessions) as sessions FROM (
SELECT
h.page.pagePath as page_path,
COUNT(DISTINCT(TRIM(fullVisitorId))) AS users,
COUNT(*) as sessions
FROM
`bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
GROUP BY page_path
)
Both parts of the query now return the same user count.
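If only the overall total is needed, another option is to skip the per-group subquery and count distinct visitors directly. This is a sketch of that variant rather than part of the original answer; it reuses the same sample table and filter as the queries above.

```sql
-- Count each visitor once across all matching hits, with no GROUP BY
-- and no outer SUM that could add the same visitor more than once.
SELECT
  COUNT(DISTINCT fullVisitorId) AS users
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170101`, UNNEST(hits) h
WHERE h.page.pagePath = "/home"
```

Because the distinct count is taken over the whole filtered set, adding or removing descriptive columns elsewhere cannot change it.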