In BigQuery

Posted: 2018-02-14 14:25:07

Tags: sql google-analytics google-bigquery

Taking what is described at https://webmasters.stackexchange.com/a/87523, together with my own understanding, I came up with what I think would count as a "returning user":

1. First, a query showing the users whose first visit (the latest newVisits = 1 session) falls within the last two years:

SELECT
  parsedDate,
  CASE
  # return fullVisitorId only if that first visit falls within the last 2 years (otherwise NULL)
    WHEN parsedDate BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR) AND CURRENT_DATE() THEN fullVisitorId
  END fullVisitorId
FROM (
  SELECT
    # convert the date field from string to date and get the latest date
    PARSE_DATE('%Y%m%d',
      MAX(date)) parsedDate,
    fullVisitorId
  FROM
    `project.dataset.ga_sessions_*`
  WHERE
    # only show fullVisitorId if first visit
    totals.newVisits = 1
  GROUP BY
    fullVisitorId)
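As a rough illustration of what step 1 returns, the sketch below replays the same logic with Python's built-in sqlite3 module rather than BigQuery. The `sessions` table, its flattened columns (standing in for `date`, `fullVisitorId` and `totals.newVisits`), the sample rows and the fixed reference date 2018-02-14 (used in place of CURRENT_DATE()) are all hypothetical. sqlite has no PARSE_DATE, so the YYYYMMDD string is rebuilt with substr().

```python
import sqlite3

# Hypothetical stand-in for `project.dataset.ga_sessions_*`: one row per
# session, with totals.newVisits flattened into a plain column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, fullVisitorId TEXT, newVisits INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [
        ("20150601", "A", 1),     # first visit more than two years back
        ("20180118", "A", None),  # later, returning visit
        ("20171201", "B", 1),     # first visit inside the window
        ("20180118", "B", None),
        ("20180118", "C", 1),     # brand-new visitor on the report day
    ],
)

# Step 1: latest date among each visitor's newVisits = 1 sessions, with the
# id kept only if that date lies within two years of the reference date.
FIRST_VISITS = """
SELECT parsedDate,
       CASE WHEN parsedDate BETWEEN date('2018-02-14', '-2 years') AND '2018-02-14'
            THEN fullVisitorId END AS fullVisitorId
FROM (
  SELECT substr(MAX(date), 1, 4) || '-' || substr(MAX(date), 5, 2) || '-' ||
         substr(MAX(date), 7, 2) AS parsedDate,
         fullVisitorId
  FROM sessions
  WHERE newVisits = 1
  GROUP BY fullVisitorId
)
ORDER BY parsedDate
"""

rows = conn.execute(FIRST_VISITS).fetchall()
print(rows)  # [('2015-06-01', None), ('2017-12-01', 'B'), ('2018-01-18', 'C')]
```

Visitor A's first visit is older than the window, so its row survives with a NULL fullVisitorId instead of being dropped.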

2. Then a separate query that selects some fields for a specific date range (here a single day):

SELECT
  PARSE_DATE('%Y%m%d',
    date) parsedDate,
  fullVisitorId,
  visitId,
  totals.newVisits,
  totals.visits,
  totals.bounces,
  device.deviceCategory
FROM
  `project.dataset.ga_sessions_*`
WHERE
  _TABLE_SUFFIX = "20180118"

3. Join the two queries to find the "returning users":

SELECT
q1.parsedDate date,
COUNT(DISTINCT q1.fullVisitorId) users,
# Default way to determine New Users
SUM(q1.newVisits) newVisits,
# Number of "New Users" based on my queries (matches the default way above)
COUNT(DISTINCT IF(q2.parsedDate < q1.parsedDate, NULL, q2.fullVisitorId)) newUsers,
# Number of "Returning Users" based on my queries
COUNT(DISTINCT IF(q2.parsedDate < q1.parsedDate, q2.fullVisitorId, NULL)) returningUsers
FROM (
(SELECT
  PARSE_DATE('%Y%m%d',
    date) parsedDate,
  fullVisitorId,
  visitId,
  totals.newVisits,
  totals.visits,
  totals.bounces,
  device.deviceCategory
FROM
  `project.dataset.ga_sessions_*`
WHERE
  _TABLE_SUFFIX = "20180118") q1
LEFT JOIN (
SELECT
  parsedDate,
  CASE
  # return fullVisitorId only if that first visit falls within the last 2 years (otherwise NULL)
    WHEN parsedDate BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR) AND CURRENT_DATE() THEN fullVisitorId
  END fullVisitorId
FROM (
  SELECT
    # convert the date field from string to date and get the latest date
    PARSE_DATE('%Y%m%d',
      MAX(date)) parsedDate,
    fullVisitorId
  FROM
    `project.dataset.ga_sessions_*`
  WHERE
    # only show fullVisitorId if first visit
    totals.newVisits = 1
  GROUP BY
    fullVisitorId)) q2
ON q1.fullVisitorId = q2.fullVisitorId)
GROUP BY
date
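To see how step 3 sorts visitors, here is a toy replay of the join in sqlite3 (not BigQuery). The table, its flattened columns, the rows and the fixed reference date are hypothetical; BigQuery's IF() is written as CASE, which behaves the same way when the condition is NULL (it falls through to the ELSE branch).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (date TEXT, fullVisitorId TEXT, newVisits INTEGER)")
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", [
    ("20150601", "A", 1), ("20180118", "A", None),  # first visit outside the window
    ("20171201", "B", 1), ("20180118", "B", None),  # first visit inside the window
    ("20180118", "C", 1), ("20180118", "C", None),  # new, then back the same day
    ("20180118", "D", 1),                           # new
])


def iso(col):
    """'YYYYMMDD' -> 'YYYY-MM-DD', standing in for PARSE_DATE('%Y%m%d', ...)."""
    return f"substr({col},1,4)||'-'||substr({col},5,2)||'-'||substr({col},7,2)"


JOINED = f"""
SELECT q1.parsedDate AS date,
       COUNT(DISTINCT q1.fullVisitorId) AS users,
       SUM(q1.newVisits) AS newVisits,
       COUNT(DISTINCT CASE WHEN q2.parsedDate < q1.parsedDate THEN NULL
                           ELSE q2.fullVisitorId END) AS newUsers,
       COUNT(DISTINCT CASE WHEN q2.parsedDate < q1.parsedDate THEN q2.fullVisitorId
                           ELSE NULL END) AS returningUsers
FROM (SELECT {iso('date')} AS parsedDate, fullVisitorId, newVisits
      FROM sessions WHERE date = '20180118') AS q1
LEFT JOIN (
  SELECT parsedDate,
         CASE WHEN parsedDate BETWEEN date('2018-02-14', '-2 years') AND '2018-02-14'
              THEN fullVisitorId END AS fullVisitorId
  FROM (SELECT {iso('MAX(date)')} AS parsedDate, fullVisitorId
        FROM sessions WHERE newVisits = 1 GROUP BY fullVisitorId)
) AS q2 ON q1.fullVisitorId = q2.fullVisitorId
GROUP BY q1.parsedDate
"""

row = conn.execute(JOINED).fetchone()
print(row)  # ('2018-01-18', 4, 2, 2, 1)
```

In this toy data, users is 4 but newUsers + returningUsers is only 3: visitor A's newVisits = 1 session predates the two-year window, so q2 nulls out its id and the LEFT JOIN matches nothing. Any visitor whose first session is missing from the exported tables would drop out the same way; whether that contributes to the gap in question 1 depends on the real data.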

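BigQuery results:

[Screenshot: BigQuery Results]

The unsampled New/Returning Visitor report in GA for the same period, split by Users:

[Screenshot: new/returning visitors split by Users report in GA]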

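Questions:

  1. Given that newVisits (the default field) and newUsers (my calculation) give the same result, which matches the GA report's New Visitor users, why don't GA's Returning Visitor users match my returningUsers calculation in BQ? Are the two even comparable, or am I missing something?

  2. Is my approach the most efficient and concise one?

  3. Is there a better way to get this data that I am missing?

  4. Following Martin's answer, I managed to create a "returning users" metric/field in the context of the query I ran: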

    SELECT
      date,
      deviceCategory,
      # newUsers - SUM result if it's a new user
      SUM(IF(userType="New Visitor", 1, 0)) newUsers,
      # returningUsers - COUNT DISTINCT fullvisitorId if it's a returning user
      COUNT(DISTINCT IF(userType="Returning Visitor", fullvisitorid, NULL)) returningUsers,
      COUNT(DISTINCT fullvisitorid) users,
      SUM(visits) sessions
    FROM (
      SELECT
        date,
        fullVisitorId,
        visitId,
        totals.visits,
        device.deviceCategory,
        IF(totals.newVisits IS NOT NULL, "New Visitor", "Returning Visitor") userType
      FROM
        `project.dataset.ga_sessions_20180118` )
    GROUP BY
      deviceCategory,
      date
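The same kind of toy setup shows how the question-4 metric behaves when one visitor has both a new and a returning session on the same day. Again sqlite3 stands in for BigQuery, the flattened columns and rows are hypothetical, IF() is written as CASE, and string literals use single quotes as sqlite expects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (date TEXT, fullVisitorId TEXT, visits INTEGER,"
    " newVisits INTEGER, deviceCategory TEXT)"
)
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?, ?, ?)",
    [
        ("20180118", "A", 1, None, "desktop"),  # returning only
        ("20180118", "C", 1, 1, "desktop"),     # new ...
        ("20180118", "C", 1, None, "desktop"),  # ... and back the same day
        ("20180118", "D", 1, 1, "mobile"),      # new only
    ],
)

QUERY = """
SELECT date, deviceCategory,
       SUM(CASE WHEN userType = 'New Visitor' THEN 1 ELSE 0 END) AS newUsers,
       COUNT(DISTINCT CASE WHEN userType = 'Returning Visitor'
                           THEN fullVisitorId END) AS returningUsers,
       COUNT(DISTINCT fullVisitorId) AS users,
       SUM(visits) AS sessions
FROM (
  SELECT date, fullVisitorId, visits, deviceCategory,
         CASE WHEN newVisits IS NOT NULL THEN 'New Visitor'
              ELSE 'Returning Visitor' END AS userType
  FROM sessions
)
GROUP BY deviceCategory, date
ORDER BY deviceCategory
"""

for row in conn.execute(QUERY):
    print(row)
# ('20180118', 'desktop', 1, 2, 2, 3)
# ('20180118', 'mobile', 1, 0, 1, 1)
```

On desktop, newUsers + returningUsers is 3 while users is 2, because visitor C is counted in both columns.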
    

1 answer:

Answer 0 (score: 0)

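Google Analytics uses an approximation for users (fullvisitorid), even when an unsampled report "says" it is based on 100% of sessions. You get more accurate user counts in BigQuery.

Another thing worth mentioning: fullvisitorids are counted even when totals.visits != 1, whereas sessions are only counted where totals.visits = 1.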
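A minimal sqlite3 illustration of that difference, assuming (as in the GA BigQuery export schema) that totals.visits is NULL for sessions without interaction hits; the table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (fullVisitorId TEXT, visits INTEGER)")
# Visitor E's only session has visits = NULL (no interaction hits).
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?)",
    [("A", 1), ("A", 1), ("E", None)],
)

visitors, sessions = conn.execute(
    "SELECT COUNT(DISTINCT fullVisitorId), SUM(visits) FROM sessions"
).fetchone()
print(visitors, sessions)  # 2 2
```

E is counted as a visitor but contributes no session, since SUM() skips NULLs.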

Users are counted twice if they were new and then came back. That means the following should give you the correct numbers:

SELECT
  totals.newVisits IS NOT NULL AS isNew,
  COUNT(DISTINCT fullvisitorid) AS visitors,
  SUM(totals.visits) AS sessions
FROM
  `project.dataset.ga_sessions_20180214`
GROUP BY
  1
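Run against hypothetical toy rows (sqlite3 instead of BigQuery, with totals.* flattened into plain columns), the query above shows the double counting directly: visitor C lands in both groups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (fullVisitorId TEXT, newVisits INTEGER, visits INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("A", None, 1), ("C", 1, 1), ("C", None, 1), ("D", 1, 1)],
)

# Same shape as the query above; sqlite returns isNew as 0/1.
rows = conn.execute(
    """
    SELECT newVisits IS NOT NULL AS isNew,
           COUNT(DISTINCT fullVisitorId) AS visitors,
           SUM(visits) AS sessions
    FROM sessions
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()
print(rows)  # [(0, 2, 2), (1, 2, 2)]

total = conn.execute("SELECT COUNT(DISTINCT fullVisitorId) FROM sessions").fetchone()[0]
print(total)  # 3
```

The two groups add up to 4 visitors while there are only 3 distinct ones.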

If you want to avoid double counting, you can use this instead, where a user counts as new even if she came back:

WITH
  visitors AS (
  SELECT
    fullvisitorid,
    -- check if any visit of this visitor was new - will be used for grouping later
    MAX(totals.newVisits ) isNew, 
    SUM(totals.visits) as sessions
  FROM
    `project.dataset.ga_sessions_20180214`
  GROUP BY 1
  )

SELECT
  isNew IS NOT NULL AS isNew,
  COUNT(1) AS visitors,
  sum(sessions) as sessions
FROM
  visitors
GROUP BY 1
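With the same hypothetical toy rows, a sqlite3 replay of this deduplicated version puts each visitor in exactly one group, so the groups add up to the 3 distinct visitors and 4 sessions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (fullVisitorId TEXT, newVisits INTEGER, visits INTEGER)")
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [("A", None, 1), ("C", 1, 1), ("C", None, 1), ("D", 1, 1)],
)

# Same shape as the WITH query above: collapse to one row per visitor first,
# then group visitors by whether any of their sessions was new.
rows = conn.execute(
    """
    WITH visitors AS (
      SELECT fullVisitorId,
             MAX(newVisits) AS isNew,  -- NULL unless some session was new
             SUM(visits) AS sessions
      FROM sessions
      GROUP BY 1
    )
    SELECT isNew IS NOT NULL AS isNew,
           COUNT(1) AS visitors,
           SUM(sessions) AS sessions
    FROM visitors
    GROUP BY 1
    ORDER BY 1
    """
).fetchall()
print(rows)  # [(0, 1, 1), (1, 2, 3)]
```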

Of course, only these numbers add up to the totals.