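BigQuery: recreating GA segments based on landing page - metrics do not match

Date: 2018-10-05 12:20:12

Tags: google-bigquery

I want to create a table in BigQuery containing our main Google Analytics segments and their metrics (sessions, users, bounce rate, transactions, revenue, time on site), broken down by dimensions (device, new/returning).

The following query works fine, and its results match what Google Analytics reports: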

SELECT
  SUBSTR(date,1,6) AS date,
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium LIKE 'organic' THEN 'organic'
    WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
    WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
    WHEN trafficSource.medium LIKE 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  COALESCE(totals.newVisits, 0) = 1 AS first_visit,
  device.deviceCategory AS device,
  COUNT(DISTINCT fullVisitorId) AS users,
  SUM(totals.visits) AS sessions,
  SUM(totals.pageviews) AS pageviews,
  COUNT(totals.bounces) AS bounces,
  SUM(totals.timeOnSite) AS session_duration,
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue)/1000000 AS revenue
FROM
  `xxxxxxxxxxx.ga_sessions_20180930`
GROUP BY
  1,2,3,4

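As you can see, this query only covers segments that are based on trafficSource.medium. My problem is that I also need to create segments based on the landing page.

I joined the original table with a subquery to get the unnested landing page (see the query below; the other dimensions and metrics are removed for simplicity).

This strategy does not work: all my metrics (for example sessions) no longer match those in Google Analytics.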

SELECT
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'not blog' THEN 'organic not blog'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'blog' THEN 'organic landing blog'
    WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
    WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
    WHEN trafficSource.medium LIKE 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  SUM(totals.visits) AS sessions
FROM
  `xxxxxxxxxxxx.ga_sessions_20180930` AS ga
JOIN (
  SELECT
    visitId,
    CASE
      WHEN REGEXP_CONTAINS(hits.page.pagePath,  r'^/blog/') THEN 'blog'
      ELSE 'not blog'
    END AS landing_blog
    -- Uses the knowledge that hits are stored in chronological order
  FROM
    `xxxxxxxxxxxx.ga_sessions_20180930`,
    UNNEST (hits) AS hits
  WHERE
    hits.hitNumber = 1
    AND hits.type LIKE 'PAGE') AS landing
ON
  ga.visitId = landing.visitId
GROUP BY 1

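I would appreciate any help correcting the second query or suggesting a different strategy.

2 Answers:

Answer 0: (score: 1)

A colleague at work helped me find the error. This is the final query: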

SELECT
  SUBSTR(date,1,6) AS date,
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'not blog' THEN 'organic not blog'
    WHEN trafficSource.medium LIKE 'organic' AND landing_blog LIKE 'blog' THEN 'organic landing blog'
    WHEN trafficSource.medium LIKE '(none)' THEN 'direct'
    WHEN trafficSource.medium LIKE 'referral' THEN 'referral'
    WHEN trafficSource.medium LIKE 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  COALESCE(totals.newVisits, 0) = 1 AS first_visit,
  device.deviceCategory AS device,
  COUNT(DISTINCT ga.fullVisitorId) AS users,
  SUM(totals.visits) AS sessions,
  SUM(totals.pageviews) AS pageviews,
  COUNT(totals.bounces) AS bounces,
  SUM(totals.timeOnSite) AS session_duration,
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue)/1000000 AS revenue
FROM
  `xxxx.ga_sessions_2018*` AS ga
LEFT JOIN (
  SELECT
    visitId,
    fullVisitorId,
    CASE
      WHEN REGEXP_CONTAINS(hits.page.pagePath, r'^/blog/|^/lab/') THEN 'blog'
      ELSE 'not blog'
    END AS landing_blog
  FROM
    `xxxxx.ga_sessions_2018*`,
    UNNEST (hits) AS hits
  WHERE
    hits.isEntrance = TRUE) AS landing
ON
  ga.visitId = landing.visitId
  AND ga.fullVisitorId = landing.fullVisitorId
GROUP BY
  1, 2, 3, 4
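
For comparison, here is a join-free sketch of the same idea. It assumes the standard GA export schema (where hits.isEntrance is a BOOLEAN) and reads the landing page inside each session row with a scalar subquery over hits, so no session row can be duplicated or dropped by a join. The table path `my_project.my_dataset` is a placeholder, not a name from the post, and only the sessions metric is shown. One difference from the query above: an organic session with no entrance hit falls into 'organic not blog' here, whereas the LEFT JOIN version puts it in 'other'.

```sql
-- Sketch only: landing-page flag computed per session row, no self-join.
-- `my_project.my_dataset` is a placeholder dataset path.
SELECT
  CASE
    WHEN REGEXP_CONTAINS(trafficSource.medium, r'^(cpc|ppc|cpa|cpm|cpv|cpp|display)') THEN 'paid'
    WHEN trafficSource.medium = 'organic' AND landing_blog THEN 'organic landing blog'
    -- Also catches organic sessions where landing_blog is NULL (no entrance hit).
    WHEN trafficSource.medium = 'organic' THEN 'organic not blog'
    WHEN trafficSource.medium = '(none)' THEN 'direct'
    WHEN trafficSource.medium = 'referral' THEN 'referral'
    WHEN trafficSource.medium = 'social' THEN 'social'
    ELSE 'other'
  END AS segment,
  SUM(totals.visits) AS sessions
FROM (
  SELECT
    trafficSource,
    totals,
    -- TRUE if the first entrance hit of the session is a blog/lab page.
    (SELECT REGEXP_CONTAINS(h.page.pagePath, r'^/blog/|^/lab/')
     FROM UNNEST(hits) AS h
     WHERE h.isEntrance
     ORDER BY h.hitNumber
     LIMIT 1) AS landing_blog
  FROM
    `my_project.my_dataset.ga_sessions_2018*`
)
GROUP BY
  1
```

Because each output row of the inner query corresponds to exactly one session row of the export, SUM(totals.visits) is computed over the same rows as in the first, working query.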

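Answer 1: (score: 0)

I understand what you are doing, and I can tell you that you may run into problems with this solution.

Right now, what you are doing is filtering hits based on the segment condition. Since session data is persisted across hits, you won't have any problem, and it will behave like a session-based segment in the GA interface.

However, if you want to do the same with events, or try to build user-based GA segments (for example, source/medium of the first visit), your solution will not work. The same happens when you use utm parameters for internal links on your site (you will only see the first part of the session, which affects metrics and ratios).

I suggest you do the following:

  1. Create a view that defines the segment and returns only the session IDs or user IDs that belong to it (user IDs if you want GA's user-based segments).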
  2. Create the metrics/performance query and add to its WHERE clause that the session ID or user ID must be present in the view. Alternatively, you can achieve this with a JOIN and CASE statements.

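This way you get very reusable, modular queries, where the filters are defined in one place and the metrics in another. Adding more segments is easy, and so is changing an existing one.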
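
The two steps above could be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual code: the project, dataset and view names are placeholders, the segment shown (organic sessions landing on the blog) is just one example, and a session is identified by fullVisitorId plus visitId.

```sql
-- Step 1 (sketch): a view that returns only the session keys of one segment.
-- All project, dataset and view names are placeholders.
CREATE OR REPLACE VIEW `my_project.my_dataset.segment_organic_blog` AS
SELECT
  fullVisitorId,
  visitId
FROM
  `my_project.my_dataset.ga_sessions_2018*`,
  UNNEST(hits) AS h
WHERE
  trafficSource.medium = 'organic'
  AND h.isEntrance
  AND REGEXP_CONTAINS(h.page.pagePath, r'^/blog/')
GROUP BY
  fullVisitorId,
  visitId;

-- Step 2 (sketch): the metrics query only keeps sessions present in the view.
SELECT
  device.deviceCategory AS device,
  COUNT(DISTINCT fullVisitorId) AS users,
  SUM(totals.visits) AS sessions
FROM
  `my_project.my_dataset.ga_sessions_2018*`
WHERE
  -- Build one comparable key per session: visitor ID plus visit ID.
  CONCAT(fullVisitorId, '-', CAST(visitId AS STRING)) IN (
    SELECT CONCAT(fullVisitorId, '-', CAST(visitId AS STRING))
    FROM `my_project.my_dataset.segment_organic_blog`
  )
GROUP BY
  1
```

The two statements can be run separately. For a user-based segment, the view would return only fullVisitorId and the metrics query would filter on that column alone.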