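Google Analytics metrics are inflated when extracting hit-level data with BigQuery

Time: 2017-05-09 22:18:11

Tags: google-analytics google-bigquery

I am trying to show the source property name in a Google Analytics roll-up property that I have linked to BigQuery. The problem is that when I run the query below, some of the metrics become massively inflated. I am guessing this has something to do with the repeated fields, but I am not sure how to handle it. I have tried quite a few workarounds, such as using "max", but that does not show each property name.

All of the metrics except users and visits seem to be inflated.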

SELECT
  date,
  MAX(CASE
      WHEN EXISTS(  SELECT 1  FROM UNNEST(hits) hits  WHERE REGEXP_CONTAINS(hits.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640')) THEN 'MUG'
      WHEN EXISTS (
    SELECT
      1
    FROM
      UNNEST(hits) hits
    WHERE
      hits.sourcePropertyInfo.sourcePropertyTrackingId = 'Social') THEN 'Social'ELSE 'Website' END) AS Property,
  geoNetwork.country AS Country,
 COUNT(DISTINCT CONCAT(cast(visitId AS STRING),fullVisitorId)) as visits,
 sum(totals.visits) as visits2,
  COUNT(DISTINCT(fullVisitorId)) AS Users,
 h.sourcePropertyInfo.sourcePropertyDisplayName as display,
  SUM((
    SELECT
      SUM(latencyTracking.pageLoadTime)
    FROM
      UNNEST(hits)
    WHERE
      page.pagePath = '/' ))/SUM((
    SELECT
      SUM(latencyTracking.pageLoadSample)
    FROM
      UNNEST(hits)
    WHERE
      page.pagePath = '/')) AS pageloadspeed,
  SUM(totals.newVisits) AS new_,
  SUM(totals.screenviews) AS PAGEVIEWS,
  SUM(totals.bounces) AS BOUNCES,
   sum(CASE
      WHEN device.isMobile = TRUE THEN (totals.visits)
      ELSE 0 END) mobilevisits,
  SUM(CASE
      WHEN trafficSource.medium = 'organic' THEN (totals.visits)
      ELSE 0 END) organicvisits,
  SUM(CASE
      WHEN EXISTS(  SELECT 1  FROM UNNEST(hits) hits  WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro')) THEN 1
      ELSE 0 END) AS NewRegistrations,
  SUM(CASE
      WHEN EXISTS(  SELECT 1  FROM UNNEST(hits) hits  WHERE REGEXP_CONTAINS(hits.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::')) THEN 1
      ELSE 0 END) AS ClickToBuy,
  SUM(totals.transactions) AS Transactions
FROM
  `project.dataset.ga_sessions_*`, UNNEST(hits) as h
WHERE
  1 = 1
  AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01')
  AND TIMESTAMP('2017-05-01')
GROUP BY
  date,
  Country,
  display
ORDER BY
  visits DESC;

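Edit:

I tried simply removing UNNEST(hits) from the FROM clause, which gave me the following error:

Error: Cannot access field sourcePropertyInfo on a value with type ARRAY<STRUCT<...>> at [16:14]

I also tried using it in a subquery, like this: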

(select h.sourcePropertyInfo.sourcePropertyDisplayName from unnest(hits) h) as displayname, 

and received the error:

Scalar subquery produced more than one element

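4 answers:

Answer 0: (score: 1)

You are flattening your table in the outermost FROM clause, i.e. here:

FROM `project.dataset.ga_sessions_*`, UNNEST(hits) AS h

All session-level dimensions such as device.*, and totals.* values such as totals.transactions, are already rolled up to the session level. When you flatten the table by unnesting the hits, those totals values are repeated once for every hit in the session.

Example: say one session has 30 hits and 2 transactions. Because you are flattening/unnesting the hits, you are left with 30 rows that each contain totals.transactions = 2, so when you sum them the result is 60 transactions for that one session.

Your users and visits are not affected by this because you count them with COUNT(DISTINCT ...), so the duplicates are eliminated.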
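The arithmetic in that example can be reproduced on toy data. The sketch below is only an illustration: the `sessions` CTE, its values, and the simplified `hits` struct are invented for the demo and stand in for a real `ga_sessions_*` table.

```sql
-- Toy data (assumed): one session with 30 hits and totals.transactions = 2.
WITH sessions AS (
  SELECT
    'visitor_a' AS fullVisitorId,
    1 AS visitId,
    STRUCT(2 AS transactions) AS totals,
    -- Build 30 placeholder hits; real hits carry many more fields.
    ARRAY(
      SELECT AS STRUCT n AS hitNumber
      FROM UNNEST(GENERATE_ARRAY(1, 30)) AS n
    ) AS hits
)
SELECT
  -- One row per session: the session total is counted once.
  (SELECT SUM(totals.transactions) FROM sessions) AS session_level_sum,
  -- One row per hit: the same session total is repeated 30 times.
  (SELECT SUM(s.totals.transactions)
   FROM sessions AS s, UNNEST(s.hits) AS h) AS flattened_sum
-- session_level_sum = 2, flattened_sum = 60
```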

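I think your query would work if you just remove

, UNNEST(hits) as h

from the FROM clause and also remove or modify this line:

h.sourcePropertyInfo.sourcePropertyDisplayName as display

because, apart from this one line, you are already unnesting hits wherever it is needed inside the SELECT statement.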
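One possible way to "modify" that line while keeping one row per session is a scalar subquery that returns a single display name per session. This is a sketch on invented toy data, not the asker's table; using MAX() to pick the name assumes every hit in a session carries the same source property, which may not hold in every roll-up property.

```sql
-- Toy sessions (assumed values); hits keep only the field used here.
WITH sessions AS (
  SELECT '20170501' AS date, 'visitor_a' AS fullVisitorId, 1 AS visitId,
    STRUCT('Brazil' AS country) AS geoNetwork,
    STRUCT(2 AS transactions) AS totals,
    [STRUCT(STRUCT('Site A' AS sourcePropertyDisplayName) AS sourcePropertyInfo),
     STRUCT(STRUCT('Site A' AS sourcePropertyDisplayName) AS sourcePropertyInfo),
     STRUCT(STRUCT('Site A' AS sourcePropertyDisplayName) AS sourcePropertyInfo)] AS hits
  UNION ALL
  SELECT '20170501', 'visitor_b', 2,
    STRUCT('Brazil' AS country),
    STRUCT(1 AS transactions),
    [STRUCT(STRUCT('Site B' AS sourcePropertyDisplayName) AS sourcePropertyInfo)]
)
SELECT
  date,
  geoNetwork.country AS Country,
  -- MAX() collapses the per-hit names to one value, so the scalar
  -- subquery can no longer produce more than one element.
  (SELECT MAX(h.sourcePropertyInfo.sourcePropertyDisplayName)
   FROM UNNEST(hits) AS h) AS display,
  COUNT(DISTINCT CONCAT(CAST(visitId AS STRING), fullVisitorId)) AS visits,
  -- No cross join with hits, so the session total is summed once.
  SUM(totals.transactions) AS Transactions
FROM sessions
GROUP BY date, Country, display
ORDER BY display
-- Site A: visits = 1, Transactions = 2; Site B: visits = 1, Transactions = 1
```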

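Answer 1: (score: 1)

Since you need to compute several values at the hit level, unnesting the hits field is probably the best approach. The downside is that you lose the session-level aggregation of the totals fields, but you can still work around that.

For example: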

SELECT
  date,
  CASE
    WHEN REGEXP_CONTAINS(h.sourcePropertyInfo.sourcePropertyTrackingId, r'82272640') THEN 'MUG'
    WHEN h.sourcePropertyInfo.sourcePropertyTrackingId = 'Social' THEN 'Social'
    ELSE 'Website'
  END AS Property,
  geoNetwork.country AS Country,
  COUNT(DISTINCT CONCAT(CAST(visitId AS STRING),fullVisitorId)) AS visits,
  COUNT(DISTINCT(fullVisitorId)) AS Users,
  h.sourcePropertyInfo.sourcePropertyDisplayName AS display,
  -- Home page only, matching the question's page.pagePath = '/' filter
  -- (the regex r'/' would match every path that contains a slash).
  SUM(CASE
      WHEN h.page.pagePath = '/' THEN h.latencyTracking.pageLoadTime END) / SUM(CASE
      WHEN h.page.pagePath = '/' THEN h.latencyTracking.pageLoadSample END) AS pageloadspeed,
  COUNT(DISTINCT
    CASE
      WHEN totals.newVisits = 1 THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END) new_visits,
  COUNT(CASE
         WHEN h.type = 'PAGE' THEN h.page.pagepath END) pageviews,
  SUM(CASE
       WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces,
  COUNT(DISTINCT (CASE
        WHEN device.isMobile = TRUE THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) mobilevisits,
  COUNT(DISTINCT (CASE
        WHEN trafficSource.medium = 'organic' THEN CONCAT(CAST(visitId AS STRING),fullVisitorId) END)) organicvisits,
  SUM(CASE
       WHEN REGEXP_CONTAINS(h.eventInfo.eventAction,'register$|registersuccess|new registration|account signup|registro') THEN 1 END) AS NewRegistrations,
  SUM(CASE
       WHEN REGEXP_CONTAINS(h.eventInfo.eventAction, 'add to cart|add to bag|click to buy|ass to basket|comprar|addtobasket::') THEN 1 END) AS ClickToBuy,
  COUNT(h.transaction.transactionid) transactions
FROM
  `project_id.dataset_id.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  1 = 1
  AND PARSE_TIMESTAMP('%Y%m%d', REGEXP_EXTRACT(_table_suffix, r'.*_(.*)')) BETWEEN TIMESTAMP('2017-05-01') AND TIMESTAMP('2017-05-01')
GROUP BY
  date,
  Country,
  display,
  Property

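I ran it against our dataset and it seems to work. I made some changes:

  • Removed the MAX operation on Property and added Property to the GROUP BY.
  • Pageviews are counted as the number of hits where h.type = 'PAGE'. I am not sure whether screenviews are the same thing.
  • Bounces are counted when a hit is both an entrance and an exit.
  • Total transactions is a count of transaction IDs (hopefully this field is populated in your dataset as well).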

Answer 2: (score: 0)

After UNNESTing the array, use MAX() instead of SUM() to roll the values back up when reporting session-level metrics.

Use:

  • MAX(totals.screenviews) AS PAGEVIEWS,
  • MAX(totals.bounces) AS BOUNCES,
  • MAX(totals.transactions) AS Transactions
  • ...

instead of:

  • SUM(totals.screenviews) AS PAGEVIEWS,
  • SUM(totals.bounces) AS BOUNCES,
  • SUM(totals.transactions) AS Transactions

This should partially solve your problem. Let me know how it goes.
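One way to read this suggestion is as a two-step roll-up: MAX() first collapses the repeated rows back to one value per session, and an outer SUM() then adds up the sessions. A bare MAX() over a group that contains several sessions would return only the largest session's value, which may be why the fix is described as partial. The sketch below uses invented toy data, and `hits` is simplified to an array of integers.

```sql
-- Toy sessions (assumed values): 30 hits and 4 hits respectively.
WITH sessions AS (
  SELECT 'visitor_a' AS fullVisitorId, 1 AS visitId,
    STRUCT(2 AS transactions, 5 AS screenviews) AS totals,
    GENERATE_ARRAY(1, 30) AS hits
  UNION ALL
  SELECT 'visitor_b', 2,
    STRUCT(1 AS transactions, 3 AS screenviews),
    GENERATE_ARRAY(1, 4)
),
flattened AS (
  -- One row per hit; totals are repeated on every row.
  SELECT s.fullVisitorId, s.visitId, s.totals, hit_number
  FROM sessions AS s, UNNEST(s.hits) AS hit_number
),
per_session AS (
  -- MAX() undoes the repetition: one row per session again.
  SELECT
    fullVisitorId,
    visitId,
    MAX(totals.transactions) AS transactions,
    MAX(totals.screenviews) AS screenviews
  FROM flattened
  GROUP BY fullVisitorId, visitId
)
SELECT
  SUM(transactions) AS Transactions,
  SUM(screenviews) AS PAGEVIEWS
FROM per_session
-- Transactions = 3, PAGEVIEWS = 8
```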

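Answer 3: (score: 0)

I think the reason the bounces calculated in William Fuks's query are much higher is the following line:

WHEN (h.isentrance = TRUE AND h.isexit = TRUE) THEN 1 END) bounces

It seems that isEntrance and isExit are only set on PAGE hits, so events are not taken into account. The excess bounces therefore come from sessions with a single pageview on which one or more interaction events also occurred.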
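A possible alternative, sketched here on invented toy data, is to count bounced sessions from the session-level totals.bounces flag with the same COUNT(DISTINCT ...) pattern that Answer 1 uses for new visits. The data assumes that a session with one pageview plus an interaction event has a NULL totals.bounces, while a session with a single pageview has totals.bounces = 1.

```sql
WITH sessions AS (
  -- visitor_a: one pageview plus one event, so not a bounce (assumed NULL).
  SELECT 'visitor_a' AS fullVisitorId, 1 AS visitId,
    STRUCT(CAST(NULL AS INT64) AS bounces) AS totals,
    [STRUCT('PAGE' AS type, TRUE AS isEntrance, TRUE AS isExit),
     STRUCT('EVENT' AS type, CAST(NULL AS BOOL) AS isEntrance,
            CAST(NULL AS BOOL) AS isExit)] AS hits
  UNION ALL
  -- visitor_b: a single pageview, a genuine bounce.
  SELECT 'visitor_b', 2,
    STRUCT(1 AS bounces),
    [STRUCT('PAGE' AS type, TRUE AS isEntrance, TRUE AS isExit)]
)
SELECT
  -- Hit-level rule from Answer 1: also counts visitor_a's pageview.
  SUM(CASE WHEN h.isEntrance = TRUE AND h.isExit = TRUE THEN 1 END)
    AS bounces_entrance_exit,
  -- Session-level flag, de-duplicated across the flattened rows.
  COUNT(DISTINCT CASE WHEN s.totals.bounces = 1
        THEN CONCAT(CAST(s.visitId AS STRING), s.fullVisitorId) END)
    AS bounces_from_totals
FROM sessions AS s, UNNEST(s.hits) AS h
-- bounces_entrance_exit = 2, bounces_from_totals = 1
```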