Summing totals for nested and unnested data in a single query

Time: 2018-06-15 11:57:04

Tags: sql google-bigquery

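I'm working with nested Google Analytics data to build a query. I need to unnest three levels to get all the fields I need, but once I've unnested, the SUM() of my totals. fields comes out too high, I believe because their values are being duplicated.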

If a bounce is identified at the hit level, I can't use totals.bounces to get this value.

How can I adjust the query below to get the correct bounce count and revenue amount, as well as the totals for the unnested values?

SELECT
  customDimension.value AS UserID,
  # Visits from this individual
  SUM(totals.visits) AS visits,
  # Orders from this individual
  COUNT(DISTINCT hits.transaction.transactionId) AS orders,
  # AOV
  SAFE_DIVIDE(SUM(hits.transaction.transactionRevenue)/1000000 , COUNT(DISTINCT hits.transaction.transactionId)) AS AOV,
  # Bounces from this individual
  IFNULL(SUM(totals.bounces),
    0) AS bounces,
  # Revenue from this individual
  IFNULL(SUM(hits.transaction.transactionRevenue)/1000000,
    0) AS revenue,
  # Conversion rate of the individual
  SAFE_DIVIDE(COUNT(DISTINCT hits.transaction.transactionId),COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING)))) AS conversion_rate,

  ROUND(IFNULL(SUM(totals.bounces)/COUNT(DISTINCT CONCAT(CAST(fullVisitorId AS STRING),CAST(visitId AS STRING))),
      0),5) AS bounce_rate,

FROM
  `MY.DATA.ga_sessions_20*` AS t
CROSS JOIN
  UNNEST (hits) AS hits
CROSS JOIN
  UNNEST(t.customdimensions) AS customDimension
CROSS JOIN
  UNNEST(hits.product) AS hits_product

WHERE
  PARSE_DATE('%y%m%d', _TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
  AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)

  AND customDimension.index = 2
  AND customDimension.value NOT LIKE "true"
  AND customDimension.value NOT LIKE "false"
  AND customDimension.value NOT LIKE "undefined"
  AND customDimension.value IS NOT NULL
GROUP BY
  UserID,
  hits.eventInfo.eventCategory

1 answer:

Answer 0 (score: 0)

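Now that I have more experience with BigQuery, I can answer this question based on how I would implement it today.

Use WITH clauses to define several subqueries, each containing only the fields needed at one nesting level. For example, my first WITH subquery would not unnest anything, so it sums the totals. fields correctly. A second WITH subquery can then UNNEST() the hits and sum the fields I want to calculate at that level.

These subqueries can then be joined on a common key, which gives the correct values for each nesting level without duplication.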
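
As a rough sketch of that approach, the question's query could be split as shown below. The table name, date window, and custom dimension index come from the question. The names session_level, hit_level, sessions, cd, hit, s and hl are hypothetical, and reading the user ID through a scalar subquery (rather than a CROSS JOIN) is an assumption made here so that the custom dimension does not multiply rows. It also assumes index 2 holds at most one value per session.

```sql
# Sketch only, not a tested implementation against real data.
WITH session_level AS (
  # Session level: hits are not unnested, so each session row
  # contributes its totals.* values exactly once.
  SELECT
    (SELECT cd.value FROM UNNEST(t.customDimensions) AS cd
     WHERE cd.index = 2 LIMIT 1) AS UserID,
    SUM(t.totals.visits) AS visits,
    IFNULL(SUM(t.totals.bounces), 0) AS bounces,
    COUNT(DISTINCT CONCAT(t.fullVisitorId, CAST(t.visitId AS STRING))) AS sessions
  FROM
    `MY.DATA.ga_sessions_20*` AS t
  WHERE
    PARSE_DATE('%y%m%d', _TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY
    UserID
),
hit_level AS (
  # Hit level: only hits is unnested (not hits.product), so each
  # transaction hit contributes its revenue once.
  SELECT
    (SELECT cd.value FROM UNNEST(t.customDimensions) AS cd
     WHERE cd.index = 2 LIMIT 1) AS UserID,
    COUNT(DISTINCT hit.transaction.transactionId) AS orders,
    IFNULL(SUM(hit.transaction.transactionRevenue) / 1000000, 0) AS revenue
  FROM
    `MY.DATA.ga_sessions_20*` AS t
  CROSS JOIN
    UNNEST(t.hits) AS hit
  WHERE
    PARSE_DATE('%y%m%d', _TABLE_SUFFIX) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    AND DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
  GROUP BY
    UserID
)
SELECT
  s.UserID,
  s.visits,
  s.bounces,
  IFNULL(hl.orders, 0) AS orders,
  IFNULL(hl.revenue, 0) AS revenue,
  SAFE_DIVIDE(hl.revenue, hl.orders) AS AOV,
  SAFE_DIVIDE(hl.orders, s.sessions) AS conversion_rate,
  ROUND(IFNULL(SAFE_DIVIDE(s.bounces, s.sessions), 0), 5) AS bounce_rate
FROM
  session_level AS s
LEFT JOIN
  hit_level AS hl
ON
  s.UserID = hl.UserID
WHERE
  s.UserID IS NOT NULL
  AND s.UserID NOT IN ('true', 'false', 'undefined')
```

Because both subqueries are aggregated to one row per UserID before the join, the join itself cannot fan out. If product-level fields are needed as well, a third subquery that unnests hits.product could be added and joined the same way.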