Scoped aggregation and WHERE scope

Date: 2016-09-08 14:25:39

Tags: google-bigquery

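The following query runs and produces 16 rows of output (verified by wrapping it in SELECT count(*) FROM (query)). hits is a repeated record, hits.customDimensions is repeated inside hits, and customDimensions is repeated in the main (session) record.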

SELECT
  fullVisitorId,
  visitId,
  hits.page.pagePath,
  hits.type,
  FIRST(IF(customDimensions.index = 10, customDimensions.value, NULL)) WITHIN RECORD AS gacid,
  FIRST(IF(hits.customDimensions.index = 11, hits.customDimensions.value, NULL)) WITHIN hits AS blogCategories
FROM
  [dataset.ga_sessions_20160902]
WHERE
  fullVisitorId ='55555555555'

However,

SELECT
  fullVisitorId,
  visitId,
  hits.page.pagePath,
  hits.type,
  FIRST(IF(customDimensions.index = 10, customDimensions.value, NULL)) WITHIN RECORD AS gacid,
  FIRST(IF(hits.customDimensions.index = 11, hits.customDimensions.value, NULL)) WITHIN hits AS blogCategories
FROM
  [dataset.ga_sessions_20160902]
WHERE
  fullVisitorId ='55555555555'
  AND hits.type = 'PAGE'

fails with:
Cannot query the cross product of repeated fields customDimensions.index and hits.type.

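Is only one (unflattened) record being returned, so that my wrapping count is not giving me the true result? And why can two scoped aggregations work at different scopes, while a WHERE condition on the innermost scope fails?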

1 answer:

Answer 0 (score: 1)

To avoid producing a cross product, try the following:

SELECT
  fullVisitorId,
  visitId,
  hits.page.pagePath,
  hits.type,
  FIRST(IF(customDimensions.index = 10, customDimensions.value, NULL)) WITHIN RECORD AS gacid,
  FIRST(IF(hits.customDimensions.index = 11, hits.customDimensions.value, NULL)) WITHIN hits AS blogCategories
FROM [dataset.ga_sessions_20160902]
WHERE fullVisitorId ='55555555555'
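-- HAVING filters after the WITHIN aggregations, so hits.type is no longer crossed with the repeated customDimensions field
HAVING hits.type = 'PAGE'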
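For comparison, here is a sketch of roughly the same query in BigQuery Standard SQL. It assumes the usual GA export schema (customDimensions and hits.customDimensions as arrays of index/value structs) and that `dataset` resolves in your default project; the aliases s, h, cd, hcd and pos are illustrative only. Because hits are flattened explicitly with UNNEST, hits.type can be filtered directly in WHERE, and each custom dimension is picked with a correlated subquery instead of a WITHIN aggregation.

```sql
#standardSQL
-- Sketch: one output row per hit of type 'PAGE' for the given visitor.
SELECT
  s.fullVisitorId,
  s.visitId,
  h.page.pagePath,
  h.type,
  -- Session-scoped custom dimension 10: first match in array order.
  (SELECT cd.value
   FROM UNNEST(s.customDimensions) AS cd WITH OFFSET AS pos
   WHERE cd.index = 10
   ORDER BY pos
   LIMIT 1) AS gacid,
  -- Hit-scoped custom dimension 11: first match in array order.
  (SELECT hcd.value
   FROM UNNEST(h.customDimensions) AS hcd WITH OFFSET AS pos
   WHERE hcd.index = 11
   ORDER BY pos
   LIMIT 1) AS blogCategories
FROM `dataset.ga_sessions_20160902` AS s,
  UNNEST(s.hits) AS h  -- explicit flattening: one row per (session, hit)
WHERE s.fullVisitorId = '55555555555'
  AND h.type = 'PAGE'
```

The ORDER BY pos LIMIT 1 subqueries are meant to approximate FIRST(IF(...)); since each subquery only ever sees one array, no cross product between customDimensions and hits is formed.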

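By the way, in Legacy SQL any outermost SELECT produces flattened results (unless you write the results to a destination table with the corresponding options set: Allow Large Results enabled and Flatten Results disabled). That explains what you are seeing in your example.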
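To check how many session records actually match, rather than how many flattened rows the outer SELECT emits, one option is to aggregate everything to record level so that no repeated field remains in the output. A hedged Legacy SQL sketch: hitCount and sessionCustomDimensionCount are illustrative aliases, and hits.hitNumber is assumed to be present as in the standard GA export schema.

```sql
-- Legacy SQL sketch: every selected field is either non-repeated or
-- aggregated WITHIN RECORD, so the output has one row per session record.
SELECT
  fullVisitorId,
  visitId,
  COUNT(hits.hitNumber) WITHIN RECORD AS hitCount,
  COUNT(customDimensions.index) WITHIN RECORD AS sessionCustomDimensionCount
FROM [dataset.ga_sessions_20160902]
WHERE fullVisitorId = '55555555555'
```

Comparing the number of rows and the hitCount values here with the 16 rows returned by the original query can show whether those rows are the flattened hits of one session or of several.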