Question

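I have a date-partitioned table (call it sample_table) that has, among other columns, two columns: one holds a datetime in UTC and the other holds a time zone offset. On top of this table I have a view (call it sample_view). The view selects _partitiontime from the table and exposes it as a partitionDate column, along with another column, customerDateTime, which simply adds timeZoneOffset to dateTime.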
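For reference, a view matching this description might be defined roughly as follows in legacy SQL. This is a reconstruction from the description, not the actual view: it assumes timeZoneOffset is an integer number of seconds (the later table query adds a fixed 3600 seconds) and that customer and containerName are the other columns the queries below use.

```sql
#legacySQL
-- Hypothetical reconstruction of sample_view, based only on the description.
-- Assumes timeZoneOffset is an integer number of seconds.
SELECT
  _PARTITIONTIME AS partitionDate,
  DATE_ADD(dateTime, timeZoneOffset, 'SECOND') AS customerDateTime,
  customer,
  containerName
FROM
  [sample_project.sample_table]
```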

When I query sample_table directly and filter only on _partitiontime, BigQuery scans much less data (131 MB).

select
  containerName,
  count(*)
from
  [sample_project.sample_table] 
where
  _partitiontime between timestamp('2016-12-12') and timestamp('2016-12-19')
  and customer = 'X'
  and containerName = 'XXX'
group by containerName
;

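But when I run the same query against the table with an additional filter on the dateTime column, to restrict the results by the customer's local datetime, BigQuery scans more (211 MB). I expected it to scan less than 131 MB, or about the same.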

select
  containerName,
  count(*)
from
  [sample_project.sample_table] 
where
  _partitiontime between timestamp('2016-12-12') and timestamp('2016-12-19')
  and DATE_ADD(dateTime, 3600, 'SECOND' ) between timestamp('2016-12-12 08:00:00') and timestamp('2016-12-19 15:00:00')
  and customer = 'X'
  and containerName = 'XXX'
group by containerName
;
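One point worth noting about these two table queries: BigQuery bills for the columns a query references within the partitions it reads, so an extra filter on dateTime cannot lower the bytes scanned; it adds the dateTime column's bytes for the selected partitions. That is consistent with the increase from 131 MB to 211 MB. A standard SQL form of the second query might look like this; it is a sketch that assumes dateTime is a TIMESTAMP column and keeps the original fixed 3600-second offset.

```sql
#standardSQL
-- Standard SQL sketch of the second query.
-- Bytes processed cover every referenced column (containerName, customer,
-- dateTime) in the partitions selected by the _PARTITIONTIME filter.
SELECT
  containerName,
  COUNT(*) AS cnt
FROM
  `sample_project.sample_table`
WHERE
  _PARTITIONTIME BETWEEN TIMESTAMP('2016-12-12') AND TIMESTAMP('2016-12-19')
  AND TIMESTAMP_ADD(dateTime, INTERVAL 3600 SECOND)
      BETWEEN TIMESTAMP('2016-12-12 08:00:00') AND TIMESTAMP('2016-12-19 15:00:00')
  AND customer = 'X'
  AND containerName = 'XXX'
GROUP BY containerName
```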

When I run a similar query against sample_view, filtering on partitionDate, BigQuery scans even more (399 MB).

select
  containerName,
  count(*)
from
  [sample_project.sample_view] 
where
  partitionDate between timestamp('2016-12-12') and timestamp('2016-12-19')
  and customer = 'X'
  and containerName = 'XXX'
group by containerName
;

When I run the query against the view filtering on both partitionDate and customerDateTime, BigQuery scans more still (879 MB).

select
  containerName,
  count(*)
from
  [sample_project.sample_view] 
where
  partitionDate between timestamp('2016-12-12') and timestamp('2016-12-19')
  and customerDateTime between timestamp('2016-12-12 08:00:00') and timestamp('2016-12-19 15:00:00')
  and customer = 'X'
  and containerName = 'XXX'
group by containerName
;

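I'm not sure whether any of the queries above is scanning the correct partitions. Why do I see these differences between the queries? Is exposing _partitiontime as a new column, partitionDate, a bad strategy? I'm not sure how else to use the partition date in Tableau without writing more queries. Let me know if you need more details.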

Answer 1

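You may want to use standard SQL for your queries instead, since legacy SQL has some limitations around filter pushdown. I'm not familiar with Tableau myself, but they have a help page for BigQuery that discusses switching between legacy SQL and standard SQL.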
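A minimal sketch of what this could look like in standard SQL follows. The view name sample_view_std is hypothetical; the view has to be recreated because a standard SQL query cannot reference a view defined in legacy SQL. As in the question, timeZoneOffset is assumed to be an integer number of seconds. Whether the partitionDate filter actually prunes partitions through the view is worth confirming with a dry run, which reports the bytes a query would process without running it.

```sql
#standardSQL
-- Hypothetical standard SQL version of the view (sample_view_std is an assumed name).
CREATE VIEW `sample_project.sample_view_std` AS
SELECT
  _PARTITIONTIME AS partitionDate,
  TIMESTAMP_ADD(dateTime, INTERVAL timeZoneOffset SECOND) AS customerDateTime,
  customer,
  containerName
FROM
  `sample_project.sample_table`;

-- Query through the view; partitionDate is an alias of _PARTITIONTIME.
SELECT
  containerName,
  COUNT(*) AS cnt
FROM
  `sample_project.sample_view_std`
WHERE
  partitionDate BETWEEN TIMESTAMP('2016-12-12') AND TIMESTAMP('2016-12-19')
  AND customerDateTime BETWEEN TIMESTAMP('2016-12-12 08:00:00') AND TIMESTAMP('2016-12-19 15:00:00')
  AND customer = 'X'
  AND containerName = 'XXX'
GROUP BY containerName;
```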

Answer 2

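A guess: the problem you're seeing is because you have repeated fields. Legacy SQL and standard SQL handle flattening of results differently. Legacy SQL flattens the results, so instead of the number of original records you see the number of repeated values, whereas standard SQL preserves the original structure. In legacy SQL you need to take particular care to eliminate the effects of flattening, whereas in standard SQL this is already handled.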
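To illustrate the difference this answer describes, assume a hypothetical repeated field named items on sample_table (the question does not say which fields, if any, are repeated). In standard SQL, COUNT(*) over the table counts original records, and only an explicit UNNEST produces one row per repeated value, which corresponds to the per-value counting that legacy SQL's automatic flattening gives when a query touches a repeated field.

```sql
#standardSQL
-- items is a HYPOTHETICAL repeated field, used only for illustration.

-- One row per original record:
SELECT COUNT(*) AS record_count
FROM `sample_project.sample_table`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-12-12') AND TIMESTAMP('2016-12-19');

-- One row per repeated value, via an explicit UNNEST:
SELECT COUNT(*) AS value_count
FROM (
  SELECT items
  FROM `sample_project.sample_table`
  WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2016-12-12') AND TIMESTAMP('2016-12-19')
) AS t,
UNNEST(t.items) AS item;
```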

Why doesn't Google BigQuery use the partition date correctly when querying through a view?
