When querying a partitioned table, is it possible to filter on the partition column with a subquery and still reduce cost?

Time: 2019-10-03 20:34:29

Tags: google-bigquery

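From the public documentation, I can see that BigQuery partitioned tables have this limitation: if the filter on the partition column is a subquery, the query's partitions are not pruned and the "bytes processed" (cost) is not reduced. I would like to know whether there is a workaround.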

For example, this query scans 38.67 GB. Is it possible to reduce that?

WITH sub_query_that_generates_filter AS (
  SELECT DATE "2016-10-01" as month UNION ALL
  SELECT "2017-10-01" UNION ALL
  SELECT "2018-10-01"
)
SELECT block_hash, fee FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month in 
(SELECT month FROM sub_query_that_generates_filter)

1 Answer:

Answer 0 (score: 6)

With BigQuery scripting (now in Beta), the cost can be reduced.

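Basically, a script variable is declared to capture the dynamic part of the subquery. Then, in the subsequent query, the script variable is used as the filter, so the partitions to be scanned can be pruned.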

CREATE TEMP TABLE sub_query_that_generates_filter AS (
  SELECT DATE "2017-10-01" as month UNION ALL
  SELECT "2018-10-01" UNION ALL
  SELECT "2016-10-01" 
);
BEGIN
  DECLARE month_filter ARRAY<DATE> 
    DEFAULT (SELECT ARRAY_AGG(month) FROM sub_query_that_generates_filter);

  SELECT block_hash, fee FROM `bigquery-public-data.crypto_bitcoin.transactions` 
    WHERE block_timestamp_month in UNNEST(month_filter);
END

It scans only 2 GB of data instead of 38 GB. Cheaper and faster!
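If the temp table is not needed for anything else, the same idea can probably be written more compactly. The following is only a sketch, not something measured: it assumes that a top-level DECLARE with a subquery DEFAULT prunes partitions the same way as the script above, and it reuses the table, column, and variable names from the answer.

```sql
-- Sketch: same technique without the temp table or the BEGIN...END block.
-- DECLARE must be the first statement of the script.
DECLARE month_filter ARRAY<DATE> DEFAULT (
  SELECT ARRAY_AGG(month)
  FROM (
    -- Stand-in for whatever subquery produces the partition values.
    SELECT DATE "2016-10-01" AS month UNION ALL
    SELECT DATE "2017-10-01" UNION ALL
    SELECT DATE "2018-10-01"
  )
);

-- The variable is already resolved when this statement runs,
-- so the filter on the partition column is no longer a subquery.
SELECT block_hash, fee
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month IN UNNEST(month_filter);
```

To confirm the effect on your own data, compare the bytes processed reported for the final SELECT against the original query.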
