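From the public documentation I can see that BigQuery partitioned tables have this limitation: if the filter on the partitioning column is a subquery, the query does not prune partitions, so the "bytes processed" (and therefore the cost) is not reduced. Is there a workaround?
For example, the following query scans 38.67 GB. Is it possible to reduce that?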
WITH sub_query_that_generates_filter AS (
SELECT DATE "2016-10-01" AS month UNION ALL
SELECT DATE "2017-10-01" UNION ALL
SELECT DATE "2018-10-01"
)
SELECT block_hash, fee FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month IN
(SELECT month FROM sub_query_that_generates_filter)
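For comparison, pruning does work when the filter values are constant literals rather than the output of a subquery, because BigQuery can resolve them when it plans the query. A minimal sketch, assuming the three months are known in advance (which is exactly what the dynamic case above cannot assume):

```sql
-- Constant filter on the partitioning column: the month values are
-- literals, so only the matching partitions need to be read.
SELECT block_hash, fee
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month IN (DATE "2016-10-01", DATE "2017-10-01", DATE "2018-10-01");
```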
Answer:
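With BigQuery scripting (in beta at the time of writing), the cost can be reduced.
The idea is to declare a script variable that captures the dynamic part of the subquery. The subsequent query then uses that variable as the filter. Because the variable's value is already computed when that query runs, the filter behaves like a constant and BigQuery can prune the partitions it scans.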
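-- Materialize the dynamically generated filter values in a temporary table.
CREATE TEMP TABLE sub_query_that_generates_filter AS (
SELECT DATE "2017-10-01" AS month UNION ALL
SELECT DATE "2018-10-01" UNION ALL
SELECT DATE "2016-10-01"
);
BEGIN
-- Evaluate the subquery once and store the result in a script variable.
DECLARE month_filter ARRAY<DATE>
DEFAULT (SELECT ARRAY_AGG(month) FROM sub_query_that_generates_filter);
-- month_filter is known before this query runs, so the filter on the
-- partitioning column can prune partitions.
SELECT block_hash, fee FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month IN UNNEST(month_filter);
END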
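If the filter values do not need to live in a temporary table, the same idea can likely be written as a single script with the filter-generating query inlined in the DECLARE statement. This is a sketch under that assumption; the derived-table alias `filter_values` is hypothetical, and `month_filter` is the same variable as in the script above. DECLARE has to be the first statement of the script (or of a BEGIN block).

```sql
-- Compute the filter values once, before the main query runs.
DECLARE month_filter ARRAY<DATE> DEFAULT (
  SELECT ARRAY_AGG(month)
  FROM (
    SELECT DATE "2016-10-01" AS month UNION ALL
    SELECT DATE "2017-10-01" UNION ALL
    SELECT DATE "2018-10-01"
  ) AS filter_values
);

-- The variable's value is fixed at this point, so the partition
-- filter can prune just as a literal list would.
SELECT block_hash, fee
FROM `bigquery-public-data.crypto_bitcoin.transactions`
WHERE block_timestamp_month IN UNNEST(month_filter);
```

Keeping the temporary table, as in the answer's script, is still useful when the filter values come from a more expensive query that you also want to reuse elsewhere in the script.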
It scans only 2 GB of data instead of 38 GB. Cheaper and faster!
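To check the bytes processed by each statement of a script yourself, one option is the INFORMATION_SCHEMA jobs views, where the statements of a script appear as child jobs carrying the script's `parent_job_id`. A sketch, assuming the jobs ran in the `US` multi-region (where `bigquery-public-data` lives) and were run by the current user within the last hour:

```sql
-- List recent jobs with the bytes each one processed; script statements
-- show up as child jobs whose parent_job_id is the script's job ID.
SELECT
  job_id,
  parent_job_id,
  statement_type,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY creation_time DESC;
```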