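I have a query that uses an analytic function on a day-partitioned table. I expect it to read only the partitions selected by the WHERE clause, but it reads every partition in the table.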
WITH query AS (
  SELECT
    * EXCEPT(rank)
  FROM (
    SELECT
      *,
      RANK() OVER (PARTITION BY day ORDER BY num_mean_temp_samples) AS rank
    FROM (
      SELECT
        FORMAT_DATE("%Y%m%d", _PARTITIONDATE) AS day,
        *
      FROM
        `mydataset.gsod_partitioned` ) q_nested
  ) q
  WHERE
    rank < 1000
)
SELECT
  num_mean_temp_samples,
  COUNT(1) AS samples
FROM query
WHERE
  day IN ('20100101', '20100103')
GROUP BY 1 ORDER BY 1
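Why the expectation is reasonable: `day` is the only PARTITION BY column of the RANK() window, so each row's rank depends only on rows from the same day. Filtering on `day` before or after computing the rank should therefore give the same result, and that is what would make it safe to push the `day IN (...)` filter down into the table scan. The Python sketch below emulates RANK() on a few hypothetical rows (the values are made up, not taken from gsod) and checks that both orders agree. The `rank < 1000` step is left out because it runs after ranking in both cases.

```python
from itertools import groupby

_UNSET = object()  # sentinel so the first row of a partition never "ties" with a previous one


def rank_over(rows, partition_key, order_key):
    """Emulate RANK() OVER (PARTITION BY partition_key ORDER BY order_key)."""
    ranked = []
    # Sort so each partition is contiguous and ordered by order_key.
    ordered = sorted(rows, key=lambda r: (r[partition_key], r[order_key]))
    for _, group in groupby(ordered, key=lambda r: r[partition_key]):
        prev_value, prev_rank = _UNSET, 0
        for position, row in enumerate(group, start=1):
            # RANK(): ties share a rank, and the next distinct value skips ahead.
            rank = prev_rank if row[order_key] == prev_value else position
            ranked.append((row[partition_key], row[order_key], rank))
            prev_value, prev_rank = row[order_key], rank
    return ranked


# Hypothetical rows standing in for gsod_partitioned (values are made up).
rows = [
    {"day": "20100101", "num_mean_temp_samples": 8},
    {"day": "20100101", "num_mean_temp_samples": 4},
    {"day": "20100101", "num_mean_temp_samples": 4},
    {"day": "20100102", "num_mean_temp_samples": 1},
    {"day": "20100103", "num_mean_temp_samples": 24},
    {"day": "20100103", "num_mean_temp_samples": 7},
]
wanted_days = {"20100101", "20100103"}

# What the query asks for: rank every day, then filter on day.
filter_after = [t for t in rank_over(rows, "day", "num_mean_temp_samples")
                if t[0] in wanted_days]
# What partition pruning would do: read only the wanted days, then rank.
filter_before = rank_over([r for r in rows if r["day"] in wanted_days],
                          "day", "num_mean_temp_samples")

print(filter_after == filter_before)  # True
```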
I verified that partitions are pruned when the query has no analytic function:
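WITH query AS (
  SELECT
    FORMAT_DATE("%Y%m%d", _PARTITIONDATE) AS day,
    *
  FROM
    `mydataset.gsod_partitioned`
)
SELECT num_mean_temp_samples, COUNT(1) AS samples
FROM query
WHERE day IN ('20100101', '20100103')
GROUP BY 1 ORDER BY 1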
They are also pruned when the nested select is split into two UNION ALL branches:
WITH query AS (
  SELECT
    * EXCEPT(rank)
  FROM (
    SELECT
      *,
      RANK() OVER (PARTITION BY day ORDER BY num_mean_temp_samples) AS rank
    FROM (
      SELECT
        FORMAT_DATE("%Y%m%d", _PARTITIONDATE) AS day,
        *
      FROM
        `mydataset.gsod_partitioned` WHERE _PARTITIONDATE < "1970-01-01" ) q_nested1
    UNION ALL
    SELECT
      *,
      RANK() OVER (PARTITION BY day ORDER BY num_mean_temp_samples) AS rank
    FROM (
      SELECT
        FORMAT_DATE("%Y%m%d", _PARTITIONDATE) AS day,
        *
      FROM
        `mydataset.gsod_partitioned` WHERE _PARTITIONDATE >= "1970-01-01" ) q_nested2
  ) q
  WHERE
    rank < 1000
)
SELECT num_mean_temp_samples, COUNT(1) AS samples
FROM query
WHERE day IN ('20100101', '20100103')
GROUP BY 1 ORDER BY 1
The table mydataset.gsod_partitioned is built from the public gsod dataset. The day = 20100101 partition, for example, was created like this:
bq query --destination_table 'private.gsod_partitioned$20100101' --time_partitioning_type=DAY --use_legacy_sql=false \
  'SELECT station_number, mean_temp, num_mean_temp_samples FROM `bigquery-public-data.samples.gsod` where year=2010 and month=01 and day=01'
Is there a way to get partition pruning with the analytic function without adding the extra UNION ALL to the query?
Answer:
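About _PARTITIONDATE: at the time of writing it was an undocumented feature, and using _PARTITIONTIME is recommended instead. See this other question, where a Google employee explains it: Use of the _PARTITIONDATE vs. the _PARTITIONTIME pseudo-columns in BigQuery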
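About partition pruning with analytic functions: last year Google added support for pushing filters down through analytic functions. It only works for _PARTITIONTIME, though, and _PARTITIONTIME must be one of the columns in the PARTITION BY clause.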
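The restriction to PARTITION BY columns follows from how RANK() works. A filter on any other column removes rows that the remaining rows are ranked against, so applying it before the window function changes the ranks, and an optimizer cannot push it down. The Python sketch below uses hypothetical rows and emulates RANK() as 1 plus the number of strictly smaller values in the same partition. It shows that a filter on `num_mean_temp_samples` gives different ranks depending on where it is applied.

```python
from collections import defaultdict


def rank_by_partition(rows, part, order):
    """Emulate RANK() OVER (PARTITION BY part ORDER BY order) as (part, order, rank) tuples."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[part]].append(row[order])
    out = []
    for key, values in partitions.items():
        for value in values:
            # RANK() is 1 + the number of rows in the same partition with a smaller value.
            out.append((key, value, 1 + sum(v < value for v in values)))
    return sorted(out)


# Hypothetical rows (made-up values).
rows = [
    {"day": "20100101", "num_mean_temp_samples": 4},
    {"day": "20100101", "num_mean_temp_samples": 8},
    {"day": "20100103", "num_mean_temp_samples": 24},
]


def keep(samples):
    # A filter on a column that is NOT in PARTITION BY.
    return samples > 5


after = [t for t in rank_by_partition(rows, "day", "num_mean_temp_samples") if keep(t[1])]
before = rank_by_partition([r for r in rows if keep(r["num_mean_temp_samples"])],
                           "day", "num_mean_temp_samples")

print(after)   # [('20100101', 8, 2), ('20100103', 24, 1)]
print(before)  # [('20100101', 8, 1), ('20100103', 24, 1)]
```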
It should look like this:
WITH query AS (
  SELECT
    * EXCEPT(rank)
  FROM (
    SELECT
      *,
      RANK() OVER (PARTITION BY _pt ORDER BY num_mean_temp_samples) AS rank
    FROM (
      SELECT
        FORMAT_TIMESTAMP("%Y%m%d", _PARTITIONTIME) AS day,
        _PARTITIONTIME AS _pt,
        *
      FROM
        `mydataset.gsod_partitioned` ) q_nested
  ) q
  WHERE
    rank < 1000
)
SELECT
  num_mean_temp_samples,
  COUNT(1) AS samples
FROM query
WHERE
  day IN ('20100101', '20100103')
GROUP BY 1 ORDER BY 1
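The outer query above still filters on the derived `day` string. One variation worth testing, which is an assumption and not something the answer claims, is to filter on `_pt` directly, so the predicate names the exact column used in PARTITION BY, e.g. `WHERE _pt IN (TIMESTAMP '2010-01-01 00:00:00 UTC', ...)`. The hypothetical Python helper `pt_filter` below builds that predicate from the same day strings. It assumes day partitions whose `_PARTITIONTIME` is midnight UTC. Before relying on either version, compare their bytes processed with a dry run (`bq query --dry_run`).

```python
from datetime import datetime, timezone


def pt_filter(days, column="_pt"):
    """Turn YYYYMMDD strings into a `column IN (TIMESTAMP ...)` predicate.

    Hypothetical helper. Assumes day partitions whose _PARTITIONTIME is
    midnight UTC, i.e. the timestamp that FORMAT_TIMESTAMP("%Y%m%d", ...)
    maps to each `day` string.
    """
    literals = []
    for day in days:
        # strptime raises ValueError for malformed or impossible dates.
        ts = datetime.strptime(day, "%Y%m%d").replace(tzinfo=timezone.utc)
        literals.append(f"TIMESTAMP '{ts:%Y-%m-%d %H:%M:%S} UTC'")
    return f"{column} IN ({', '.join(literals)})"


print(pt_filter(["20100101", "20100103"]))
# _pt IN (TIMESTAMP '2010-01-01 00:00:00 UTC', TIMESTAMP '2010-01-03 00:00:00 UTC')
```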