Using the StackOverflow public dataset, I ran my query against the full table, and it took 1 min 29 s (see attached image):
SELECT * from `bigquery-public-data.stackoverflow.stackoverflow_posts`
WHERE creation_date between "2011-01-01 00:00:00 UTC" and "2011-03-31 23:59:59 UTC"
I then partitioned the table by creation_date:
CREATE TABLE `ml-demo-304017.stackoverflow.questions_partitioned`
PARTITION BY Date(creation_date) AS
(SELECT * FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`)
After partitioning, I ran the same query against the partitioned table (see image below), and it took 1 min 41 s:
# Get data between 2011-01-01 and 2011-03-31 using the partitioned table. (Check how much data will be processed.)
SELECT * from `ml-demo-304017.stackoverflow.questions_partitioned`
WHERE creation_date between "2011-01-01 00:00:00 UTC" and "2011-03-31 23:59:59 UTC"
Can anyone explain why this happens, even though the data processed the second time (638.5GB) is so small compared to the full data (29.4GB)?