Partitioning does not work as expected in Google BigQuery when the SQL query contains a subquery

Question

I have the following table structure in BigQuery:

**query_all_partition**
property_unique_date    DATE    REQUIRED    
page_url    STRING  REQUIRED    
click   INTEGER REQUIRED    
impression  INTEGER REQUIRED    
position    FLOAT   REQUIRED

Here, I have specified partitioning on property_unique_date.

**property_data**

fetch_date  DATE    REQUIRED    
property_url    STRING  REQUIRED    
property_unique_date    DATE    REQUIRED
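
For reference, the two schemas above could be created with DDL roughly like the following. This is a minimal sketch inferred from the field lists, not the asker's actual setup: the dataset name `searchanalytics` is taken from the queries later in the question, `REQUIRED` is expressed as `NOT NULL`, and `INTEGER`/`FLOAT` are written as `INT64`/`FLOAT64`.

```sql
-- Fact table, partitioned on the synthetic per-property date.
CREATE TABLE `searchanalytics.query_all_partition` (
  property_unique_date DATE    NOT NULL,
  page_url             STRING  NOT NULL,
  click                INT64   NOT NULL,
  impression           INT64   NOT NULL,
  position             FLOAT64 NOT NULL
)
PARTITION BY property_unique_date;

-- Lookup table mapping (fetch_date, property_url) to the synthetic date.
CREATE TABLE `searchanalytics.property_data` (
  fetch_date           DATE   NOT NULL,
  property_url         STRING NOT NULL,
  property_unique_date DATE   NOT NULL
);
```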

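Let me give a brief background.

I want to capture Google Search Analytics data for different websites (for example, clicks and impressions for a particular website, or redirection to those websites based on certain keywords).

Earlier I had only one table with the following fields, partitioned on fetch_date: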

fetch_date
property_url
page_url
click
impression
position

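So when I queried on fetch_date (i.e., between two dates), the query processed only the required data, which kept costs down. This approach works well as long as we store data for only one website or property URL. Once I started storing data for different properties, even a query for a single property and a specific fetch date range processed the data of all properties in that date range. This leads to a lot of data being processed and a higher cost, because a table cannot be partitioned on any field other than a date/timestamp.

So I came up with the approach of creating two tables:

query_all_partition
property_data

I started storing a date for each combination of property_url and fetch_date. For example, I assigned the range 1971-01-01 to 1980-12-31 to property P1. So, say I want to store P1's data for each day starting from January 2018: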

fetch_date  property_url   property_unique_date
2018-01-01   P1               1971-01-01
2018-01-02   P1               1971-01-02
2018-01-01   P2               1981-01-01
2018-01-02   P2               1981-01-02

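With this approach I can store at least 10 years of data per property. In query_all_partition, I started storing property_unique_date instead of fetch_date and property_url.

Now, for testing, I stored one month of data for two properties. P1 is a very large property and P2 is a very small one. I stored July 2018 data for both: P1 with property unique dates 1971-01-01 to 1971-01-31 and P2 with property unique dates 1981-01-01 to 1981-01-31.

I ran the queries below and attached screenshots of them.

The two properties are:
- P1 (large property): 1971-01-01 to 1971-01-31
- P2 (small property): 1981-01-01 to 1981-01-31

I ran the following queries: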

select page_url, sum(click) as click,sum(impression) as impression from `searchanalytics.query_all_partition` where property_unique_date BETWEEN ('1971-01-01') and ('1971-01-31') group by page_url

Image with property_unique_dates hardcoded for property P1. Please see the data being processed.

select page_url, sum(click) as click,sum(impression) as impression from `searchanalytics.query_all_partition` where property_unique_date BETWEEN ('1981-01-01') and ('1981-01-31') group by page_url

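Image with property_unique_dates hardcoded for property P2. Please see the data being processed. It is small, so up to here everything is fine.

The problem arises when, instead of hardcoding the property unique dates, I fetch them from a subquery (against the property_data table). Please see the queries and the data processed in the third and fourth images. The queries are below. The data processed is the complete table.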

select page_url, sum(click) as click,sum(impression) as impression from `searchanalytics.query_all_partition` where property_unique_date BETWEEN (select property_unique_date from `searchanalytics.property_data` where fetch_date='2018-01-01' and property_url='P1') and (select property_unique_date from `searchanalytics.property_data` where fetch_date='2018-01-31' and property_url='P1') group by page_url

select page_url, sum(click) as click,sum(impression) as impression from `searchanalytics.query_all_partition` where property_unique_date BETWEEN (select property_unique_date from `searchanalytics.property_data` where fetch_date='2018-01-01' and property_url='P2') and (select property_unique_date from `searchanalytics.property_data` where fetch_date='2018-01-31' and property_url='P2') group by page_url

Data for property P1 with property unique dates not hardcoded

Data for property P2 with property unique dates not hardcoded

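In the third and fourth cases, the query processes the full table instead of a subset. Why does this happen? Can someone explain it and suggest how to solve it?

A detailed answer would be much appreciated.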

Answer 1

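From the documentation:

Complex queries that require the evaluation of multiple stages of a query in order to resolve the predicate (such as inner queries or subqueries) will not prune partitions from the query.

So no, you cannot use a subquery-based filter and expect it to restrict the scan to only the matching partitions.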
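
One possible workaround, which is not part of the answer above, is to resolve the two boundary dates first and then hand them to the main query as values instead of subqueries. The sketch below uses BigQuery scripting (`DECLARE` / `SET`); the variable names `start_date` and `end_date` are illustrative. It assumes each lookup returns exactly one row, and that script variables are treated like constants for partition pruning, which is worth confirming against the bytes-processed figure before relying on it.

```sql
-- Step 1: look up the partition boundaries for property P1.
DECLARE start_date DATE;
DECLARE end_date   DATE;

SET start_date = (
  SELECT property_unique_date
  FROM `searchanalytics.property_data`
  WHERE fetch_date = '2018-01-01' AND property_url = 'P1'
);

SET end_date = (
  SELECT property_unique_date
  FROM `searchanalytics.property_data`
  WHERE fetch_date = '2018-01-31' AND property_url = 'P1'
);

-- Step 2: filter the partitioned table with the resolved values,
-- so the predicate no longer depends on a subquery.
SELECT
  page_url,
  SUM(click)      AS click,
  SUM(impression) AS impression
FROM `searchanalytics.query_all_partition`
WHERE property_unique_date BETWEEN start_date AND end_date
GROUP BY page_url;
```

The same two-step idea works from client code: run the lookup as its own query, then issue the main query with the two dates as literals or query parameters.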
