BigQuery: limiting the rows scanned by a MERGE DML statement

Time: 2018-07-10 13:49:35

Tags: google-cloud-platform google-bigquery dml

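Given the DML statement below, is there a way to limit the number of rows scanned in the target table? For example, say we have a shard_id field that the table is partitioned on. I know ahead of time that all updates should fall within some range of shard_id values. Is there a way to specify a where clause for target that limits the rows that need to be scanned, so the update does not have to do a full table scan to find the id?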

MERGE dataset.table_target target
USING dataset.table_source source
ON target.id = "123"
WHEN MATCHED THEN
UPDATE SET some_value = source.some_value
WHEN NOT MATCHED BY SOURCE AND id = "123" THEN
DELETE

2 answers:

Answer 0: (score: 2)

The ON condition plays the role of the WHERE clause here; that is where you need to write your filter.

ON target.id = "123" AND DATE(target.shard_id) BETWEEN date1 AND date2

Answer 1: (score: 0)

In your case, doing the partition pruning in the ON condition is not correct. Instead, you should do it in the WHEN clauses.

There is an example for exactly this scenario at https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables#pruning_partitions_when_using_a_merge_statement.
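Applied to the statement in the question, the idea might look like the sketch below. It is only an illustration under assumptions: it assumes shard_id is a DATE column that the target table is partitioned on, and the two date literals are placeholders for whatever range you know in advance. Whether partitions are actually pruned is decided by the query optimizer, so it is worth comparing the bytes processed before and after the change.

```sql
-- Sketch only: assumes target.shard_id is the DATE partitioning column.
-- The date range is a placeholder, not a value from the question.
MERGE dataset.table_target target
USING dataset.table_source source
ON target.id = "123"
-- Restrict the target partitions in each WHEN clause rather than in ON.
WHEN MATCHED
  AND target.shard_id BETWEEN DATE '2018-07-01' AND DATE '2018-07-10' THEN
  UPDATE SET some_value = source.some_value
WHEN NOT MATCHED BY SOURCE
  AND target.id = "123"
  AND target.shard_id BETWEEN DATE '2018-07-01' AND DATE '2018-07-10' THEN
  DELETE
```

The same range predicate is repeated in every WHEN clause so that no clause needs rows from outside that range.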

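Basically, the ON condition is used as the match condition that joins the target and source tables in a MERGE. The following two queries show the difference between a join condition and a where clause. In Query 1 the predicate on pt is part of the join condition, so rows of t1 that fail it are still returned, with NULL for v2. In Query 2 the same predicate is a where filter, so those rows are dropped.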

Query 1:

with
t1 as (
  select '2018-01-01' pt, 10 v1 union all
  select '2018-01-01', 20 union all
  select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 and pt = '2018-01-01'

Result:

pt          v1  v2
2018-01-01  10  10
2018-01-01  20  NULL
2000-01-01  10  NULL

Query 2:

with
t1 as (
  select '2018-01-01' pt, 10 v1 union all
  select '2018-01-01', 20 union all
  select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 where pt = '2018-01-01'

Result:

pt          v1  v2
2018-01-01  10  10
2018-01-01  20  NULL