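Given the DML statement below, is there a way to limit the number of rows scanned in the target table? For example, suppose the table is partitioned on a shard_id field, and I know in advance that all updates fall within a certain range of shard_id. Is there a way to specify a WHERE clause on the target to limit the rows that need to be scanned, so that the update does not have to do a full table scan to find the id?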
MERGE dataset.table_target target
USING dataset.table_source source
ON target.id = "123"
WHEN MATCHED THEN
UPDATE SET some_value = source.some_value
WHEN NOT MATCHED BY SOURCE AND id = "123" THEN
DELETE
Answer 0 (score: 2)
The ON condition plays the role of the WHERE clause here, so that is where you write the filter:
ON target.id = "123" AND DATE(target.shard_id) BETWEEN date1 AND date2
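Answer 1 (score: 0)
In your case, doing partition pruning in the ON condition is not correct. Instead, you should do it in the WHEN clauses.
There is an example for exactly this scenario at https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables#pruning_partitions_when_using_a_merge_statement.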
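Applied to the statement in the question, that could look roughly like the sketch below. This is only an illustration under assumptions: it assumes table_target is partitioned on shard_id, that shard_id is a DATE column, and the two date literals are placeholders for whatever range you know the updates fall in. If shard_id is a TIMESTAMP, the bounds would need to be TIMESTAMP values instead.

```sql
-- Sketch only. Assumptions: dataset.table_target is partitioned on shard_id,
-- shard_id is a DATE column, and the date bounds are placeholder values.
MERGE dataset.table_target target
USING dataset.table_source source
ON target.id = "123"
WHEN MATCHED
  AND target.shard_id BETWEEN DATE '2018-01-01' AND DATE '2018-01-31' THEN
  -- The partition filter sits in the WHEN search condition, not in ON.
  UPDATE SET some_value = source.some_value
WHEN NOT MATCHED BY SOURCE
  AND target.id = "123"
  AND target.shard_id BETWEEN DATE '2018-01-01' AND DATE '2018-01-31' THEN
  DELETE
```

Every WHEN clause carries the same shard_id filter, so no clause can touch a row outside the range. It is worth checking the bytes processed in a dry run to confirm the pruning actually takes effect on your table.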
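Basically, the ON condition serves as the match condition for joining the target and source tables in a MERGE. The following two queries show the difference between a join condition and a WHERE clause:
Query 1: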
with
t1 as (
select '2018-01-01' pt, 10 v1 union all
select '2018-01-01', 20 union all
select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 and pt = '2018-01-01'
Result:
pt v1 v2
2018-01-01 10 10
2018-01-01 20 NULL
2000-01-01 10 NULL
Query 2:
with
t1 as (
select '2018-01-01' pt, 10 v1 union all
select '2018-01-01', 20 union all
select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 where pt = '2018-01-01'
Result:
pt v1 v2
2018-01-01 10 10
2018-01-01 20 NULL
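To connect this back to MERGE: the rows of t1 that find no partner under the ON condition are roughly what a MERGE would treat as NOT MATCHED BY SOURCE, if t1 were the target. The variation below is only a sketch on the same inline data; it adds a WHERE v2 IS NULL filter to Query 1 to list those rows.

```sql
-- Same inline data as Query 1; keeps only the t1 rows with no match in t2.
with
t1 as (
select '2018-01-01' pt, 10 v1 union all
select '2018-01-01', 20 union all
select '2000-01-01', 10),
t2 as (select 10 v2)
select * from t1 left outer join t2 on v1=v2 and pt = '2018-01-01'
where v2 is null
```

This returns the rows (2018-01-01, 20, NULL) and (2000-01-01, 10, NULL), in no guaranteed order. The 2000-01-01 row is still there: putting the date filter in ON only made it unmatched, it did not remove it from consideration. That is why the filter needs to go into the WHEN clauses.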