I have a large table, foos, partitioned by foo_type. The following produces a good query plan (only one partition is selected):
select count(*) from foos where foo_type=1;
But if I change the literal 1 to an (equivalent) subquery, I end up with a plan that scans every partition:
select count(*) from foos where foo_type=(select min(foo_type) from favorite_foo_types);
How can I write a query that uses a subselect in the WHERE clause without ending up scanning every partition?
Answer 0 (score: 1)
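You didn't provide any code, which makes this hard for anyone to answer. The short answer: dynamic partition elimination does work in Greenplum, but the explain plan looks different from the one you get when you supply a literal value.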
Example:
First, create your favorite_foo_types table.
create table public.favorite_foo_types
(id int, foo_type int)
distributed by (id);
insert into public.favorite_foo_types
values (1, 1), (2,2), (3,3), (4,4), (5,5);
analyze public.favorite_foo_types;
Next, create the partitioned table.
create table public.foos
(id int, foo_type int)
distributed by (id)
partition by list (foo_type)
(
partition foo_1 values (1),
partition foo_2 values (2),
partition foo_3 values (3),
partition foo_4 values (4),
partition foo_5 values (5)
);
insert into public.foos
select i as id, case when i between 1 and 1999 then 1
when i between 2000 and 3999 then 2
when i between 4000 and 5999 then 3
when i between 6000 and 7999 then 4
when i between 8000 and 9999 then 5 end as foo_type
from generate_series(1,9999) as i;
analyze public.foos;
Here is the plan when a literal value is used. You can see that it selects only one partition.
explain analyze
select count(*)
from public.foos
where foo_type = 1;
Aggregate (cost=0.00..431.07 rows=1 width=8)
Rows out: 1 rows with 0.722 ms to first row, 0.723 ms to end, start offset by 0.298 ms.
-> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.07 rows=1 width=8)
Rows out: 2 rows at destination with 0.717 ms to first row, 0.718 ms to end, start offset by 0.299 ms.
-> Aggregate (cost=0.00..431.07 rows=1 width=8)
Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.287 ms to end, start offset by 0.663 ms.
-> Sequence (cost=0.00..431.07 rows=1000 width=4)
Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 0.036 ms to first row, 0.215 ms to end, start offset by 0.663 ms.
-> Partition Selector for foos (dynamic scan id: 1) (cost=10.00..100.00 rows=50 width=4)
Filter: foo_type = 1
Partitions selected: 1 (out of 5)
Rows out: 0 rows (seg0) with 0.004 ms to end, start offset by 0.663 ms.
-> Dynamic Table Scan on foos (dynamic scan id: 1) (cost=0.00..431.07 rows=1000 width=4)
Filter: foo_type = 1
Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 0.032 ms to first row, 0.140 ms to end, start offset by 0.667 ms.
Partitions scanned: Avg 1.0 (out of 5) x 2 workers. Max 1 parts (seg0).
Slice statistics:
(slice0) Executor memory: 408K bytes.
(slice1) Executor memory: 195K bytes avg x 2 workers, 195K bytes max (seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=on
Optimizer status: PQO version 1.650
Total runtime: 1.162 ms
Now, your query:
explain analyze
select count(*)
from public.foos
where foo_type=(select min(foo_type) from public.favorite_foo_types);
Aggregate (cost=0.00..863.04 rows=1 width=8)
Rows out: 1 rows with 6.466 ms to end, start offset by 24 ms.
-> Gather Motion 2:1 (slice3; segments: 2) (cost=0.00..863.04 rows=1 width=8)
Rows out: 2 rows at destination with 5.415 ms to first row, 6.459 ms to end, start offset by 24 ms.
-> Aggregate (cost=0.00..863.04 rows=1 width=8)
Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 4.514 ms to end, start offset by 24 ms.
-> Hash Join (cost=0.00..863.04 rows=5000 width=1)
Hash Cond: foos.foo_type = inner.min
Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 3.464 ms to first row, 4.441 ms to end, start offset by 24 ms.
Executor memory: 1K bytes avg, 1K bytes max (seg0).
Work_mem used: 1K bytes avg, 1K bytes max (seg0). Workfile: (0 spilling, 0 reused)
(seg0) Hash chain length 1.0 avg, 1 max, using 1 of 524341 buckets.
-> Dynamic Table Scan on foos (dynamic scan id: 1) (cost=0.00..431.10 rows=5000 width=4)
Rows out: Avg 999.5 rows x 2 workers. Max 1000 rows (seg0) with 0.382 ms to first row, 0.478 ms to end, start offset by 27 ms.
Partitions scanned: Avg 1.0 (out of 5) x 2 workers. Max 1 parts (seg0).
-> Hash (cost=100.00..100.00 rows=50 width=4)
Rows in: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.197 ms to end, start offset by 27 ms.
-> Partition Selector for foos (dynamic scan id: 1) (cost=10.00..100.00 rows=50 width=4)
Filter: foos.id = min
Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.189 ms to first row, 0.190 ms to end, start offset by 27 ms.
-> Broadcast Motion 1:2 (slice2) (cost=0.00..431.00 rows=2 width=4)
Rows out: Avg 1.0 rows x 2 workers at destination. Max 1 rows (seg0) with 0.015 ms to end, start offset by 27 ms.
-> Aggregate (cost=0.00..431.00 rows=1 width=4)
Rows out: 1 rows with 0.020 ms to end, start offset by 26 ms.
-> Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=4)
Rows out: 2 rows at destination with 0.009 ms to first row, 0.010 ms to end, start offset by 26 ms.
-> Aggregate (cost=0.00..431.00 rows=1 width=4)
Rows out: Avg 1.0 rows x 2 workers. Max 1 rows (seg0) with 0.079 ms to end, start offset by 25 ms.
-> Table Scan on favorite_foo_types (cost=0.00..431.00 rows=3 width=4)
Rows out: Avg 2.5 rows x 2 workers. Max 3 rows (seg0) with 0.065 ms to first row, 0.067 ms to end, start offset by 25 ms.
Slice statistics:
(slice0) Executor memory: 414K bytes.
(slice1) Executor memory: 245K bytes avg x 2 workers, 245K bytes max (seg0).
(slice2) Executor memory: 253K bytes (entry db).
(slice3) Executor memory: 8493K bytes avg x 2 workers, 8493K bytes max (seg0). Work_mem: 1K bytes max.
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=on
Optimizer status: PQO version 1.650
Total runtime: 30.161 ms
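Notice that the query plan contains "Dynamic Table Scan on foos" and, beneath it, "Partitions scanned: Avg 1.0 (out of 5)". That means 4 partitions were eliminated dynamically and only 1 was scanned. The elimination happens at run time: the Partition Selector under the Hash node receives the subquery result and uses it to choose which partition to scan.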
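Both plans above report "Settings: optimizer=on", so they came from a session with the GPORCA optimizer enabled. As a rough sketch for checking this on your own system (assuming your Greenplum build includes GPORCA; this answer does not show how the legacy planner treats the same query), you could confirm the setting and then re-run the explain:

```sql
-- Show whether the GPORCA optimizer is enabled for this session.
show optimizer;

-- Enable it for the current session only
-- (assumption: GPORCA is available in your installation).
set optimizer = on;

-- Re-run the subquery form and read the plan.
explain analyze
select count(*)
from public.foos
where foo_type = (select min(foo_type) from public.favorite_foo_types);
```

In the output, find the "Dynamic Table Scan on foos" node and read its "Partitions scanned" line. A value below the total number of partitions means partitions were eliminated at run time.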
There is also a graphical plan checker on greenplum.org that can help you read plans.