Greenplum partition optimization

Time: 2011-12-15 01:48:02

Tags: database greenplum

On Greenplum, I have a large table named fact_table that is partitioned by RANGE(day_bucket). Why is the following query so slow:

select max(day_bucket) from fact_table where day_bucket >= '2011-09-11 00:00:00' and day_bucket < '2011-12-14'.

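I would expect it to look only at the head of each partition and return the result immediately, since every partition is defined on the same day_bucket column. Instead, Greenplum does a full scan to compute the result. Can anyone explain why?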


Update

Thanks for answering my question, but your hint did not help. Greenplum always does a full scan, even when I create the table with PARTITION BY LIST(day_bucket):

CREATE TABLE fact_table (
    id character varying(25) NOT NULL,
    day_bucket timestamp without time zone NOT NULL,
)
WITH (appendonly=true, orientation=column, compresstype=zlib, compresslevel=6) DISTRIBUTED BY (user_id) PARTITION BY LIST(day_bucket) 
          (
          PARTITION p20120101 VALUES ('2012-01-01 00:00:00'::timestamp without time zone) WITH (tablename='fact_table_1_prt_p20120101', appendonly=true, orientation=column, compresstype=zlib, compresslevel=6 ), 
          PARTITION p20120102 VALUES ('2012-01-02 00:00:00'::timestamp without time zone) WITH (tablename='fact_table_1_prt_p20120102', appendonly=true, orientation=column, compresstype=zlib, compresslevel=6 ), 
          PARTITION p20120103 VALUES ('2012-01-03 00:00:00'::timestamp without time zone) WITH (tablename='fact_table_1_prt_p20120103', appendonly=true, orientation=column, compresstype=zlib, compresslevel=6 ), 
          PARTITION p20120104 VALUES ('2012-01-04 00:00:00'::timestamp without time zone) WITH (tablename='fact_table_1_prt_p20120104', appendonly=true, orientation=column, compresstype=zlib, compresslevel=6 ), 
       .....

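The EXPLAIN command shows that it always performs a full scan:

                    ->  Append-only Columnar Scan on mytestlist_1_prt_p20120102 mytestlist  (cost=0.00..34.95 rows=1 width=8)
                          Filter: day_bucket >= '2012-01-02 00:00:00'::timestamp without time zone AND day_bucket [rest of condition lost in the source]
                    ->  Append-only Columnar Scan on mytestlist_1_prt_p20120103 mytestlist  (cost=0.00..39.61 rows=1 width=8)
                          Filter: day_bucket >= '2012-01-02 00:00:00'::timestamp without time zone AND day_bucket [rest of condition lost in the source]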

1 answer:

Answer 0 (score: 2)

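You should pay attention to the constraints applied to the partitions. To let the optimizer correctly exclude partitions from the scan, you have to help it. In your case, you should use explicit type casts (at planning time, GP cannot automatically work out that a literal like 'yyyy-mm-dd' is actually a timestamp):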

select max(day_bucket) 
from fact_table 
where day_bucket >= '2011-09-11 00:00:00'::timestamp 
  and day_bucket <  '2011-12-14'::timestamp
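
One way to check whether the casts make a difference is to compare the plans side by side. The following is only a sketch using the table and column names from the question; it assumes the pg_partitions catalog view is available (it exists in Greenplum 4.x through 6.x), and the plan you get will depend on your Greenplum version and planner settings.

```sql
-- 1. Inspect the boundaries Greenplum recorded for each range partition.
--    Assumption: the pg_partitions view exists on this Greenplum version.
SELECT partitiontablename, partitionrangestart, partitionrangeend
FROM pg_partitions
WHERE tablename = 'fact_table'
ORDER BY partitionposition;

-- 2. Plan for the original query, with untyped string literals.
EXPLAIN
SELECT max(day_bucket)
FROM fact_table
WHERE day_bucket >= '2011-09-11 00:00:00'
  AND day_bucket <  '2011-12-14';

-- 3. Plan for the query with explicit timestamp casts.
EXPLAIN
SELECT max(day_bucket)
FROM fact_table
WHERE day_bucket >= '2011-09-11 00:00:00'::timestamp
  AND day_bucket <  '2011-12-14'::timestamp;
```

If partition elimination takes effect, the plan from step 3 should list only the child tables (fact_table_1_prt_...) whose range overlaps the requested interval, while a plan that names every child table means all partitions are still being scanned.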