I have a large table in Postgres.
The table is named bigtable
and its columns are:
integer |timestamp |xxx |xxx |...|xxx
category_id|capture_time|col1|col2|...|colN
I have partitioned the table on category_id modulo 10 and on the date part of the capture_time column.
The partition tables look like this:
CREATE TABLE myschema.bigtable_d000h0(
CHECK ( category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);
CREATE TABLE myschema.bigtable_d000h1(
CHECK ( category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);
When I run a query with category_id and capture_time in the WHERE clause, the partitions are not pruned as expected.
explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100;
"Result (cost=0.00..9476.87 rows=1933 width=216)"
" -> Append (cost=0.00..9476.87 rows=1933 width=216)"
" -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..1921.63 rows=1923 width=216)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h1 bigtable (cost=0.00..776.93 rows=1 width=218)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h2 bigtable (cost=0.00..974.47 rows=1 width=216)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h3 bigtable (cost=0.00..1351.92 rows=1 width=214)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h4 bigtable (cost=0.00..577.04 rows=1 width=217)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h5 bigtable (cost=0.00..360.67 rows=1 width=219)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h6 bigtable (cost=0.00..1778.18 rows=1 width=214)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h7 bigtable (cost=0.00..315.82 rows=1 width=216)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h8 bigtable (cost=0.00..372.06 rows=1 width=219)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
" -> Seq Scan on bigtable_d000h9 bigtable (cost=0.00..1048.16 rows=1 width=215)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
However, if I add the exact modulus condition (category_id%10=0) to the WHERE clause, it works perfectly:
explain select * from bigtable where capture_time >= '2012-01-01' and capture_time < '2012-01-02' and category_id=100 and category_id%10=0;
"Result (cost=0.00..2154.09 rows=11 width=215)"
" -> Append (cost=0.00..2154.09 rows=11 width=215)"
" -> Seq Scan on bigtable (cost=0.00..0.00 rows=1 width=210)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
" -> Seq Scan on bigtable_d000h0 bigtable (cost=0.00..2154.09 rows=10 width=216)"
" Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
Is there any way to make partition pruning work correctly without adding the modulus condition to every query?
Answer 0 (score: 4)
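The thing is: to exclude a partition, PostgreSQL has to prove that the query's WHERE clause contradicts that partition's CHECK constraint, and it does this with the same limited predicate matching it uses for partial indexes. Your constraint uses an expression on the column (category_id % 10), not just its value, so the planner cannot see that category_id = 100 rules out category_id % 10 = 1. The limitation is stated in the documentation on partial indexes (look for Example 11-2):
PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.
Hence your results: the query has to contain exactly the same expression that you used when creating the CHECK constraint.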
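For hash-based partitioning I prefer two approaches:
It is also possible to create two-level partitioning:
That said, I always try to partition on a single column, since that is easier to manage.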
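As a rough illustration of two-level partitioning with table inheritance, here is a minimal sketch. The table names bigtable_d000, bigtable_d000_h0 and bigtable_d000_h1 are hypothetical, and the layout (day first, modulus second) is only one possible choice.

```sql
-- Hypothetical names; adapt the schema, dates, and modulus to your data.

-- Level 1: one child per day, constrained on capture_time only.
CREATE TABLE myschema.bigtable_d000 (
    CHECK (capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

-- Level 2: children of the daily table, constrained on the modulus.
-- CHECK constraints are inherited, so each of these also carries
-- the date constraint of bigtable_d000.
CREATE TABLE myschema.bigtable_d000_h0 (
    CHECK (category_id % 10 = 0)
) INHERITS (myschema.bigtable_d000);

CREATE TABLE myschema.bigtable_d000_h1 (
    CHECK (category_id % 10 = 1)
) INHERITS (myschema.bigtable_d000);
```

With a layout like this, a filter on capture_time alone should be enough to skip every table belonging to other days. Pruning on the second level still requires the query to contain the category_id % 10 expression.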
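Answer 1 (score: 1)
For anyone who runs into the same problem:
I have concluded that the simplest way is to change the queries so that they include the modulus condition category_id%10=0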
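A small variation that may save computing the remainder by hand: write the literal twice and let the planner evaluate the constant expression. This is a sketch that assumes the planner folds 100 % 10 into 0 before checking constraints, and that constraint_exclusion is set to partition or on; verify both with EXPLAIN on your own version.

```sql
-- Should report 'partition' or 'on' for pruning to happen at all.
SHOW constraint_exclusion;

EXPLAIN
SELECT *
FROM bigtable
WHERE capture_time >= '2012-01-01'
  AND capture_time <  '2012-01-02'
  AND category_id = 100
  -- Assumption: constant-folded at plan time to (category_id % 10) = 0,
  -- which then matches the CHECK constraint expression exactly.
  AND category_id % 10 = 100 % 10;
```

Because the matching happens at planning time, this probably will not help when the category id is passed as a bind parameter whose value is unknown while the plan is built.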