Question

我在Postgres有一张大桌子。

表名为bigtable，列为：

integer    |timestamp   |xxx |xxx |...|xxx
category_id|capture_time|col1|col2|...|colN

我已经在tables_id的模数10和capture_time列的日期部分上对表进行了分区。

分区表如下所示：

CREATE TABLE myschema.bigtable_d000h0(
    CHECK ( category_id%10=0 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

CREATE TABLE myschema.bigtable_d000h1(
    CHECK ( category_id%10=1 AND capture_time >= DATE '2012-01-01' AND capture_time < DATE '2012-01-02')
) INHERITS (myschema.bigtable);

当我在where子句中使用category_id和capture_time运行查询时，不会按预期修剪分区。

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100;

"Result  (cost=0.00..9476.87 rows=1933 width=216)"
"  ->  Append  (cost=0.00..9476.87 rows=1933 width=216)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..1921.63 rows=1923 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h1 bigtable  (cost=0.00..776.93 rows=1 width=218)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h2 bigtable  (cost=0.00..974.47 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h3 bigtable  (cost=0.00..1351.92 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h4 bigtable  (cost=0.00..577.04 rows=1 width=217)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h5 bigtable  (cost=0.00..360.67 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h6 bigtable  (cost=0.00..1778.18 rows=1 width=214)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h7 bigtable  (cost=0.00..315.82 rows=1 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h8 bigtable  (cost=0.00..372.06 rows=1 width=219)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"
"        ->  Seq Scan on bigtable_d000h9 bigtable  (cost=0.00..1048.16 rows=1 width=215)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100))"

但是，如果我在where子句中添加精确的模数条件（category_id%10=0），它就可以完美地运行

explain select * from bigtable where capture_time >= '2012-01-01' and  capture_time < '2012-01-02' and category_id=100 and category_id%10=0;

"Result  (cost=0.00..2154.09 rows=11 width=215)"
"  ->  Append  (cost=0.00..2154.09 rows=11 width=215)"
"        ->  Seq Scan on bigtable  (cost=0.00..0.00 rows=1 width=210)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"
"        ->  Seq Scan on bigtable_d000h0 bigtable  (cost=0.00..2154.09 rows=10 width=216)"
"              Filter: ((capture_time >= '2012-01-01 00:00:00'::timestamp without time zone) AND (capture_time < '2012-01-02 00:00:00'::timestamp without time zone) AND (category_id = 100) AND ((category_id % 10) = 0))"

有没有办法让分区修剪工作正常而不必在每个查询中添加模数条件？

Answer 1

事情是：对于排除约束PostgreSQL will create an implicit index。在你的情况下，这个索引将是一个部分索引，“因为你在列上使用了expresion，而不仅仅是它的值。它在documentation中说明（寻找11-2示例）：

PostgreSQL没有复杂的定理证明器，可以识别以不同形式编写的数学上等效的表达式。（这样的一般定理证明器不仅难以创建，它可能太慢而无法实际使用。）系统可以识别简单的不等式含义，例如“x <1”意味着“x <2” “; 否则谓词条件必须与查询的WHERE条件的一部分完全匹配，否则索引将不会被识别为可用。匹配发生在查询计划时，而不是在运行时。

因此，您的结果 - 您应该具有与创建CHECK约束时使用的完全相同的表达式。

对于基于HASH的分区，我更喜欢两种方法：

添加一个可以采用有限值集合的字段（在您的情况下为10），如果设计中存在这样的值，则最好;
以指定时间戳范围的方式指定哈希值范围：MINVALUE＆lt; = category_id＆lt; MAXVALUE

此外，还可以创建2级分区：

在第一级创建基于category_id HASH的10个分区;
在第二级，您可以根据日期范围创建必要数量的分区。

虽然我总是尝试只使用1列进行分区，但更容易管理。

Answer 2

对于遇到同样问题的人：我得出结论，最简单的方法是更改查询以包含模数条件category_id%10=0

Postgres分区修剪

2 个答案: