Question

I have a simple table that stores precipitation readings from online gauges. Here is the table definition:

    CREATE TABLE public.precip
    (
        gauge_id smallint,
        inches numeric(8, 2),
        reading_time timestamp with time zone
    );

    CREATE INDEX idx_precip3_id
        ON public.precip USING btree
        (gauge_id);

    CREATE INDEX idx_precip3_reading_time
        ON public.precip USING btree
        (reading_time);

    CREATE INDEX idx_precip_last_five_days
        ON public.precip USING btree
        (reading_time)
        TABLESPACE pg_default
        WHERE reading_time > '2017-02-26 00:00:00+00'::timestamp with time zone;

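It has grown very large: about 38 million rows going back 18 months. Queries rarely request rows older than 7 days, so I created a partial index on the reading_time field so that Postgres can traverse a much smaller index. But it is not using the partial index on all queries. It does use the partial index on this query: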

    explain analyze select * from precip where gauge_id = 208 and reading_time > '2017-02-27';

    Bitmap Heap Scan on precip  (cost=8371.94..12864.51 rows=1169 width=16) (actual time=82.216..162.127 rows=2046 loops=1)
      Recheck Cond: ((gauge_id = 208) AND (reading_time > '2017-02-27 00:00:00+00'::timestamp with time zone))
      ->  BitmapAnd  (cost=8371.94..8371.94 rows=1169 width=0) (actual time=82.183..82.183 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_precip3_id  (cost=0.00..2235.98 rows=119922 width=0) (actual time=20.754..20.754 rows=125601 loops=1)
                  Index Cond: (gauge_id = 208)
            ->  Bitmap Index Scan on idx_precip_last_five_days  (cost=0.00..6135.13 rows=331560 width=0) (actual time=60.099..60.099 rows=520867 loops=1)
    Total runtime: 162.631 ms

But it does not use the partial index on the following query. Instead, it uses the full index on reading_time:

    explain analyze select * from precip where gauge_id = 208 and reading_time > now() - interval '7 days';

    Bitmap Heap Scan on precip  (cost=8460.10..13007.47 rows=1182 width=16) (actual time=154.286..228.752 rows=2067 loops=1)
      Recheck Cond: ((gauge_id = 208) AND (reading_time > (now() - '7 days'::interval)))
      ->  BitmapAnd  (cost=8460.10..8460.10 rows=1182 width=0) (actual time=153.799..153.799 rows=0 loops=1)
            ->  Bitmap Index Scan on idx_precip3_id  (cost=0.00..2235.98 rows=119922 width=0) (actual time=15.852..15.852 rows=125601 loops=1)
                  Index Cond: (gauge_id = 208)
            ->  Bitmap Index Scan on idx_precip3_reading_time  (cost=0.00..6223.28 rows=335295 width=0) (actual time=136.162..136.162 rows=522993 loops=1)
                  Index Cond: (reading_time > (now() - '7 days'::interval))
    Total runtime: 228.647 ms

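Note that today is March 5, 2017, so both queries are actually requesting rows covered by the partial index. But it seems Postgres won't use the partial index unless the timestamp in the WHERE clause is "hard-coded". Doesn't the query planner evaluate now() - interval '7 days' before deciding which index to use? I've posted the query plans, as the first people to reply suggested.

I've written several functions (stored procedures) that summarize rainfall over the last 6 hours, 12 hours, ... 72 hours. They all use the interval approach in the query (e.g., reading_time > now() - interval '7 days'). I don't want to move this code into the application just to send Postgres hard-coded timestamps; that would create a lot of unnecessary PHP code.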

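Any suggestions on how to encourage Postgres to use the partial index? My plan is to redefine the partial index's date range every night (drop index -> create index), but that seems a bit silly if Postgres isn't going to use it anyway.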
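For reference, the nightly drop/create plan could look roughly like the sketch below. It assumes a 7-day window starting at midnight and runs as a single DO block; the cutoff is spliced into the DDL as a literal with format(), because an index predicate has to be a constant. Keep in mind that a plain CREATE INDEX blocks writes to precip while it runs, and CREATE INDEX CONCURRENTLY cannot be used here because a DO block runs inside a transaction.

```sql
-- Sketch of a nightly job that moves the partial index's cutoff forward.
-- Assumption: a 7-day window starting at midnight, 7 days before today.
DO $$
DECLARE
    cutoff timestamptz := date_trunc('day', now()) - interval '7 days';
BEGIN
    -- Drop the old partial index (no error if it does not exist yet).
    DROP INDEX IF EXISTS public.idx_precip_last_five_days;

    -- Rebuild it with the new cutoff embedded as a constant literal;
    -- %L quotes the value so it is parsed back as a timestamptz.
    EXECUTE format(
        'CREATE INDEX idx_precip_last_five_days
             ON public.precip USING btree (reading_time)
             WHERE reading_time > %L::timestamptz',
        cutoff
    );
END
$$;
```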

Thanks,

Alex

Answer 1

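In general, an index can be used when the indexed column is compared to constants (literal values), to calls of functions that are marked at least STABLE (meaning that, within a single statement, calling the function several times with the same arguments will produce the same result), and to combinations of these.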

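now() (an alias of current_timestamp) is marked STABLE, and timestamp_mi_interval() (the function behind the operator <timestamp> - <interval>) is marked IMMUTABLE, which is stricter than STABLE. For timestamp with time zone values, such as the result of now(), the operator is implemented by timestamptz_mi_interval(), which is STABLE because its result depends on the TimeZone setting; either way, the whole expression is at least STABLE. (now(), current_timestamp and transaction_timestamp() all return the start time of the current transaction, and statement_timestamp() returns the start time of the current statement, so they are still STABLE; clock_timestamp(), however, returns the actual wall-clock time, so it is VOLATILE.)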
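To check these volatility markings on your own server, you can query the system catalog pg_proc, where provolatile is 'i' for IMMUTABLE, 's' for STABLE and 'v' for VOLATILE. The list of function names below is just an illustrative selection of the ones mentioned above.

```sql
-- Show the volatility category of the time functions discussed above.
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT p.oid::regprocedure AS function_signature,
       p.provolatile
FROM   pg_catalog.pg_proc AS p
WHERE  p.proname IN ('now',
                     'transaction_timestamp',
                     'statement_timestamp',
                     'clock_timestamp',
                     'timestamp_mi_interval',
                     'timestamptz_mi_interval')
ORDER  BY p.proname;
```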

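So in theory, WHERE reading_time > now() - interval '7 days' should be able to use an index on the reading_time column, and it does. However, because you defined a partial index, the planner needs to prove the following (quoted from the PostgreSQL documentation on partial indexes):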

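However, keep in mind that the predicate must match the conditions used in the queries that are supposed to benefit from the index. To be precise, a partial index can be used in a query only if the system can recognize that the WHERE condition of the query mathematically implies the predicate of the index. PostgreSQL does not have a sophisticated theorem prover that can recognize mathematically equivalent expressions that are written in different forms. (Not only is such a general theorem prover extremely difficult to create, it would probably be too slow to be of any real use.) The system can recognize simple inequality implications, for example "x < 1" implies "x < 2"; otherwise the predicate condition must exactly match part of the query's WHERE condition or the index will not be recognized as usable. Matching takes place at query planning time, not at run time.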

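That is what happens with your query, which uses and reading_time > now() - interval '7 days'. By the time now() - interval '7 days' is evaluated, planning has already happened, and PostgreSQL cannot prove that the predicate (reading_time > '2017-02-26 00:00:00+00') will be true. But when you use reading_time > '2017-02-27', it can prove it.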
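Since the question's rainfall summaries live in stored functions, one way to get a plan-time constant without moving code into PHP is dynamic SQL: compute the cutoff first, then splice it into the query text as a literal, so each call is planned with a constant that the planner can compare against the index predicate. This is a sketch under assumptions: the function name rain_total_since and its parameters are hypothetical, and the partial index can only be chosen when the computed cutoff is not earlier than the index's own cutoff.

```sql
-- Hypothetical helper: total rainfall for one gauge over the last N hours.
CREATE OR REPLACE FUNCTION rain_total_since(p_gauge integer, p_hours integer)
RETURNS numeric
LANGUAGE plpgsql
STABLE
AS $$
DECLARE
    cutoff timestamptz := now() - p_hours * interval '1 hour';
    total  numeric;
BEGIN
    -- EXECUTE plans the statement on every call, and %L embeds the cutoff
    -- as a literal, so the planner can test it against the index predicate.
    EXECUTE format(
        'SELECT sum(inches) FROM public.precip
          WHERE gauge_id = $1 AND reading_time > %L::timestamptz',
        cutoff)
    INTO total
    USING p_gauge;

    RETURN total;
END
$$;
```

Usage would be, for example, SELECT rain_total_since(208, 72);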

You could "guide" the planner with a constant value, like this:

    select *
    from   precip
    where  gauge_id = 208
    and    reading_time > '2017-02-26 00:00:00+00'
    and    reading_time > now() - interval '7 days';

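This way the planner realizes that it can use the partial index, because indexed_col > index_condition and indexed_col > something_else together imply that indexed_col is greater than (at least) index_condition. It may also be greater than something_else, but that doesn't matter for using the index.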
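Applied to stored functions like the ones in the question, the trick amounts to repeating the index's literal cutoff next to the now()-based condition. A minimal sketch, assuming a hypothetical SQL function rain_last_hours(): the literal must match (or be later than) the cutoff in the index definition, so it has to be updated whenever the index is rebuilt with a new date. It is also only correct while the requested window lies entirely after that literal; otherwise older rows are silently excluded.

```sql
-- Hypothetical summary function using the "guide the planner" trick.
CREATE OR REPLACE FUNCTION rain_last_hours(p_gauge integer, p_hours integer)
RETURNS numeric
LANGUAGE sql
STABLE
AS $$
    SELECT sum(inches)
    FROM   public.precip
    WHERE  gauge_id = p_gauge
      -- Constant matching the partial index predicate, so the planner can
      -- prove the query implies it. Caution: rows at or before this
      -- timestamp are excluded even if p_hours reaches further back.
      AND  reading_time > '2017-02-26 00:00:00+00'::timestamptz
      -- The real time window.
      AND  reading_time > now() - p_hours * interval '1 hour';
$$;
```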

I'm not sure whether this is the answer you're looking for. IMHO, if you have a huge amount of data (and PostgreSQL 9.5+), a BRIN index might suit your needs better.
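A BRIN index stores summaries for ranges of table blocks, so it stays very small and, unlike the partial index, needs no nightly rebuild; it only helps if rows are physically stored roughly in reading_time order, which is typical for append-only readings like these. A sketch, assuming PostgreSQL 9.5+ and a hypothetical index name:

```sql
-- BRIN index on reading_time (PostgreSQL 9.5+); the index name is hypothetical.
CREATE INDEX idx_precip_reading_time_brin
    ON public.precip USING brin (reading_time);

-- The index has no predicate, so a now()-based condition can use it
-- directly; no literal cutoff is needed.
EXPLAIN ANALYZE
SELECT *
FROM   public.precip
WHERE  gauge_id = 208
  AND  reading_time > now() - interval '7 days';
```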

Answer 2

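A query is planned before it is executed, and the plan can be cached for later use (for example, inside PL/pgSQL functions); choosing which indexes to apply is part of planning. Because your query contains now(), whose value is not known at planning time, the partial index cannot be used: the planner cannot be sure what the function will return, and therefore whether the condition matches the partial index. Anyone reading the query would understand that the partial index matches, but the planner is not smart enough to know what now() does; the only thing it knows is that the function is not IMMUTABLE (now() is marked STABLE), so its result cannot be treated as a constant.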

In your case, a better solution is to partition the table into smaller chunks based on reading_time. A properly designed query will then only access a single partition.
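A minimal sketch of partitioning by reading_time with declarative partitioning. Assumptions: PostgreSQL 11 or later (this syntax is newer than the question; older releases used table inheritance instead), monthly partitions, and hypothetical table and index names. A real setup would also need a job that creates future partitions ahead of time.

```sql
-- Sketch: declarative range partitioning by reading_time (PostgreSQL 11+).
CREATE TABLE public.precip_part
(
    gauge_id     smallint,
    inches       numeric(8, 2),
    -- NOT NULL because range partitions cannot hold NULL keys
    -- (assumption: every reading has a time).
    reading_time timestamp with time zone NOT NULL
) PARTITION BY RANGE (reading_time);

-- One partition per month; upper bounds are exclusive.
CREATE TABLE public.precip_2017_02 PARTITION OF public.precip_part
    FOR VALUES FROM ('2017-02-01 00:00:00+00') TO ('2017-03-01 00:00:00+00');
CREATE TABLE public.precip_2017_03 PARTITION OF public.precip_part
    FOR VALUES FROM ('2017-03-01 00:00:00+00') TO ('2017-04-01 00:00:00+00');

-- Created on the parent, this index is also created on every partition.
CREATE INDEX idx_precip_part_gauge_time
    ON public.precip_part (gauge_id, reading_time);

-- With a STABLE expression such as now(), partitions outside the window
-- are pruned at executor start-up rather than at plan time.
SELECT *
FROM   public.precip_part
WHERE  gauge_id = 208
  AND  reading_time > now() - interval '7 days';
```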

Postgres not using partial timestamp index on interval queries (e.g., now() - interval '7 days')
