How do I correctly define an index between two attributes of the same table?

Asked: 2019-12-17 10:49:41

Tags: indexing postgresql-10

I have a table:

CREATE TABLE foo
(
    id SERIAL,
    bar DATE,
    baz DATE
)

It holds a few million records, and I want to filter rows efficiently based on whether the two values differ:

-- either
SELECT * FROM foo WHERE bar = baz;
SELECT * FROM foo WHERE bar != baz;

-- or
SELECT * FROM foo WHERE bar IS NOT DISTINCT FROM baz;
SELECT * FROM foo WHERE bar IS DISTINCT FROM baz;
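
The two pairs of predicates differ only in how they treat NULL. A small self-contained sketch, using made-up sample dates rather than data from foo, shows the difference:

```sql
-- Compare = / <> with IS [NOT] DISTINCT FROM on illustrative values.
-- The dates below are hypothetical sample data, not rows from foo.
SELECT
    a,
    b,
    a = b                    AS eq,            -- NULL if either side is NULL
    a <> b                   AS neq,           -- NULL if either side is NULL
    a IS NOT DISTINCT FROM b AS not_distinct,  -- always true or false
    a IS DISTINCT FROM b     AS is_distinct    -- always true or false
FROM (VALUES
    (DATE '2019-12-17', DATE '2019-12-17'),
    (DATE '2019-12-17', DATE '2019-12-18'),
    (DATE '2019-12-17', NULL::date),
    (NULL::date,        NULL::date)
) AS t(a, b);
```

For the last two rows, eq and neq are NULL, so a WHERE clause built on them drops those rows, while the DISTINCT FROM forms still classify them (two NULLs count as not distinct).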

I tried the following index combinations:

CREATE INDEX i_1 ON foo ((bar IS NOT DISTINCT FROM baz));
CREATE INDEX i_2 ON foo ((bar IS DISTINCT FROM baz));

CREATE INDEX i_1 ON foo ((bar = baz));
CREATE INDEX i_2 ON foo ((bar != baz));

and even partial indexes on a placeholder column:

CREATE INDEX i_1 ON foo (id) WHERE bar IS NOT DISTINCT FROM baz;
CREATE INDEX i_2 ON foo (id) WHERE bar IS DISTINCT FROM baz;

CREATE INDEX i_1 ON foo (id) WHERE bar = baz;
CREATE INDEX i_2 ON foo (id) WHERE bar != baz;

But for some reason, these indexes do not seem to be picked up at all:

EXPLAIN ANALYZE SELECT * FROM foo WHERE bar IS DISTINCT FROM baz

----

QUERY PLAN
Seq Scan on foo  (cost=0.00..283185.05 rows=3249594 width=2609) (actual time=0.088..2272.179 rows=2165900 loops=1)
  Filter: (bar IS DISTINCT FROM baz)
  Rows Removed by Filter: 1100024
Planning time: 0.561 ms
Execution time: 2433.678 ms

and:

EXPLAIN ANALYZE SELECT * FROM foo WHERE bar IS NOT DISTINCT FROM baz

----

QUERY PLAN
Gather  (cost=1000.00..262004.02 rows=16330 width=2609) (actual time=0.298..1599.461 rows=1100024 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on foo  (cost=0.00..259371.02 rows=6804 width=2609) (actual time=9.244..887.380 rows=366675 loops=3)
        Filter: (NOT (bar IS DISTINCT FROM baz))
        Rows Removed by Filter: 721967
Planning time: 0.293 ms
Execution time: 1682.527 ms
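
Two things stand out in these plans: the row estimates (3249594 and 16330) are far from the actual counts (2165900 and 1100024), and the two predicates match roughly 66% and 34% of the table. One possible way to check whether any of the indexes above is usable at all is to refresh statistics and temporarily discourage sequential scans. This is only a diagnostic sketch, assuming one of the expression or partial indexes above already exists; it is not meant as a permanent setting:

```sql
-- Refresh planner statistics. ANALYZE also gathers statistics for
-- expression indexes such as ((bar IS DISTINCT FROM baz)).
ANALYZE foo;

-- Diagnostic only: make sequential scans look very expensive
-- for the current session.
SET enable_seqscan = off;

EXPLAIN ANALYZE SELECT * FROM foo WHERE bar IS NOT DISTINCT FROM baz;

-- Restore the default planner behaviour.
RESET enable_seqscan;
```

If an index scan appears here but runs slower than the sequential scan, the planner's original choice was probably reasonable for a predicate that matches about a third of such wide rows.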

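I am running PostgreSQL 10. It seems that in PostgreSQL 12 I could use generated columns and put an index on the generated value. Unfortunately, I need a solution for PostgreSQL 10, where generated columns do not exist yet.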
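
For reference, the PostgreSQL 12 approach mentioned here might look roughly like the sketch below. The column name bar_differs and the index name i_bar_differs are hypothetical, and the statements will not run on PostgreSQL 10:

```sql
-- PostgreSQL 12+ only: a stored generated column holding the comparison.
-- bar_differs is a hypothetical name chosen for this illustration.
ALTER TABLE foo
    ADD COLUMN bar_differs boolean
    GENERATED ALWAYS AS (bar IS DISTINCT FROM baz) STORED;

-- An ordinary index on the generated value.
CREATE INDEX i_bar_differs ON foo (bar_differs);

-- Queries then filter on the generated column directly.
SELECT * FROM foo WHERE bar_differs;
SELECT * FROM foo WHERE NOT bar_differs;
```

Adding a stored generated column rewrites the table, which may take a while on a few million wide rows.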

Is there an index variation I haven't tried that might work in my case?

0 answers:

No answers yet.