Question

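I have a PostgreSQL query that takes longer than I'd like. I'm looking at the output of EXPLAIN ANALYZE, and it mentions a "Bitmap Index Scan". I've been searching the 'net and reading for about 10 minutes, but I can't figure this out: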

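Is a bitmap index something manufactured on the fly (something I could improve by adding a real index to some column somewhere), or is it a specific type of real index?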

Here is the single table I'm querying:

bugbot4b=> \d bug_snapshots
             Table "public.bug_snapshots"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 fixin_id   | integer                     | not null
 created_on | timestamp without time zone | not null
 pain       | integer                     | not null
 status_id  | integer                     | not null
Indexes:
    "bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
    "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
    "bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)

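Here is the result of analyzing the query. Note that the query contains about 3k distinct fixin_id values (elided below), and the table has 900k rows. Counting only the rows within the specific time range yields 15,000 rows.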

EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
   AND ("status_id" IN (2, 7, 5, 3))
   AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
   AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
   AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";

Sort  (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
 Sort Key: created_on
 Sort Method:  quicksort  Memory: 18kB
 ->  HashAggregate  (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
       ->  Bitmap Heap Scan on bug_snapshots  (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
             Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
             Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
             ->  Bitmap Index Scan on bug_snapshots_pkey  (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
                   Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)

Does the ANALYZE output tell me that I need to add an index on fixin_id (and/or other fields) to improve speed? Or is this just "slow" because of its size?

Answer 1

"Bitmap Index Scan"

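Postgres doesn't have "bitmap indexes" per se, but some index types (including the default btree) allow bitmap index scans. A "bitmap index scan" is one of the ways an index access method can return matches, and it is especially useful for combining multiple index lookups. Quoting the manual: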

An index access method can support "plain" index scans, "bitmap" index scans, or both.
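To make the idea concrete, here is a toy Python model of a bitmap scan. It is not PostgreSQL's implementation: the tuple identifiers (heap page, line pointer), the rows and the two indexes are all made up for illustration. Each Bitmap Index Scan collects matching TIDs into a per-page bitmap, a BitmapAnd intersects two such bitmaps, and the Bitmap Heap Scan then visits each heap page once, in physical order, rechecking the conditions.

```python
from collections import defaultdict

# Toy heap: hypothetical TID (heap_page, line_pointer) -> row
HEAP = {
    (0, 1): {"fixin_id": 11, "pain": 5},
    (0, 2): {"fixin_id": 12, "pain": 1000},
    (1, 1): {"fixin_id": 33, "pain": 7},
    (2, 1): {"fixin_id": 12, "pain": 3},
}


def bitmap_from_tids(tids):
    """Bitmap Index Scan: collect TIDs returned by one index into page -> {line pointers}."""
    bitmap = defaultdict(set)
    for page, offset in tids:
        bitmap[page].add(offset)
    return dict(bitmap)


def bitmap_and(a, b):
    """BitmapAnd: keep only TIDs present in both bitmaps."""
    result = {}
    for page in a.keys() & b.keys():
        common = a[page] & b[page]
        if common:
            result[page] = common
    return result


def bitmap_heap_scan(bitmap, heap, recheck):
    """Bitmap Heap Scan: visit each heap page once, in page order, rechecking every row."""
    rows = []
    for page in sorted(bitmap):
        for offset in sorted(bitmap[page]):
            row = heap[(page, offset)]
            if recheck(row):
                rows.append(row)
    return rows


# TIDs in index key order, as two hypothetical indexes might return them
by_fixin = bitmap_from_tids([(0, 1), (0, 2), (2, 1)])  # fixin_id IN (11, 12)
by_pain = bitmap_from_tids([(2, 1), (0, 1), (1, 1)])   # pain < 999

combined = bitmap_and(by_fixin, by_pain)
rows = bitmap_heap_scan(
    combined, HEAP, lambda r: r["fixin_id"] in (11, 12) and r["pain"] < 999
)
print([r["fixin_id"] for r in rows])  # [11, 12]
```

In the question's plan there is only one index, so there is a single Bitmap Index Scan and no BitmapAnd; the point of the bitmap there is that heap pages are read in physical order, each at most once. The toy rechecks every row for simplicity; PostgreSQL only needs the Recheck Cond for pages stored lossily (for example when the bitmap outgrows work_mem), while the Filter line is applied to every fetched row.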

You can disable bitmap scanning (for debugging purposes only!) by setting:

SET enable_bitmapscan = FALSE;

Optimize query performance

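In my experience, for long lists it is faster to join to a derived table than to use an IN expression. You can build that derived table with VALUES or unnest(). Details in this related answer: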

Query table by indexes from integer array

SELECT created_on, sum(pain) AS sum_pain
FROM   unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN   bug_snapshots USING (fixin_id)
WHERE  status_id IN (2, 7, 5, 3)
AND    created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND    created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND    pain < 999
GROUP  BY created_on
ORDER  BY created_on;

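A partial multicolumn index might help (a lot). It depends on details like data distribution, load, how stable the query conditions are, the selectivity of the WHERE expressions, etc. For example: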

CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE  status_id IN (2, 7, 5, 3)
AND    pain < 999;
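As a rough illustration of why such an index can be smaller and more targeted, the following toy Python model keeps only the rows that satisfy the index predicate, sorted by the key (fixin_id, created_on, pain), and answers the query with one contiguous range lookup per fixin_id. The rows are made up, dates are plain ISO strings, and whether PostgreSQL actually picks a real partial index depends on the factors listed above.

```python
from bisect import bisect_left, bisect_right

# Made-up rows: (fixin_id, created_on, pain, status_id); ISO date strings sort correctly
ROWS = [
    (11, "2013-10-09", 5, 2),
    (11, "2013-10-10", 1200, 2),  # pain >= 999: excluded by the index predicate
    (12, "2013-10-09", 3, 9),     # status_id 9: excluded by the index predicate
    (12, "2013-10-20", 8, 7),
    (33, "2013-11-01", 4, 5),
]
WANTED_STATUSES = {2, 7, 5, 3}


def build_partial_index(rows):
    """Model of an index on (fixin_id, created_on, pain)
    WHERE status_id IN (2, 7, 5, 3) AND pain < 999: only qualifying rows get an entry."""
    return sorted(
        (fixin_id, created_on, pain)
        for fixin_id, created_on, pain, status_id in rows
        if status_id in WANTED_STATUSES and pain < 999
    )


def range_lookup(index, fixin_ids, start, end):
    """One contiguous slice per fixin_id: equality on the first column, range on the second."""
    hits = []
    for fixin_id in sorted(fixin_ids):
        lo = bisect_left(index, (fixin_id, start))
        hi = bisect_right(index, (fixin_id, end, float("inf")))
        hits.extend(index[lo:hi])
    return hits


index = build_partial_index(ROWS)
print(len(index))  # 3 (only 3 of the 5 rows are indexed)
print(range_lookup(index, {11, 12, 33}, "2013-10-08", "2013-10-31"))
# [(11, '2013-10-09', 5), (12, '2013-10-20', 8)]
```

Because the status_id and pain conditions are already baked into the index predicate, the model never has to look at the excluded rows at all; in the real query, PostgreSQL can only use the partial index if the query's WHERE clause implies that predicate.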

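The order of columns in the index matters a lot. The same is true for your primary key, by the way, which implements another multicolumn index. Details in this related answer on dba.SE: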
Is a composite index also good for queries on the first field?
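A toy Python model (made-up keys, not real btree internals) shows why the order matters. With keys sorted on (fixin_id, created_on), as in bug_snapshots_pkey, an equality on fixin_id plus a range on created_on maps to one contiguous slice, whereas a condition on created_on alone matches scattered entries, so the index cannot be narrowed down to a single range for it.

```python
from bisect import bisect_left, bisect_right

# Made-up (fixin_id, created_on) keys, sorted like a btree on (fixin_id, created_on)
KEYS = sorted([
    (12, "2013-10-15"),
    (11, "2013-11-20"),
    (33, "2013-10-30"),
    (11, "2013-10-09"),
    (12, "2013-09-01"),
])


def slice_for(keys, fixin_id, start, end):
    """Equality on the leading column + range on the second: one contiguous slice."""
    lo = bisect_left(keys, (fixin_id, start))
    hi = bisect_right(keys, (fixin_id, end))
    return keys[lo:hi]


def positions_matching_dates(keys, start, end):
    """Range on the second column only: matches are scattered, so every key is checked."""
    return [i for i, (_, created_on) in enumerate(keys) if start <= created_on <= end]


print(slice_for(KEYS, 12, "2013-10-08", "2013-11-07"))             # [(12, '2013-10-15')]
print(positions_matching_dates(KEYS, "2013-10-08", "2013-11-07"))  # [0, 3, 4]
```

This is also why the plan above shows both fixin_id and created_on in the Index Cond: with fixin_id leading, each of the ~3k fixin_id values becomes one narrow range probe on created_on.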

SQL Fiddle.
(Performance tests on SQL Fiddle are hardly reliable. Run your own tests!)

`timestamp [without time zone]`

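One more thing: your column created_on is of type timestamp without time zone. Such values are interpreted as local time, according to your current time zone setting, but in your query you compare them to literals that include a time zone offset. This would work as intended if you added an explicit cast: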

WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz

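With the cast, your literal is converted to timestamptz and shifted to your local time zone accordingly. However, since you don't provide a data type, Postgres casts your literal to the matching type timestamp (not timestamptz), silently ignoring the time zone offset. Most probably not your intention!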
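The difference can be modeled with Python's standard datetime module. This only approximates PostgreSQL's casting rules, and it assumes a session TimeZone of UTC (substitute your own setting): casting to timestamp simply drops the offset, while casting to timestamptz keeps the instant, which is then effectively compared in the session time zone.

```python
from datetime import datetime, timezone

LITERAL = "2013-10-08 16:42:26.994994-0700"
SESSION_TZ = timezone.utc  # assumption: the session's TimeZone setting


def parse(literal):
    """Parse the literal including its UTC offset (%z accepts -0700)."""
    return datetime.strptime(literal, "%Y-%m-%d %H:%M:%S.%f%z")


def as_timestamp(literal):
    """Like an untyped literal compared to a timestamp column: the offset is ignored."""
    return parse(literal).replace(tzinfo=None)


def as_timestamptz_in_session(literal, session_tz=SESSION_TZ):
    """Like '...'::timestamptz compared to a timestamp column: same instant, in session time."""
    return parse(literal).astimezone(session_tz).replace(tzinfo=None)


print(as_timestamp(LITERAL))               # 2013-10-08 16:42:26.994994
print(as_timestamptz_in_session(LITERAL))  # 2013-10-08 23:42:26.994994
```

With a session time zone of UTC, the two interpretations differ by seven hours, so the untyped literal selects a different time window than the one written in the query.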

Consider this test:

SELECT min(created_on), max(created_on)
FROM   bug_snapshots
WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'
AND    created_on <= '2013-11-07 15:42:26.994994-0800';
