What is a "Bitmap Index"?

Asked: 2013-11-08 21:51:24

Tags: sql postgresql indexing explain postgresql-performance

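I have a PostgreSQL query that takes longer than I'd like. I'm looking at the output of EXPLAIN ANALYZE, and it mentions a Bitmap Index Scan. I've been searching the 'net and reading for about 10 minutes, but I cannot figure out:

Is a Bitmap Index a manufactured thing (something I could improve on if I added a real index to some column somewhere), or is it a specific type of real index?

Here is the single table I'm querying: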

bugbot4b=> \d bug_snapshots
             Table "public.bug_snapshots"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 fixin_id   | integer                     | not null
 created_on | timestamp without time zone | not null
 pain       | integer                     | not null
 status_id  | integer                     | not null
Indexes:
    "bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
    "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
    "bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)

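Here is the result of analyzing the query. Note that the query lists about 3k distinct fixin_id values (elided below), and the table has 900k rows. Counting only the rows in the given time range yields 15,000 rows.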

EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
   AND ("status_id" IN (2, 7, 5, 3))
   AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
   AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
   AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";

Sort  (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
 Sort Key: created_on
 Sort Method:  quicksort  Memory: 18kB
 ->  HashAggregate  (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
       ->  Bitmap Heap Scan on bug_snapshots  (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
             Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
             Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
             ->  Bitmap Index Scan on bug_snapshots_pkey  (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
                   Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)

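Is the ANALYZE result telling me that I need to add an index on fixin_id (and/or other fields) to improve the speed? Or is this just "slow" because of the table's size?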

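1 Answer:

Answer 0 (score: 4)

"Bitmap Index Scan"

Postgres does not have "bitmap indexes" as such; rather, some index types permit bitmap index scans. A "Bitmap Index Scan" is a way of accessing an index that is especially useful for combining multiple index lookups. Quoting the manual:

An index access method can support "plain" index scans, "bitmap" index scans, or both.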

You can disable bitmap scanning (for debugging purposes only!) by setting:

SET enable_bitmapscan = FALSE;
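To keep the setting from leaking into the rest of the session, it can be scoped to a single transaction. A minimal sketch, assuming a shortened, hypothetical id list in place of the elided one:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT created_on, sum(pain) AS sum_pain
FROM   bug_snapshots
WHERE  fixin_id IN (11, 12, 33)   -- hypothetical short list, not the real one
AND    created_on >= '2013-10-08 16:42:26.994994'
AND    created_on <= '2013-11-07 15:42:26.994994'
GROUP  BY created_on
ORDER  BY created_on;

ROLLBACK;  -- the setting reverts here
```

Comparing this plan with the default one shows what the planner falls back to when bitmap scans are discouraged. The setting only discourages them; it does not strictly forbid them.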

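Optimizing query performance

For long lists, my experience is that joining to a derived table is faster than an IN expression. You can use VALUES or unnest() for this purpose (a related answer has the details):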

SELECT created_on, sum(pain) AS sum_pain
FROM   unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN   bug_snapshots USING (fixin_id)
WHERE  status_id IN (2, 7, 5, 3)
AND    created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND    created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND    pain < 999
GROUP  BY created_on
ORDER  BY created_on;
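The same join can be written with a VALUES list instead of unnest(). A sketch under the assumption of a shortened, hypothetical id list; the real list is elided in the question:

```sql
SELECT created_on, sum(pain) AS sum_pain
FROM  (VALUES (11), (12), (33)) AS f(fixin_id)  -- hypothetical short list
JOIN   bug_snapshots USING (fixin_id)
WHERE  status_id IN (2, 7, 5, 3)
AND    created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND    created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND    pain < 999
GROUP  BY created_on
ORDER  BY created_on;
```

Which form is faster depends on the list length and the Postgres version, so it is worth timing both with EXPLAIN ANALYZE.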

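A partial multicolumn index might help (a lot). That depends on details such as data distribution, load, how stable the query conditions are, and the selectivity of the WHERE expressions. For example: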

CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE  status_id IN (2, 7, 5, 3)
AND    pain < 999;

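The order of columns in the index matters. By the way, that is also true for the primary key, which is implemented with another multicolumn index. Details in this answer on dba.SE:
Is a composite index also good for queries on the first field?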
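
As a rough illustration of why column order matters, the two queries below can be compared against the existing primary key index on (fixin_id, created_on). This is only a sketch: the id value is arbitrary, and the plans actually chosen depend on table statistics and configuration.

```sql
-- Constrains the leading index column: a good fit for bug_snapshots_pkey.
EXPLAIN
SELECT * FROM bug_snapshots
WHERE  fixin_id = 11;  -- arbitrary example id

-- Constrains only the second index column: the index is much less useful,
-- so the planner may well choose a different plan such as a sequential scan.
EXPLAIN
SELECT * FROM bug_snapshots
WHERE  created_on >= '2013-10-08 16:42:26.994994';
```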

SQL Fiddle.
Performance tests on SQL Fiddle are hardly reliable. Run your own tests!

timestamp [without time zone]

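One more thing: your table has created_on as type timestamp without time zone. Timestamps are interpreted according to your current time zone setting. But in the query you compare against literals that carry a time zone offset. This would work if you added an explicit cast: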

WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz

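Your literal would then be cast to timestamptz and converted to your local time zone accordingly. However, since you do not provide the data type, Postgres casts your literal to the matching type timestamp (not timestamptz) and ignores the time zone offset. Most likely not your intent!

Consider this test: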

SELECT min(created_on), max(created_on)
FROM   bug_snapshots
WHERE  created_on >= '2013-10-08 16:42:26.994994-0700'
AND    created_on <= '2013-11-07 15:42:26.994994-0800'
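
The difference between the two casts can also be seen without touching the table. A small sketch, assuming the session time zone is set to UTC purely for illustration:

```sql
BEGIN;
SET LOCAL timezone = 'UTC';  -- assumed time zone, for illustration only

SELECT '2013-10-08 16:42:26.994994-0700'::timestamp   AS offset_ignored,
       '2013-10-08 16:42:26.994994-0700'::timestamptz AS offset_applied;
-- offset_ignored: 2013-10-08 16:42:26.994994
-- offset_applied: 2013-10-08 23:42:26.994994+00

ROLLBACK;
```

With the plain timestamp cast the -0700 offset is silently dropped, so the range in the original query is shifted relative to what the literals appear to say.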

A related answer has a detailed explanation.