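I have a PostgreSQL query that is taking longer than I'd like. I'm looking at the output of EXPLAIN ANALYZE, and it mentions a Bitmap Index Scan. I've been searching the 'net and reading for about 10 minutes, but I cannot figure out:
Is a Bitmap Index a manufactured thing (something I could improve on if I added a real index to some column somewhere), or is it a specific type of real index?
Here is the single table that I'm querying: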
bugbot4b=> \d bug_snapshots
         Table "public.bug_snapshots"
   Column   |            Type             | Modifiers
------------+-----------------------------+-----------
 fixin_id   | integer                     | not null
 created_on | timestamp without time zone | not null
 pain       | integer                     | not null
 status_id  | integer                     | not null
Indexes:
    "bug_snapshots_pkey" PRIMARY KEY, btree (fixin_id, created_on)
Foreign-key constraints:
    "bug_snapshots_fixin_id_fkey" FOREIGN KEY (fixin_id) REFERENCES fixins(id) ON DELETE SET NULL
    "bug_snapshots_status_id_fkey" FOREIGN KEY (status_id) REFERENCES statuses(id)
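Here is the result of analyzing the query. Note that there are about 3k distinct fixin_id values in the query (elided below), and the table has 900k rows. Counting just the rows in the specific time range yields 15,000 rows.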
EXPLAIN ANALYZE SELECT "created_on", sum("pain") AS "sum_pain" FROM "bug_snapshots"
WHERE (("fixin_id" IN (11,12,33,…,5351))
AND ("status_id" IN (2, 7, 5, 3))
AND ("created_on" >= '2013-10-08 16:42:26.994994-0700')
AND ("created_on" <= '2013-11-07 15:42:26.994994-0800')
AND ("pain" < 999))
GROUP BY "created_on"
ORDER BY "created_on";
Sort  (cost=59559.33..59559.38 rows=20 width=12) (actual time=19.472..19.494 rows=30 loops=1)
  Sort Key: created_on
  Sort Method: quicksort  Memory: 18kB
  ->  HashAggregate  (cost=59558.64..59558.89 rows=20 width=12) (actual time=19.401..19.428 rows=30 loops=1)
        ->  Bitmap Heap Scan on bug_snapshots  (cost=9622.42..59509.25 rows=9878 width=12) (actual time=6.849..13.420 rows=6196 loops=1)
              Recheck Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
              Filter: ((pain < 999) AND (status_id = ANY ('{2,7,5,3}'::integer[])))
              ->  Bitmap Index Scan on bug_snapshots_pkey  (cost=0.00..9619.95 rows=11172 width=0) (actual time=6.801..6.801 rows=6196 loops=1)
                    Index Cond: ((fixin_id = ANY ('{11,12,33,…,5351}'::integer[])) AND (created_on >= '2013-10-08 16:42:26.994994'::timestamp without time zone) AND (created_on <= '2013-11-07 15:42:26.994994'::timestamp without time zone))
Total runtime: 19.646 ms
(10 rows)
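Do the results of EXPLAIN ANALYZE tell me that I need to add an index on fixin_id (and/or other fields) to improve the speed? Or is this just "slow" because of its size?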
Answer 0 (score: 4)
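Postgres does not have "bitmap indexes" as such. Rather, some index types (including the default btree, as your plan shows) support bitmap index scans. A "Bitmap Index Scan" is a way of scanning an ordinary index: it builds an in-memory bitmap of matching row locations, and the Bitmap Heap Scan node above it then fetches those rows from the table. It is particularly useful for combining multiple index lookups. Quoting the manual:
An index access method can support "plain" index scans, "bitmap" index scans, or both.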
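To illustrate the "combining multiple index lookups" part, here is a minimal sketch. The two single-column indexes and their names (demo_status_idx, demo_created_idx) are hypothetical and not part of the original schema. With both in place, the planner may (but need not) build one bitmap per index and combine them with a BitmapAnd node before visiting the table:

```sql
-- Hypothetical single-column indexes, for illustration only.
CREATE INDEX demo_status_idx  ON bug_snapshots (status_id);
CREATE INDEX demo_created_idx ON bug_snapshots (created_on);

-- Refresh planner statistics so the new indexes are costed sensibly.
ANALYZE bug_snapshots;

-- Inspect the plan; look for "BitmapAnd" above two "Bitmap Index Scan" nodes.
EXPLAIN
SELECT count(*)
FROM   bug_snapshots
WHERE  status_id = 2
AND    created_on >= '2013-10-08 16:42:26.994994'
AND    created_on <= '2013-11-07 15:42:26.994994';

-- Clean up the demo indexes again.
DROP INDEX demo_status_idx;
DROP INDEX demo_created_idx;
```

Whether the planner actually picks a BitmapAnd depends on its cost estimates for your data, so the plan you see may differ.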
You can disable bitmap scans (for debugging purposes only!) by setting:
SET enable_bitmapscan = FALSE;
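A typical debugging session might look like the sketch below. The three-element fixin_id list is a shortened stand-in for the real list, chosen only for illustration. Note that the setting discourages bitmap scans rather than strictly forbidding them, and that it should be reset afterwards:

```sql
-- Discourage bitmap scans for the current session (debugging only).
SET enable_bitmapscan = off;

-- Re-run the query to see which plan the planner falls back to.
EXPLAIN ANALYZE
SELECT created_on, sum(pain) AS sum_pain
FROM   bug_snapshots
WHERE  fixin_id IN (11, 12, 33)          -- shortened, illustrative list
AND    status_id IN (2, 7, 5, 3)
AND    created_on >= '2013-10-08 16:42:26.994994'
AND    created_on <= '2013-11-07 15:42:26.994994'
AND    pain < 999
GROUP  BY created_on
ORDER  BY created_on;

-- Restore the default so later queries are planned normally.
RESET enable_bitmapscan;
```

Comparing the timings of both plans shows whether the bitmap scan was a good choice in the first place.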
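For long lists, my experience is that joining to a temporary table is faster than a long IN expression. You can use VALUES or unnest() to build the set inline. Details are in a related answer. Applied to your query: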
SELECT created_on, sum(pain) AS sum_pain
FROM unnest('{11,12,33,…,5351}'::int[]) AS f(fixin_id)
JOIN bug_snapshots USING (fixin_id)
WHERE status_id IN (2, 7, 5, 3)
AND created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND pain < 999
GROUP BY created_on
ORDER BY created_on;
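The same join can be written with a VALUES list instead of unnest(). This is only a sketch: the three ids stand in for the full list, and the alias b is introduced here for readability.

```sql
SELECT b.created_on, sum(b.pain) AS sum_pain
FROM  (VALUES (11), (12), (33)) AS f(fixin_id)   -- shortened, illustrative list
JOIN   bug_snapshots b USING (fixin_id)
WHERE  b.status_id IN (2, 7, 5, 3)
AND    b.created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
AND    b.created_on <= '2013-11-07 15:42:26.994994-0800'::timestamptz
AND    b.pain < 999
GROUP  BY b.created_on
ORDER  BY b.created_on;
```

One difference from IN to keep in mind: a join returns one row per match, so a duplicate id in the list would count the matching rows twice. The list has to be free of duplicates for both forms to give the same sums.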
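A partial multicolumn index would probably help (a lot). That depends on details such as data distribution, load, how stable the query conditions are, and the selectivity of the WHERE expressions. For example: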
CREATE INDEX bug_snapshots_part_idx ON bug_snapshots (fixin_id, created_on, pain)
WHERE status_id IN (2, 7, 5, 3)
AND pain < 999;
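The planner only considers a partial index when it can prove that the query's WHERE clause implies the index predicate, so the simplest approach is to repeat the predicate in the query as written in the index. A sketch for checking this, again with a shortened, purely illustrative fixin_id list:

```sql
-- Check whether the planner picks the partial index for this query shape.
EXPLAIN
SELECT created_on, sum(pain) AS sum_pain
FROM   bug_snapshots
WHERE  fixin_id IN (11, 12, 33)          -- shortened, illustrative list
AND    status_id IN (2, 7, 5, 3)         -- same condition as the index predicate
AND    pain < 999                        -- same condition as the index predicate
AND    created_on >= '2013-10-08 16:42:26.994994'
AND    created_on <= '2013-11-07 15:42:26.994994'
GROUP  BY created_on
ORDER  BY created_on;
```

If bug_snapshots_part_idx shows up in the plan, the index is usable for this query. Whether it is actually faster is something only timing on your own data can tell.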
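The order of columns in the index matters. That is also true for your primary key, by the way, which is implemented with another multicolumn index. Details are in this answer on dba.SE:
Is a composite index also good for queries on the first field?
SQL Fiddle.
Performance tests on SQL Fiddle are hardly reliable. Run your own tests!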
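As a sketch of why column order matters: the primary key index leads with fixin_id, so a query that filters only on created_on cannot use it efficiently. If such queries are common, a separate index leading with created_on is one option worth testing. The index name below is hypothetical.

```sql
-- Hypothetical index leading with created_on, for illustration only.
CREATE INDEX bug_snapshots_created_on_idx ON bug_snapshots (created_on);

-- A range query on created_on alone can now use the new index,
-- whereas the primary key (fixin_id, created_on) is a poor fit for it.
EXPLAIN
SELECT count(*)
FROM   bug_snapshots
WHERE  created_on >= '2013-10-08 16:42:26.994994'
AND    created_on <= '2013-11-07 15:42:26.994994';
```

Every additional index adds write and storage overhead, so measure before keeping it.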
timestamp [without time zone]
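One more thing: your table has the column created_on of type timestamp without time zone. Its values are interpreted according to your current time zone setting. But in the query you are trying to compare against literals with a time zone. That would work if you added an explicit cast: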
WHERE created_on >= '2013-10-08 16:42:26.994994-0700'::timestamptz
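Your literal would be cast to timestamptz and converted to your local time zone accordingly. However, since you did not provide a data type, Postgres casts your literal to the matching type timestamp (not timestamptz), ignoring the time zone offset. Most likely not what you intended!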
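The difference is easy to see by casting the same literal both ways. In this small sketch the session time zone is set to UTC purely as an assumption to make the output predictable:

```sql
-- Assumption for the demo: a fixed session time zone.
SET timezone = 'UTC';

SELECT '2013-10-08 16:42:26.994994-0700'::timestamp   AS offset_ignored,
       '2013-10-08 16:42:26.994994-0700'::timestamptz AS offset_applied;
-- offset_ignored: 2013-10-08 16:42:26.994994     (the -0700 is silently dropped)
-- offset_applied: 2013-10-08 23:42:26.994994+00  (shifted into the session zone)

-- Restore the previous time zone setting.
RESET timezone;
```

With an untyped literal compared to a timestamp column, you get the first behavior.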
Consider this test:
SELECT min(created_on), max(created_on)
FROM bug_snapshots
WHERE created_on >= '2013-10-08 16:42:26.994994-0700'
AND created_on <= '2013-11-07 15:42:26.994994-0800'
A detailed explanation can be found in a related answer.