Question

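I created a table with 43 million (43kk) rows and filled it with values 1..200, so each number occurs in roughly 220k rows spread across the table.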

create table foo (id integer primary key, val bigint);
insert into foo
  select i, random() * 200 from generate_series(1, 43000000) as i;
create index val_index on foo(val);
vacuum analyze foo;
explain analyze select id from foo where val = 55;

Result: http://explain.depesz.com/s/fdsm
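A quick sanity check of the ~220k figure, as a sketch: assuming `random()` is uniform on [0, 1) and PostgreSQL rounds `random() * 200` to the nearest integer when storing it in the `bigint` column, the insert above actually yields values 0..200, with 0 and 200 each getting only half an interval. The helper name `expected_rows_per_value` is hypothetical.

```python
# Expected number of rows per value for:
#   insert into foo select i, random() * 200 from generate_series(1, 43000000) as i;
# Assumptions: random() is uniform on [0, 1) and the float -> bigint
# assignment rounds to the nearest integer.

N_ROWS = 43_000_000
SCALE = 200


def expected_rows_per_value(v: int, n_rows: int = N_ROWS, scale: int = SCALE) -> float:
    """Expected number of rows whose val equals v."""
    # val == v when random() * scale lands in [v - 0.5, v + 0.5),
    # clipped to the range [0, scale) that random() * scale can produce.
    low = max(v - 0.5, 0.0)
    high = min(v + 0.5, float(scale))
    width = max(high - low, 0.0)
    return n_rows * width / scale


if __name__ == "__main__":
    for v in (0, 1, 55, 199, 200):
        print(v, round(expected_rows_per_value(v)))
```

For val = 55 this gives about 215,000 rows, consistent with the rows=215022 reported in the plans in Answer 2.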

I expect a total runtime < 1s. Is that possible? I have an SSD, a Core i5 (1.8), 4 GB RAM, and Postgres 9.3.

If I use an index-only scan, it runs very fast:

explain analyze select val from foo where val = 55;

http://explain.depesz.com/s/7hm

But I need to select id, not val, so an index-only scan doesn't fit my case.

Thanks in advance!

Additional information:

SELECT relname, relpages, reltuples::numeric, pg_size_pretty(pg_table_size(oid)) 
FROM pg_class WHERE oid='foo'::regclass;

Result:

"foo";236758;43800000;"1850 MB"

Configuration:

"cpu_index_tuple_cost";"0.005";""
"cpu_operator_cost";"0.0025";""
"cpu_tuple_cost";"0.01";""
"effective_cache_size";"16384";"8kB"
"max_connections";"100";""
"max_stack_depth";"2048";"kB"
"random_page_cost";"4";""
"seq_page_cost";"1";""
"shared_buffers";"16384";"8kB"
"temp_buffers";"1024";"8kB"
"work_mem";"204800";"kB"

Answer 1

I got the answer here: http://ask.use-the-index-luke.com/questions/235/postgresql-bitmap-heap-scan-on-index-is-very-slow-but-index-only-scan-is-fast

The trick is to use a composite index on val and id:

create index val_id_index on foo(val, id);

This way an index-only scan is used, and I can now select id.

select id from foo where val = 55;

Result:

http://explain.depesz.com/s/nDt3

But this only works on Postgres 9.2 and later. If you are stuck on an older version, try the other options.

Answer 2

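Although you are querying only 0.5% of the table, or about 10 MB of data (out of a table of nearly 2 GB), the values of interest are spread evenly across the whole table.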

You can see this in the first plan you provided yourself:

BitmapIndexScan completes in 123.172 ms
BitmapHeapScan takes 17055.046 ms
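Why the heap scan dominates can be estimated from the pg_class numbers in the question (relpages = 236758, reltuples = 43800000). As a rough sketch, assuming rows with a given val are scattered independently and uniformly over the pages (about 185 rows per page), the expected number of heap pages holding at least one match is pages × (1 − (1 − selectivity)^rows_per_page). The function name is hypothetical.

```python
REL_PAGES = 236_758        # relpages from pg_class
REL_TUPLES = 43_800_000    # reltuples from pg_class
PAGE_SIZE = 8192           # default PostgreSQL block size in bytes


def expected_pages_touched(selectivity: float,
                           rel_pages: int = REL_PAGES,
                           rel_tuples: int = REL_TUPLES) -> float:
    """Expected number of heap pages holding at least one matching row,
    assuming matches are spread uniformly and independently."""
    rows_per_page = rel_tuples / rel_pages
    # Probability that none of the rows on a given page matches.
    p_page_empty = (1.0 - selectivity) ** rows_per_page
    return rel_pages * (1.0 - p_page_empty)


if __name__ == "__main__":
    pages = expected_pages_touched(1 / 200)
    print(f"rows per page: {REL_TUPLES / REL_PAGES:.0f}")
    print(f"pages touched: {pages:,.0f} (~{pages * PAGE_SIZE / 2**30:.2f} GiB)")
```

With a selectivity of 1/200 this comes to roughly 143k pages, about 60% of the table and over 1 GB of mostly random reads to return about 215k rows. That is in the same range as the Heap Blocks: exact=140489 figure in the unclustered plan in this answer.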

You can try clustering the table by the index order, which puts rows with the same value on the same pages. On my SATA disk I get the following:

SET work_mem TO '300MB';
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;

  Bitmap Heap Scan on foo  (...) (actual time=90.315..35091.665 rows=215022 loops=1)
    Heap Blocks: exact=140489
    Buffers: shared hit=20775 read=120306 written=24124

SET maintenance_work_mem TO '1GB';
CLUSTER foo USING val_index;
EXPLAIN (analyze,buffers) SELECT id FROM foo WHERE val = 55;

  Bitmap Heap Scan on foo  (...) (actual time=49.215..407.505 rows=215022 loops=1)
    Heap Blocks: exact=1163
    Buffers: shared read=1755

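Of course, this is a one-off operation: rows inserted or updated afterwards are not kept in order, so query times will gradually grow again.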

Answer 3

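You can try lowering random_page_cost; for an SSD it can be 1. Second, you can increase work_mem: for current servers with gigabytes of RAM, 10MB is a relatively low value. You should also recheck effective_cache_size, which may be too low as well.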

work_mem * max_connections * 2 + shared_buffers < RAM dedicated to Postgres
effective_cache_size ~ shared_buffers + file system cache
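The first rule of thumb can be evaluated against the configuration posted in the question. A minimal sketch: it converts the pg_settings values to MB (shared_buffers and effective_cache_size are in 8 kB units, work_mem in kB); treating the whole 4 GB of RAM as available to Postgres is an assumption, and the helper names are hypothetical.

```python
# Values from the question's pg_settings output.
WORK_MEM_KB = 204_800           # work_mem, unit kB
MAX_CONNECTIONS = 100
SHARED_BUFFERS_PAGES = 16_384   # shared_buffers, unit 8kB
EFFECTIVE_CACHE_PAGES = 16_384  # effective_cache_size, unit 8kB
RAM_MB = 4 * 1024               # assumption: all 4 GB usable by Postgres


def pages_to_mb(pages: int, page_kb: int = 8) -> float:
    """Convert a setting expressed in 8 kB pages to MB."""
    return pages * page_kb / 1024


def worst_case_memory_mb(work_mem_kb: int, max_connections: int,
                         shared_buffers_pages: int) -> float:
    """work_mem * max_connections * 2 + shared_buffers, in MB."""
    return (work_mem_kb / 1024) * max_connections * 2 + pages_to_mb(shared_buffers_pages)


if __name__ == "__main__":
    need = worst_case_memory_mb(WORK_MEM_KB, MAX_CONNECTIONS, SHARED_BUFFERS_PAGES)
    print(f"shared_buffers: {pages_to_mb(SHARED_BUFFERS_PAGES):.0f} MB")
    print(f"effective_cache_size: {pages_to_mb(EFFECTIVE_CACHE_PAGES):.0f} MB")
    print(f"rule-of-thumb total: {need:,.0f} MB vs {RAM_MB} MB RAM, fits: {need < RAM_MB}")
```

With the posted values work_mem is 200 MB, so the left-hand side is about 40 GB against 4 GB of RAM, while shared_buffers and effective_cache_size are both only 128 MB.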

PostgreSQL: Bitmap Heap Scan on index is very slow, but Index Only Scan is fast
