PostgreSQL does a seq scan instead of an index-only scan

Date: 2019-11-12 17:13:17

Tags: sql postgresql indexing

I have the following table structure:

create table transfers
(
    id serial not null
        constraint transactions_pkey
            primary key,
    name varchar(255) not null,
    money integer not null
);

create index transfers_name_index
    on transfers (name);

The following query is slow because it does a sequential scan:

EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP by name
ORDER BY name ASC;

Group  (cost=37860.49..41388.54 rows=14802 width=15) (actual time=4285.530..7459.872 rows=999766 loops=1)
  Group Key: name
  ->  Gather Merge  (cost=37860.49..41314.53 rows=29604 width=15) (actual time=4285.529..7136.432 rows=999935 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=36860.46..36897.47 rows=14802 width=15) (actual time=4104.159..5107.148 rows=333312 loops=3)
              Sort Key: name
              Sort Method: external merge  Disk: 14928kB
              Worker 0:  Sort Method: external merge  Disk: 13616kB
              Worker 1:  Sort Method: external merge  Disk: 13656kB
              ->  Partial HashAggregate  (cost=35687.15..35835.17 rows=14802 width=15) (actual time=604.984..689.111 rows=333312 loops=3)
                    Group Key: name
                    ->  Parallel Seq Scan on transfers  (cost=0.00..32571.52 rows=1246252 width=15) (actual time=0.063..200.548 rows=997032 loops=3)
Planning Time: 0.088 ms
Execution Time: 7531.142 ms

However, when I set enable_seqscan to off, it uses an index-only scan, as I expected:

SET enable_seqscan = OFF;

EXPLAIN ANALYZE SELECT name
FROM transfers
GROUP by name
ORDER BY name ASC;

Group  (cost=1000.45..100492.67 rows=14802 width=15) (actual time=8.032..2212.538 rows=999766 loops=1)
  Group Key: name
  ->  Gather Merge  (cost=1000.45..100418.66 rows=29604 width=15) (actual time=8.029..1880.388 rows=999778 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Group  (cost=0.43..96001.60 rows=14802 width=15) (actual time=0.074..383.471 rows=333259 loops=3)
              Group Key: name
              ->  Parallel Index Only Scan using transfers_name_index on transfers  (cost=0.43..92885.97 rows=1246252 width=15) (actual time=0.066..189.436 rows=997032 loops=3)
                    Heap Fetches: 0
Planning Time: 0.197 ms
Execution Time: 2279.321 ms

Why doesn't Postgres choose the more efficient index-only scan unless I force it to? The table contains about 3 million rows. I'm using PostgreSQL 11.2.

3 Answers:

Answer 0 (score: 1)

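Try adding a lot more data and running the query again. Postgres does not always use an index: if the table holds only a few rows, it may decide that a sequential scan is faster.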
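The cost trade-off behind this can be sketched in a few lines. The following is a minimal Python approximation of how the planner prices a sequential scan with no WHERE clause, assuming PostgreSQL's default seq_page_cost (1.0) and cpu_tuple_cost (0.01) and parallel_leader_participation left on; the function names are hypothetical and index costing is not modeled. The cost grows linearly with the number of pages and rows, which is why a small table often makes a full scan look cheapest.

```python
# PostgreSQL's default planner cost constants (a server may override them).
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01


def parallel_divisor(workers: int, leader_participates: bool = True) -> float:
    """How many 'processes worth' of work the planner splits a parallel scan into.

    With leader participation, the leader is assumed to contribute
    1 - 0.3 * workers of a worker's effort (never less than zero).
    """
    divisor = float(workers)
    if leader_participates:
        leader_contribution = 1.0 - 0.3 * workers
        if leader_contribution > 0:
            divisor += leader_contribution
    return divisor


def seq_scan_cost(pages: float, tuples: float, workers: int = 0) -> float:
    """Approximate total cost of a seq scan with no WHERE clause."""
    disk_cost = pages * SEQ_PAGE_COST   # every heap page is read once
    cpu_cost = tuples * CPU_TUPLE_COST  # every row is processed once
    if workers > 0:
        # CPU work is shared between processes; the page I/O is not divided.
        cpu_cost /= parallel_divisor(workers)
    return disk_cost + cpu_cost


# Hypothetical small table: 100 pages holding 10,000 rows.
print(seq_scan_cost(100, 10_000))  # 200.0
```

As a consistency check against the question's plan: with 2 workers the divisor is 2.4, so the per-process estimate rows=1246252 corresponds to roughly 2.99 million rows, and a heap of about 20,109 pages (inferred from the arithmetic, not reported in the question) reproduces the Parallel Seq Scan cost of 32571.52.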

Answer 1 (score: 1)

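For Postgres to prefer an index-only scan, most of the table's pages should be marked all-visible in the visibility map. You can check this in pg_class: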

SELECT relpages, relallvisible FROM pg_class WHERE relname='transfers';

If relallvisible is 0 or much lower than relpages, you should VACUUM the table:

VACUUM ANALYZE transfers;
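The check above boils down to one ratio. The planner uses relallvisible / relpages (its all-visible fraction) to estimate how many heap pages an index-only scan still has to visit, so a low ratio makes the index-only scan look expensive. A minimal Python sketch, assuming you have already fetched the two numbers with the pg_class query above; the function names and the 0.9 threshold for "much lower" are hypothetical.

```python
def all_visible_fraction(relpages: int, relallvisible: int) -> float:
    """Share of heap pages marked all-visible (pg_class.relallvisible / relpages)."""
    if relpages <= 0:
        # relpages can be 0 for a table that was never vacuumed or analyzed:
        # treat that as "nothing known to be all-visible".
        return 0.0
    return min(relallvisible / relpages, 1.0)


def heap_fetch_share(relpages: int, relallvisible: int) -> float:
    """Rough share of heap pages an index-only scan is still expected to visit."""
    return 1.0 - all_visible_fraction(relpages, relallvisible)


def should_vacuum(relpages: int, relallvisible: int, threshold: float = 0.9) -> bool:
    """The answer's rule of thumb; 0.9 is a hypothetical cut-off."""
    return all_visible_fraction(relpages, relallvisible) < threshold


print(should_vacuum(20_000, 0))       # True: no page is known to be all-visible
print(should_vacuum(20_000, 19_900))  # False: 99.5% of pages are all-visible
```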

Answer 2 (score: 1)

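When I fill your table with 3e6 rows containing 1e6 distinct names, I get an index-only scan. But if I force the estimated number of distinct values to match yours, it switches to a seq scan: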

alter table transfers alter name set (N_DISTINCT = 14802);
analyze transfers;

So if you use the same method to set it to the correct value, I bet you will get the other plan.
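The answer leaves the "correct value" to the reader. PostgreSQL's n_distinct column option accepts either an absolute count (a positive number) or a negative number between -1 and 0, meaning that fraction of the row count, which stays valid as the table grows. A minimal Python sketch, assuming you have measured the counts yourself (for example with SELECT count(*), count(DISTINCT name) FROM transfers;), that turns them into a statement like the one used above; the helper names are hypothetical. For the answer's test data (3e6 rows, 1e6 distinct names) the fraction form is about -0.3333.

```python
def n_distinct_value(distinct: int, total_rows: int, relative: bool = True) -> float:
    """Value to pass to ALTER TABLE ... SET (n_distinct = ...).

    Positive: an absolute count of distinct values.
    Between -1 and 0: distinct values = -value * row count.
    """
    if distinct <= 0 or total_rows <= 0:
        raise ValueError("counts must be positive")
    if relative:
        return -min(distinct / total_rows, 1.0)
    return float(distinct)


def alter_statement(table: str, column: str, value: float) -> str:
    """Build the statement; the override only takes effect at the next ANALYZE."""
    text = f"{value:.4f}" if value < 0 else str(int(value))
    return f"ALTER TABLE {table} ALTER {column} SET (n_distinct = {text});"


print(alter_statement("transfers", "name", n_distinct_value(1_000_000, 3_000_000)))
# ALTER TABLE transfers ALTER name SET (n_distinct = -0.3333);
```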

Why was the estimate wrong in the first place? I bet your table is clustered on name and your default_statistics_target is too low.
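Whatever the cause, the size of the misestimate is visible in the question's own plan: the top Group node estimates rows=14802 but actually returns rows=999766, and that group-count estimate is derived from the planner's distinct-value statistics for name. A minimal Python sketch that computes the actual-to-estimated ratio for one node line; the regex and function name are hypothetical, and it assumes the default text EXPLAIN format with one node per line (EXPLAIN (ANALYZE, FORMAT JSON) is easier to parse reliably).

```python
import re

# One node line of text-format EXPLAIN ANALYZE output, e.g.
# "Group  (cost=... rows=14802 width=15) (actual time=... rows=999766 loops=1)"
NODE_LINE = re.compile(
    r"\(cost=[\d.]+\.\.[\d.]+ rows=(?P<est>\d+) width=\d+\)"
    r" \(actual time=[\d.]+\.\.[\d.]+ rows=(?P<act>\d+) loops=\d+\)"
)


def misestimate_ratio(plan_line: str) -> float:
    """Actual rows divided by estimated rows for a single plan node.

    Both figures are per-loop values, so loops is not needed.
    """
    match = NODE_LINE.search(plan_line)
    if match is None:
        raise ValueError("not an EXPLAIN ANALYZE node line: " + plan_line)
    estimated = int(match.group("est"))
    actual = int(match.group("act"))
    return actual / max(estimated, 1)


top = ("Group  (cost=37860.49..41388.54 rows=14802 width=15) "
       "(actual time=4285.530..7459.872 rows=999766 loops=1)")
print(f"{misestimate_ratio(top):.1f}x more groups than estimated")
# 67.5x more groups than estimated
```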