
Time: 2019-02-22 04:03:13

Tags: database postgresql indexing database-performance postgresql-9.4

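I have a simple query such as `select * from xxx where col is not null limit 10`. I don't understand why Postgres prefers a seq scan, which is much slower than using the partial index (I have already run `ANALYZE` on the table). How can I debug a problem like this?

The table has more than 4 million rows, of which about 350,000 satisfy `pid is not null`.

I suspect something is wrong with the cost estimates: the seq scan is costed lower than the index scan. But how do I dig deeper into this?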


> \d data_import
| Column             | Type                     | Modifiers                                                                  |
| id                 | integer                  |  not null default nextval('data_import_id_seq'::regclass) |
| name               | character varying(64)    |                                                                            |
| market_activity_id | integer                  |  not null                                                                  |
| hmsr_id            | integer                  |  not null default (-1)                                                     |
| site_id            | integer                  |  not null default (-1)                                                     |
| hmpl_id            | integer                  |  not null default (-1)                                                     |
| hmmd_id            | integer                  |  not null default (-1)                                                     |
| hmci_id            | integer                  |  not null default (-1)                                                     |
| hmkw_id            | integer                  |  not null default (-1)                                                     |
| creator_id         | integer                  |                                                                            |
| created_at         | timestamp with time zone |                                                                            |
| updated_at         | timestamp with time zone |                                                                            |
| bias               | integer                  |                                                                            |
| pid                | character varying(128)   |  default NULL::character varying                                           |
    "data_import_pkey" PRIMARY KEY, btree (id)
    "unique_hmxx" UNIQUE, btree (site_id, hmsr_id, hmpl_id, hmmd_id, hmci_id, hmkw_id) WHERE pid IS NULL
    "data_import_pid_idx" UNIQUE, btree (pid) WHERE pid IS NOT NULL
    "data_import_created_at_idx" btree (created_at)
    "data_import_hmsr_id" btree (hmsr_id)
    "data_import_updated_at_idx" btree (updated_at)

> set enable_seqscan to false;
apollon> explain (analyse, verbose)  select * from data_import where pid is not null limit 10
| Limit  (cost=0.42..5.68 rows=10 width=84) (actual time=0.059..0.142 rows=10 loops=1)
|   Output: id, name, market_activity_id, hmsr_id, site_id, hmpl_id, hmmd_id, hmci_id, hmkw_id, creator_id, created_at, updated_at, bias, pid
|   ->  Index Scan using data_import_pid_idx on public.data_import  (cost=0.42..184158.08 rows=350584 width=84) (actual time
|         Output: id, name, market_activity_id, hmsr_id, site_id, hmpl_id, hmmd_id, hmci_id, hmkw_id, creator_id, created_at, updated_at, bias, pid
|         Index Cond: (data_import.pid IS NOT NULL)
| Planning time: 0.126 ms
| Execution time: 0.177 ms
Time: 0.054s

> set enable_seqscan to true;
> explain (analyse, verbose)  select * from data_import where pid is not null limit 10
| QUERY PLAN                                                                                                                                        |
| Limit  (cost=0.00..2.37 rows=10 width=84) (actual time=407.042..407.046 rows=10 loops=1)                                                          |
|   Output: id, name, market_activity_id, hmsr_id, site_id, hmpl_id, hmmd_id, hmci_id, hmkw_id, creator_id, created_at, updated_at, bias, pid       |
|   ->  Seq Scan on public.data_import  (cost=0.00..83016.60 rows=350584 width=84) (actual time=407.041..407.045 rows=10 loops=1)  |
|         Output: id, name, market_activity_id, hmsr_id, site_id, hmpl_id, hmmd_id, hmci_id, hmkw_id, creator_id, created_at, updated_at, bias, pid |
|         Filter: (data_import.pid IS NOT NULL)                                                                                    |
|         Rows Removed by Filter: 3672502                                                                                                           |
| Planning time: 0.116 ms                                                                                                                           |
| Execution time: 407.078 ms                                                                                                                        |
Time: 0.426s
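The `Limit` costs in the two plans can be reproduced by hand, which shows where the planner's preference comes from. This is a minimal sketch, assuming the planner pro-rates a `Limit` node linearly: the child's startup cost plus the fraction `limit / estimated_rows` of its remaining cost. `limit_cost` is a hypothetical helper for illustration, not a PostgreSQL function; the inputs are copied from the two `EXPLAIN` outputs above.

```python
def limit_cost(startup, total, est_rows, limit):
    """Pro-rate a child node's cost for LIMIT `limit`, assuming the wanted
    rows are spread evenly across the child's `est_rows` (est_rows > 0)."""
    fraction = min(limit, est_rows) / est_rows
    return startup + (total - startup) * fraction


# Child-node costs and row estimates copied from the EXPLAIN output.
seq = limit_cost(0.00, 83016.60, 350584, 10)
idx = limit_cost(0.42, 184158.08, 350584, 10)

print(f"Seq Scan   -> Limit cost {seq:.2f}")  # 2.37
print(f"Index Scan -> Limit cost {idx:.2f}")  # 5.67 (EXPLAIN shows 5.68; its inputs are rounded)
```

In this model the seq scan only has to do 10/350584 of its work before the `LIMIT` is satisfied, and because the full seq scan (83016.60) is costed well below the full index scan (184158.08), the pro-rated seq scan still wins. The actual time of 407 ms shows that the even-spread assumption did not hold for this table.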

1 answer:

Answer 0 (score: 4)


Rows Removed by Filter: 3672502



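That line says the seq scan read and discarded 3,672,502 rows before it found 10 with a non-null `pid`. The planner assumed the ~350,000 matching rows were spread evenly through the table, so it expected to stop almost immediately; in reality the rows the scan reads first all have a null `pid`.

If you add `ORDER BY pid` before the `LIMIT` (even though you don't need the ordering), the optimizer will do the right thing: the partial index `data_import_pid_idx` already returns rows in `pid` order, so an index scan can stop after 10 rows, whereas a seq scan would have to read the whole table and sort the result.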
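The effect of row placement on a `LIMIT` query can be illustrated with a toy model. This is a simplified sketch, not PostgreSQL's executor: it counts rows read in physical order until `LIMIT` matches are found, and ignores pages, I/O and synchronized scans. `rows_scanned`, `uniform` and `clustered` are hypothetical helpers, and `N_ROWS`/`N_MATCH` are round numbers taken from the question, not the real table.

```python
def rows_scanned(is_match, n_rows, limit):
    """Simulate Seq Scan + Filter under LIMIT: read rows in physical order
    and stop once `limit` matching rows have been found.
    Returns (rows_read, rows_removed_by_filter)."""
    found = 0
    for i in range(n_rows):
        if is_match(i):
            found += 1
            if found == limit:
                return i + 1, i + 1 - found
    # Ran off the end of the table without filling the LIMIT.
    return n_rows, n_rows - found


# Hypothetical round numbers based on the question.
N_ROWS = 4_000_000   # "more than 4 million rows"
N_MATCH = 350_000    # "about 350,000" rows with pid IS NOT NULL
LIMIT = 10

STEP = N_ROWS // N_MATCH  # 11: one matching row roughly every 11 rows


def uniform(i):
    """Layout the planner assumes: matching rows spread evenly."""
    return i % STEP == STEP - 1


def clustered(i):
    """Hypothetical layout: all matching rows at the end of the table."""
    return i >= N_ROWS - N_MATCH


planner_guess = LIMIT * N_ROWS / N_MATCH  # rows the planner expects to read
uniform_result = rows_scanned(uniform, N_ROWS, LIMIT)
clustered_result = rows_scanned(clustered, N_ROWS, LIMIT)

print(f"planner expects to read ~{planner_guess:.0f} rows")
print("uniform layout   (read, removed):", uniform_result)
print("clustered layout (read, removed):", clustered_result)
```

With these numbers the evenly spread layout reads 110 rows (close to the planner's ~114), while the clustered layout reads 3,650,010 rows and removes 3,650,000 by the filter, the same order of magnitude as the `Rows Removed by Filter: 3672502` in the real plan.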
