Question

I have a users table:

CREATE TABLE public.users (
    id integer NOT NULL,
    first_name character varying,
    last_name character varying,
    nickname character varying,
    privacy integer
);

with the following index:

CREATE INDEX index_users_on_privacy
    ON public.users USING btree
    (privacy)
    TABLESPACE pg_default;

When I run the following query, I get the expected result in a reasonable execution time:

SELECT  "users".* FROM "users" 
WHERE "users"."id" < 20000
ORDER BY "users"."id" DESC LIMIT 4

EXPLAIN ANALYZE output:

"Limit  (cost=541524.58..541524.59 rows=4 width=1509) (actual time=88.974..89.021 rows=4 loops=1)"
"  ->  Sort  (cost=541524.58..542109.51 rows=233972 width=1509) (actual time=88.964..88.978 rows=4 loops=1)"
"        Sort Key: id DESC"
"        Sort Method: top-N heapsort  Memory: 37kB"
"        ->  Bitmap Heap Scan on users  (cost=3445.58..538015.00 rows=233972 width=1509) (actual time=4.515..50.689 rows=7012 loops=1)"
"              Recheck Cond: (id < 20000)"
"              Heap Blocks: exact=4973"
"              ->  Bitmap Index Scan on users_pkey  (cost=0.00..3387.09 rows=233972 width=0) (actual time=3.735..3.735 rows=7012 loops=1)"
"                    Index Cond: (id < 20000)"
"Planning time: 0.263 ms"
"Execution time: 89.707 ms"

Now, when I add any other filter to the WHERE clause (for example, a LIKE on first_name, last_name, or nickname), performance is still fine. But when I add this specific condition

AND "users"."privacy" = 0

execution becomes very slow.

Query:

SELECT  "users".* FROM "users" 
WHERE "users"."id" < 20000
AND "users"."privacy" = 0
ORDER BY "users"."id" DESC LIMIT 4

EXPLAIN ANALYZE output:

"Limit  (cost=389636.94..389636.95 rows=4 width=1509) (actual time=46687.391..46687.441 rows=4 loops=1)"
"  ->  Sort  (cost=389636.94..389958.31 rows=128547 width=1509) (actual time=46687.378..46687.394 rows=4 loops=1)"
"        Sort Key: created_at DESC"
"        Sort Method: top-N heapsort  Memory: 36kB"
"        ->  Bitmap Heap Scan on users  (cost=36688.66..387708.74 rows=128547 width=1509) (actual time=1559.659..46665.366 rows=3459 loops=1)"
"              Recheck Cond: (privacy = 0)"
"              Rows Removed by Index Recheck: 2416"
"              Filter: (id < 20000)"
"              Heap Blocks: exact=356084 lossy=527637"
"              ->  Bitmap Index Scan on index_users_on_privacy  (cost=0.00..36656.52 rows=128547 width=0) (actual time=1426.792..1426.792 rows=2706758 loops=1)"
"                    Index Cond: (privacy = 0)"
"Planning time: 150.160 ms"
"Execution time: 46780.021 ms"

Please help me understand why there is a 46-second difference and how to avoid it.

Notes:

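My application has been running for 3 years without any problems, and I only recently started hitting this performance issue.
The PostgreSQL version is 10.3. I ran the same slow query on a different machine, and there it runs fine without any problems.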

Answer 1

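PostgreSQL's statistics seem to be off: it underestimates the number of rows returned by the bitmap index scan in the second query (128547 estimated, 2706758 actual), while it overestimates the number of rows in the first query (233972 estimated, 7012 actual).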

This leads the planner to the wrong conclusion that using the index on privacy is the most efficient strategy.
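One way to see what the planner currently believes about these columns is to look at the pg_stats view. This is only a sketch, and it assumes the table lives in the public schema as shown in the question:

```sql
-- Inspect the planner's stored statistics for the two filtered columns.
-- most_common_vals / most_common_freqs show how frequent the planner
-- thinks each privacy value is; compare that with the real share of rows.
SELECT attname,
       null_frac,
       n_distinct,
       most_common_vals,
       most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'users'
  AND attname IN ('id', 'privacy');
```

If the frequency listed for 0 is far below the actual share of rows with privacy = 0, that would be consistent with stale statistics.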

Try recalculating the table statistics with

ANALYZE users;

If that works, configure autovacuum so that it analyzes this table more frequently.
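As an illustration, the auto-analyze threshold can be lowered for this one table through a storage parameter. The value 0.01 below is an assumption picked for the example, not a tuned recommendation:

```sql
-- Check when the table was last analyzed, manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'users';

-- Trigger auto-analyze after roughly 1% of the rows have changed.
-- The server-wide default for autovacuum_analyze_scale_factor is 0.1 (10%).
-- 0.01 is an example value only.
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.01);
```

A per-table setting like this leaves the autovacuum behavior of all other tables unchanged.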

If that alone is not enough, try increasing the granularity of the statistics:

ALTER TABLE users ALTER privacy SET STATISTICS 1000;
ANALYZE users;

These measures should make PostgreSQL choose the right plan, and they are what I recommend.
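Whether the estimate has actually improved can be checked with plain EXPLAIN, which plans the query without executing it. A minimal sketch:

```sql
-- Show only the planner's estimate; the query itself is not run.
-- Look at "rows=" on the top plan node and compare it with the
-- 2706758 rows the bitmap index scan really returned in the slow plan.
EXPLAIN
SELECT * FROM users WHERE privacy = 0;
```

If the estimated row count is now in the millions rather than around 128547, the planner has the information it needs to avoid the scan on index_users_on_privacy.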

If you want to force PostgreSQL not to use that index (forcing should always be the last resort), rewrite the query like this:

SELECT users.* FROM users
WHERE users.id < 20000
  AND users.privacy + 0 = 0
ORDER BY users.id DESC
LIMIT 4;
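The rewrite works because the expression privacy + 0 no longer matches the indexed column, so the B-tree index on privacy cannot be used for that condition. To confirm the effect, the rewritten query can be run under EXPLAIN (ANALYZE, BUFFERS). Note that this executes the query, so it is a sketch best tried on a test system first:

```sql
-- Run the rewritten query and show the actual plan, timings and buffer usage.
-- The plan should no longer contain a scan on index_users_on_privacy;
-- the privacy condition should appear as a Filter instead.
EXPLAIN (ANALYZE, BUFFERS)
SELECT users.* FROM users
WHERE users.id < 20000
  AND users.privacy + 0 = 0
ORDER BY users.id DESC
LIMIT 4;
```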
