I have a users table:
CREATE TABLE public.users (
    id integer NOT NULL,
    first_name character varying,
    last_name character varying,
    nickname character varying,
    privacy integer
);
with the following index:
CREATE INDEX index_users_on_privacy
    ON public.users USING btree
    (privacy)
    TABLESPACE pg_default;
When I run the following query, I get the expected result in a reasonable execution time:
SELECT "users".* FROM "users"
WHERE "users"."id" < 20000
ORDER BY "users"."id" DESC LIMIT 4
EXPLAIN output:
"Limit (cost=541524.58..541524.59 rows=4 width=1509) (actual time=88.974..89.021 rows=4 loops=1)"
" -> Sort (cost=541524.58..542109.51 rows=233972 width=1509) (actual time=88.964..88.978 rows=4 loops=1)"
" Sort Key: id DESC"
" Sort Method: top-N heapsort Memory: 37kB"
" -> Bitmap Heap Scan on users (cost=3445.58..538015.00 rows=233972 width=1509) (actual time=4.515..50.689 rows=7012 loops=1)"
" Recheck Cond: (id < 20000)"
" Heap Blocks: exact=4973"
" -> Bitmap Index Scan on users_pkey (cost=0.00..3387.09 rows=233972 width=0) (actual time=3.735..3.735 rows=7012 loops=1)"
" Index Cond: (id < 20000)"
"Planning time: 0.263 ms"
"Execution time: 89.707 ms"
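Now, when I add any other filter to the WHERE clause (for example, a LIKE on first_name, last_name, or nickname), performance is still fine, but as soon as I add this particular condition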
AND "users"."privacy" = 0
execution becomes very slow.
Query:
SELECT "users".* FROM "users"
WHERE "users"."id" < 20000
AND "users"."privacy" = 0
ORDER BY "users"."id" DESC LIMIT 4
EXPLAIN output:
"Limit (cost=389636.94..389636.95 rows=4 width=1509) (actual time=46687.391..46687.441 rows=4 loops=1)"
" -> Sort (cost=389636.94..389958.31 rows=128547 width=1509) (actual time=46687.378..46687.394 rows=4 loops=1)"
" Sort Key: created_at DESC"
" Sort Method: top-N heapsort Memory: 36kB"
" -> Bitmap Heap Scan on users (cost=36688.66..387708.74 rows=128547 width=1509) (actual time=1559.659..46665.366 rows=3459 loops=1)"
" Recheck Cond: (privacy = 0)"
" Rows Removed by Index Recheck: 2416"
" Filter: (id < 20000)"
" Heap Blocks: exact=356084 lossy=527637"
" -> Bitmap Index Scan on index_users_on_privacy (cost=0.00..36656.52 rows=128547 width=0) (actual time=1426.792..1426.792 rows=2706758 loops=1)"
" Index Cond: (privacy = 0)"
"Planning time: 150.160 ms"
"Execution time: 46780.021 ms"
Please help me understand why there is a 46-second difference and how to avoid it.
Answer 0 (score: 1)
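PostgreSQL's statistics seem to be off: it underestimates the number of rows returned by the bitmap index scan in the second query (128547 estimated versus 2706758 actual) and overestimates it in the first query (233972 estimated versus 7012 actual).
That leads it to the wrong conclusion that using the index on privacy is the most efficient strategy.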
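One way to see what the planner currently believes about privacy is to look at the collected statistics. The following is only a sketch using the standard pg_stats and pg_stat_user_tables views; it assumes the table lives in the public schema, as in the question, and that the server is recent enough to expose n_mod_since_analyze.

```sql
-- What the planner thinks the distribution of privacy looks like.
SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'users'
  AND attname = 'privacy';

-- When the table was last analyzed, manually or by autovacuum,
-- and how many rows have changed since then.
SELECT last_analyze, last_autoanalyze, n_live_tup, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE schemaname = 'public'
  AND relname = 'users';
```

If the frequency listed for 0 in most_common_freqs, multiplied by the table's row count, is far from the 2706758 rows the index scan actually returned, the statistics are probably stale.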
Try recomputing the table statistics with
ANALYZE users;
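If that helps, configure autovacuum so that it analyzes this table more often.
If that alone is not enough, try increasing the granularity of the statistics: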
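As a sketch of what "more often" could look like, autovacuum's analyze trigger can be lowered for this one table through a storage parameter. The value 0.01 is an illustrative assumption, not a tuned recommendation.

```sql
-- Trigger an automatic ANALYZE after roughly 1% of the rows have changed,
-- instead of the default 10% (autovacuum_analyze_scale_factor = 0.1).
-- The value 0.01 is an assumption chosen for illustration.
ALTER TABLE users SET (autovacuum_analyze_scale_factor = 0.01);
```

Setting it per table leaves the autovacuum behavior for the rest of the database unchanged.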
ALTER TABLE users ALTER privacy SET STATISTICS 1000;
ANALYZE users;
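These measures should make PostgreSQL choose the right plan, and they are what I recommend.
If you want to force PostgreSQL not to use that index (forcing should always be the last resort), rewrite the query like this: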
SELECT users.* FROM users
WHERE users.id < 20000
AND users.privacy + 0 = 0
ORDER BY users.id DESC
LIMIT 4;
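The rewrite works because the index is defined on the bare column privacy; the expression privacy + 0 does not match it, so the planner cannot use index_users_on_privacy for that condition. A hedged way to confirm this on your own data is to look at the plan of the rewritten query:

```sql
-- index_users_on_privacy should no longer appear in this plan;
-- the privacy condition should show up as a Filter instead.
EXPLAIN (ANALYZE, BUFFERS)
SELECT users.* FROM users
WHERE users.id < 20000
AND users.privacy + 0 = 0
ORDER BY users.id DESC
LIMIT 4;
```

Note that EXPLAIN with ANALYZE actually executes the query, so the reported times are real measurements.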