I have a table, posts, with about 20 million rows. I'm trying to narrow down a paginated list of posts with the following query:
SELECT "posts".* FROM "posts"
WHERE "posts"."source_id" IN (14790, 14787, 32928, 14796, 14791, 15503, 14789, 14772, 15506, 14794, 15543, 31615, 15507, 15508, 14800)
AND "posts"."deleted_at" IS NULL
ORDER BY external_created_at desc LIMIT 100 OFFSET 0;
(About 3.3 million rows match the source_id values in the query.)
When I run it, it takes about 60 seconds, and I get the following EXPLAIN ANALYZE output (see on depesz):
EXPLAIN ANALYZE SELECT "posts".* FROM "posts" WHERE "posts"."source_id" IN (14790, 14787, 32928, 14796, 14791, 15503, 14789, 14772, 15506, 14794, 15543, 31615, 15507, 15508, 14800) AND "posts"."deleted_at" IS NULL ORDER BY external_created_at desc LIMIT 100 OFFSET 0;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2530223.38..2530223.63 rows=100 width=1040) (actual time=66564.583..66564.616 rows=100 loops=1)
  -> Sort (cost=2530223.38..2534981.19 rows=1903125 width=1040) (actual time=66564.571..66564.594 rows=100 loops=1)
        Sort Key: external_created_at
        Sort Method: top-N heapsort Memory: 89kB
        -> Bitmap Heap Scan on posts (cost=35499.76..2457487.31 rows=1903125 width=1040) (actual time=279.640..64496.330 rows=1674072 loops=1)
              Recheck Cond: ((source_id = ANY ('{14790,14787,32928,14796,14791,15503,14789,14772,15506,14794,15543,31615,15507,15508,14800}'::integer[])) AND (deleted_at IS NULL))
              Rows Removed by Index Recheck: 4640188
              -> Bitmap Index Scan on index_on_posts_partial_source_id_with_order (cost=0.00..35023.98 rows=1903125 width=0) (actual time=275.922..275.922 rows=1674072 loops=1)
                    Index Cond: (source_id = ANY ('{14790,14787,32928,14796,14791,15503,14789,14772,15506,14794,15543,31615,15507,15508,14800}'::integer[]))
Total runtime: 66564.962 ms
(10 rows)
Here is the index it is using:
CREATE INDEX index_on_posts_partial_source_id_with_order ON posts USING btree (source_id) WHERE (deleted_at IS NULL);
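It seems the Recheck Cond is the slowest part of this query. Everything I've read about recheck conditions involves increasing the memory Postgres uses because the data is "lossy", but I don't see anything like that in the query plan.
Any suggestions on how to speed this up?
It seems that getting rid of the recheck somehow, or somehow handling the sort on external_created_at, would be my best bet.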
Edit: I'm using Postgres version 9.3.4. Here is the posts table:
CREATE TABLE posts (
id integer NOT NULL,
source_id integer,
message text,
image text,
external_id text,
created_at timestamp without time zone,
updated_at timestamp without time zone,
external text,
like_count integer DEFAULT 0 NOT NULL,
comment_count integer DEFAULT 0 NOT NULL,
external_created_at timestamp without time zone,
deleted_at timestamp without time zone,
poster_name character varying(255),
poster_image text,
poster_url character varying(255),
poster_id text,
"position" integer,
location character varying(255),
description text,
video text,
rejected_at timestamp without time zone,
deleted_by character varying(255),
height integer,
width integer
);
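Answer 0 (score: 1)
Your query selects a few million rows for a paginated list. Think carefully about the wisdom of returning data for that many pages. Also think hard about whether you need all the columns. I doubt you do.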
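As a rough illustration of the point about columns, the query could name only the fields the list page actually displays. The column list below is an assumption chosen for the example; which columns are really needed is not stated in the question.

```sql
-- Hypothetical column list: keep only what the list page renders.
SELECT id, source_id, message, image, poster_name, external_created_at
FROM posts
WHERE source_id IN (14790, 14787, 32928, 14796, 14791, 15503, 14789, 14772,
                    15506, 14794, 15543, 31615, 15507, 15508, 14800)
  AND deleted_at IS NULL
ORDER BY external_created_at DESC
LIMIT 100 OFFSET 0;
```

This mainly shrinks the rows that are sorted and sent to the client (the plan in the question reports width=1040). It does not by itself avoid the bitmap heap scan.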
I built a rough table and inserted about 10 million random(ish) rows into it. My query plan, using PostgreSQL 9.4, is broadly similar to yours.
Limit (cost=138609.10..138609.35 rows=100 width=24) (actual time=1410.012..1410.038 rows=100 loops=1)
  -> Sort (cost=138609.10..140344.25 rows=694059 width=24) (actual time=1410.010..1410.026 rows=100 loops=1)
        Sort Key: external_created_at
        Sort Method: top-N heapsort Memory: 29kB
        -> Bitmap Heap Scan on posts (cost=12217.47..112082.66 rows=694059 width=24) (actual time=374.393..919.687 rows=3000000 loops=1)
              Recheck Cond: ((source_id = ANY ('{14790,14787,32928,14796,14791,15503,14789,14772,15506,14794,15543,31615,15507,15508,14800}'::integer[])) AND (deleted_at IS NULL))
              Heap Blocks: exact=16217
              -> Bitmap Index Scan on index_on_posts_partial_source_id_with_order (cost=0.00..12043.95 rows=694059 width=0) (actual time=370.593..370.593 rows=3000000 loops=1)
                    Index Cond: (source_id = ANY ('{14790,14787,32928,14796,14791,15503,14789,14772,15506,14794,15543,31615,15507,15508,14800}'::integer[]))
Planning time: 0.264 ms
Execution time: 1410.097 ms
Adding an index on external_created_at reduced the execution time by a factor of about 470. But I don't have the same distribution of values as you.
create index on test.posts (external_created_at);
analyze test.posts;
explain analyze
select * from test.posts
where source_id in (14790, 14787, 32928, 14796, 14791, 15503, 14789, 14772, 15506, 14794, 15543, 31615, 15507, 15508, 14800)
and deleted_at is null
order by external_created_at desc limit 100 offset 0;
Limit (cost=0.43..131.43 rows=100 width=24) (actual time=0.219..2.992 rows=100 loops=1)
  -> Index Scan Backward using posts_external_created_at_idx on posts (cost=0.43..900991.48 rows=687808 width=24) (actual time=0.216..2.976 rows=100 loops=1)
        Filter: ((deleted_at IS NULL) AND (source_id = ANY ('{14790,14787,32928,14796,14791,15503,14789,14772,15506,14794,15543,31615,15507,15508,14800}'::integer[])))
        Rows Removed by Filter: 350
Planning time: 0.302 ms
Execution time: 3.024 ms