PostgreSQL 10, latest Ubuntu LTS, 1 CPU, 2 GB RAM, no other software installed.
Two tables, both indexed:
follows (22 records)
tips (2.5 million records)
select users_id_to from follows where users_id_from =1
Takes 0.041 ms.
select tips.id
from tips
where tips.users_id in (2,3,4,5,6,8,79407,38463,42798,94150,76554,56777,71407,51788,4624,41079,13549,75920,18979,6078,26178,18316)
Bitmap Heap Scan on tips  (cost=101.72..2122.76 rows=556 width=8) (actual time=0.267..1.120 rows=597 loops=1)
  Recheck Cond: (users_id = ANY ('{2,3,4,5,6,8,79407,38463,42798,94150,76554,56777,71407,51788,4624,41079,13549,75920,18979,6078,26178,18316}'::bigint[]))
  Heap Blocks: exact=594
  ->  Bitmap Index Scan on tips_idx_users_id01  (cost=0.00..101.58 rows=556 width=0) (actual time=0.188..0.188 rows=597 loops=1)
        Index Cond: (users_id = ANY ('{2,3,4,5,6,8,79407,38463,42798,94150,76554,56777,71407,51788,4624,41079,13549,75920,18979,6078,26178,18316}'::bigint[]))
Planning time: 0.210 ms
Execution time: 1.193 ms
Takes 1.2 ms (4.7 ms on the first run).
select tips.id
from tips
where tips.users_id in (select users_id_to
from follows
where users_id_from = 1
)
Merge Semi Join  (cost=2.29..22.07 rows=573 width=8) (actual time=0.540..10632.242 rows=597 loops=1)
  Merge Cond: (tips.users_id = follows.users_id_to)
  Buffers: shared hit=1095506 read=1264002
  ->  Index Scan using tips_idx_users_id01 on tips  (cost=0.43..205139.43 rows=2500000 width=16) (actual time=0.021..10180.667 rows=2353909 loops=1)
        Buffers: shared hit=1095505 read=1264002
  ->  Sort  (cost=1.77..1.82 rows=22 width=8) (actual time=0.051..0.084 rows=22 loops=1)
        Sort Key: follows.users_id_to
        Sort Method: quicksort  Memory: 26kB
        Buffers: shared hit=1
        ->  Seq Scan on follows  (cost=0.00..1.27 rows=22 width=8) (actual time=0.012..0.019 rows=22 loops=1)
              Filter: (users_id_from = 1)
              Buffers: shared hit=1
Planning time: 0.954 ms
Execution time: 10632.376 ms
Takes 10433 ms.
Definitions:
CREATE TABLE public.follows (
    id bigserial NOT NULL,
    users_id_from bigint NOT NULL DEFAULT 0,
    users_id_to bigint NOT NULL DEFAULT 0,
    has_accepted boolean NOT NULL DEFAULT true,
    created_on timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    CONSTRAINT followings_pkey
        PRIMARY KEY (id)
);
CREATE TABLE public.tips (
    id bigserial NOT NULL,
    users_id bigint NOT NULL,
    temp_id bigint NOT NULL,
    first_seen numeric(12,2) NOT NULL DEFAULT 0,
    created_on timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expire_on_gmt timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ip_from inet NOT NULL DEFAULT '0.0.0.0'::inet,
    "type" smallint NOT NULL DEFAULT 0,
    growth numeric(8,1) NOT NULL DEFAULT 0.0,
    seen boolean DEFAULT false,
    CONSTRAINT tips_pkey
        PRIMARY KEY (id)
);
CREATE INDEX tips_idx_users_id01
    ON public.tips
    (users_id);
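I really don't understand why the performance is so poor; the server seems to be performing a JOIN behind the scenes ...
Any help is appreciated.
Thanks
Perez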
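Edit - 2018.10.9
Although the accepted answer gave an immediate fix, a deeper investigation by Pavel Stehule (see the posts below) showed that the real problem was incorrect statistics on the follows table. VACUUM ANALYZE fixed this, and both queries now run fast.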
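The fix described in the edit can be sketched as below. This is a minimal sketch, assuming the tables live in the public schema as in the definitions above; the check against pg_stat_user_tables is an addition for illustration and was not part of the original post.

```sql
-- Refresh planner statistics for the small table
-- (VACUUM also removes dead rows).
VACUUM ANALYZE public.follows;

-- Check when each table was last analyzed, manually or by autovacuum.
SELECT relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('follows', 'tips');

-- Re-run the slow query and compare the plan with the one in the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT tips.id
FROM tips
WHERE tips.users_id IN (SELECT users_id_to
                        FROM follows
                        WHERE users_id_from = 1);
```

If last_analyze and last_autoanalyze are both NULL for follows, the table has never been analyzed, which would be consistent with the diagnosis above.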
Answer 0 (score: 0)
I tried to reproduce your case, but I got a very different plan:
postgres=# explain analyze select * from foo where a in (select a from boo where b = 22);
                                                          QUERY PLAN
 Nested Loop  (cost=16.19..7066.65 rows=2101 width=8) (actual time=0.444..11.667 rows=2713 loops=1)
   ->  HashAggregate  (cost=9.43..9.50 rows=7 width=4) (actual time=0.094..0.111 rows=9 loops=1)
         Group Key: boo.a
         ->  Bitmap Heap Scan on boo  (cost=4.33..9.42 rows=7 width=4) (actual time=0.048..0.071 rows=9 loops=1)
               Recheck Cond: (b = 22)
               Heap Blocks: exact=5
               ->  Bitmap Index Scan on boo_b_idx  (cost=0.00..4.33 rows=7 width=0) (actual time=0.030..0.030 rows=9 loops=1)
                     Index Cond: (b = 22)
   ->  Bitmap Heap Scan on foo  (cost=6.75..1005.16 rows=300 width=8) (actual time=0.256..1.143 rows=301 loops=9)
         Recheck Cond: (a = boo.a)
         Heap Blocks: exact=2678
         ->  Bitmap Index Scan on foo_a_idx  (cost=0.00..6.68 rows=300 width=0) (actual time=0.145..0.145 rows=301 loops=9)
               Index Cond: (a = boo.a)
 Planning time: 0.971 ms
 Execution time: 12.105 ms
(15 rows)
Even when I penalized some methods, I still got a significantly better plan:
postgres=# explain analyze select * from foo where a in (select a from boo where b = 22);
                                                         QUERY PLAN
 Nested Loop  (cost=18.03..7894.11 rows=2101 width=8) (actual time=0.433..9.809 rows=2713 loops=1)
   ->  Unique  (cost=17.60..17.63 rows=7 width=4) (actual time=0.384..0.407 rows=9 loops=1)
         ->  Sort  (cost=17.60..17.62 rows=7 width=4) (actual time=0.383..0.388 rows=9 loops=1)
               Sort Key: boo.a
               Sort Method: quicksort  Memory: 25kB
               ->  Seq Scan on boo  (cost=0.00..17.50 rows=7 width=4) (actual time=0.047..0.358 rows=9 loops=1)
                     Filter: (b = 22)
                     Rows Removed by Filter: 991
   ->  Index Scan using foo_a_idx on foo  (cost=0.43..1122.21 rows=300 width=8) (actual time=0.023..0.874 rows=301 loops=9)
         Index Cond: (a = boo.a)
 Planning time: 0.957 ms
 Execution time: 10.399 ms
(12 rows)
Tested on PostgreSQL 10.5.
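The answer does not say which planner methods were penalized. One plausible way to do it, shown here only as a sketch, is to switch off individual plan types for the session with the enable_* settings. The choice of settings below is an assumption, inferred from the differences between the plans (HashAggregate and bitmap scans disappear); foo and boo are the answer's test tables.

```sql
-- Assumption: these particular settings are a guess at what was changed.
SET enable_hashagg = off;     -- discourage HashAggregate
SET enable_bitmapscan = off;  -- discourage bitmap index/heap scans

EXPLAIN ANALYZE
SELECT * FROM foo WHERE a IN (SELECT a FROM boo WHERE b = 22);

-- Discouraging nested loops as well pushes the planner toward
-- merge or hash joins.
SET enable_nestloop = off;

EXPLAIN ANALYZE
SELECT * FROM foo WHERE a IN (SELECT a FROM boo WHERE b = 22);

-- Restore the defaults for the rest of the session.
RESET enable_hashagg;
RESET enable_bitmapscan;
RESET enable_nestloop;
```

These settings do not forbid a plan type outright; they only make it look very expensive to the planner, so they are useful for experiments like this one rather than as a production fix.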
After some more experimenting, I got:
                                                                      QUERY PLAN
 Gather  (cost=1018.03..117733.71 rows=2101 width=8) (actual time=113.420..914.035 rows=2713 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Merge Semi Join  (cost=18.03..116523.61 rows=875 width=8) (actual time=150.675..904.224 rows=904 loops=3)
         Merge Cond: (foo.a = boo.a)
         ->  Parallel Index Scan using foo_a_idx on foo  (cost=0.43..113510.99 rows=1250000 width=8) (actual time=0.136..800.463 rows=919564 loops=3)
         ->  Sort  (cost=17.60..17.62 rows=7 width=4) (actual time=0.347..0.357 rows=9 loops=3)
               Sort Key: boo.a
               Sort Method: quicksort  Memory: 25kB
               ->  Seq Scan on boo  (cost=0.00..17.50 rows=7 width=4) (actual time=0.059..0.286 rows=9 loops=3)
                     Filter: (b = 22)
                     Rows Removed by Filter: 991
 Planning time: 0.903 ms
 Execution time: 914.283 ms
(14 rows)
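It is strange. You do not have parallelism active (probably because of the low cost, although the estimates look fine). Even when I treated the optimizer badly, the query took at most about 1 second.
Can you run VACUUM FULL on the whole database? Is there any other activity on your IO?
Follow-up: the problem was related to missing or stale statistics on the follows table. This had a dramatic effect because the merge join costing includes an optimization based on comparing the maximum values of the two inputs. When one maximum is significantly lower than the other, the planner expects the scan of the other input to stop early, so the estimated amount read and the finish time are much lower. That is why the merge join got such a low cost.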
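To see what the planner knows about the join column, you can query the pg_stats view. This is a minimal sketch, assuming the public schema and the column names from the question's definitions; it is not taken from the original answer.

```sql
-- What the planner believes about follows.users_id_to.
-- No row at all means the column has never been analyzed.
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'follows'
  AND attname = 'users_id_to';

-- The actual range of followed user ids for the user in the question.
SELECT min(users_id_to), max(users_id_to)
FROM follows
WHERE users_id_from = 1;
```

Comparing the two results shows whether the stored statistics match the real value range that the merge join estimate depends on.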
Answer 1 (score: 0)
I would suggest writing the query as:
select t.id
from tips t
where exists (select 1
from follows f
where f.users_id_from = 1 and f.users_id_to = t.users_id
);
and creating an index on follows(users_id_to, users_id_from), with the two columns in that order.
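The suggested index could be created as follows. The index name is hypothetical; only the table and the column order come from the answer.

```sql
-- follows_idx_to_from is a made-up name used for illustration.
CREATE INDEX follows_idx_to_from
    ON public.follows (users_id_to, users_id_from);

-- Refresh statistics so the planner sees the current table contents.
ANALYZE public.follows;
```

With only 22 rows in follows, the planner may still prefer a sequential scan of that table, so it is worth checking the plan with EXPLAIN ANALYZE rather than assuming the index is used.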
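As for why Postgres chose that execution plan: Postgres thought it was the best one. Optimizers sometimes make mistakes. Perhaps the statistics on the tables are not up to date.
EDIT: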
Hmmm. I wonder whether either of these two versions would encourage Postgres to use the index on tips(users_id):
with f as (
select users_id_to
from follows
where users_id_from = 1
)
select t.id
from tips t
where t.users_id in (select f.users_id_to from f);
This gives Postgres the option (or encouragement?) to materialize the subquery and then use the index.
The second is a simple join:
select t.id
from tips t join
follows f
on f.users_id_to = t.users_id
where f.users_id_from = 1;