Subquery performance in a simple case

Date: 2018-10-07 04:12:48

Tags: sql postgresql

PostgreSQL 10 - latest Ubuntu LTS - 1 CPU, 2 GB RAM - no other software installed
Both tables have indexes:
follows (22 rows)
tips (2.5 million rows)

select users_id_to from follows where users_id_from =1 

takes 0.041 ms

select tips.id
from tips
where tips.users_id in (2,3,4,5,6,8,79407,38463,42798,94150,76554,56777,71407,51788,4624,41079,13549,75920,18979,6078,26178,18316) 

Bitmap Heap Scan on tips  (cost=101.72..2122.76 rows=556 width=8) (actual time=0.267..1.120 rows=597 loops=1)   
  Recheck Cond: (users_id = ANY ('{2,3,4,5,6,8,79407,38463,42798,94150,76554,56777,71407,51788,4624,41079,13549,75920,18979,6078,26178,18316}'::bigint[]))  
  Heap Blocks: exact=594    
  ->  Bitmap Index Scan on tips_idx_users_id01  (cost=0.00..101.58 rows=556 width=0) (actual time=0.188..0.188 rows=597 loops=1)    
        Index Cond: (users_id = ANY ('{2,3,4,5,6,8,79407,38463,42798,94150,76554,56777,71407,51788,4624,41079,13549,75920,18979,6078,26178,18316}'::bigint[]))  
Planning time: 0.210 ms 
Execution time: 1.193 ms 

takes 1.2 ms (4.7 ms on the first run)
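The fast plan is a Bitmap Index Scan feeding a Bitmap Heap Scan: the index first collects the locations of all matching rows, and the heap is then visited block by block in physical order, so each block is read only once (here 597 rows came from 594 blocks, "Heap Blocks: exact=594"). Below is a toy Python model of that two-step idea, not PostgreSQL's implementation; the dict-based index and heap structures and the name bitmap_heap_scan are assumptions made for illustration.

```python
from collections import defaultdict


def bitmap_heap_scan(index, heap, keys):
    """Toy model of Bitmap Index Scan + Bitmap Heap Scan (not PostgreSQL code).

    index: hypothetical dict, key (users_id) -> list of (block, offset) row locations
    heap:  hypothetical dict, block number -> {offset: row}
    Returns (rows, number_of_heap_blocks_visited).
    """
    # Step 1 (Bitmap Index Scan): collect every matching location, grouped by block.
    bitmap = defaultdict(set)
    for key in keys:
        for block, offset in index.get(key, []):
            bitmap[block].add(offset)

    # Step 2 (Bitmap Heap Scan): visit each block once, in physical order.
    rows = []
    for block in sorted(bitmap):
        page = heap[block]
        for offset in sorted(bitmap[block]):
            rows.append(page[offset])
    return rows, len(bitmap)


# Three tips rows spread over two heap blocks; users 2 and 3 are followed.
heap = {0: {0: ("tip 1", 2), 1: ("tip 2", 3)}, 1: {0: ("tip 3", 2)}}
index = {2: [(0, 0), (1, 0)], 3: [(0, 1)]}
rows, blocks = bitmap_heap_scan(index, heap, [2, 3])
print(rows, blocks)  # -> [('tip 1', 2), ('tip 2', 3), ('tip 3', 2)] 2
```

The cost of this strategy follows the number of matching rows and the blocks that hold them, not the size of tips, which is why the literal IN list stays in the millisecond range.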

select tips.id
from tips
where tips.users_id in (select users_id_to
                        from follows
                        where users_id_from = 1
                       )


Merge Semi Join  (cost=2.29..22.07 rows=573 width=8) (actual time=0.540..10632.242 rows=597 loops=1)    
  Merge Cond: (tips.users_id = follows.users_id_to) 
  Buffers: shared hit=1095506 read=1264002  
  ->  Index Scan using tips_idx_users_id01 on tips  (cost=0.43..205139.43 rows=2500000 width=16) (actual time=0.021..10180.667 rows=2353909 loops=1)    
        Buffers: shared hit=1095505 read=1264002    
  ->  Sort  (cost=1.77..1.82 rows=22 width=8) (actual time=0.051..0.084 rows=22 loops=1)    
        Sort Key: follows.users_id_to   
        Sort Method: quicksort  Memory: 26kB    
        Buffers: shared hit=1   
        ->  Seq Scan on follows  (cost=0.00..1.27 rows=22 width=8) (actual time=0.012..0.019 rows=22 loops=1)   
              Filter: (users_id_from = 1)   
              Buffers: shared hit=1 
Planning time: 0.954 ms 
Execution time: 10632.376 ms

takes 10433 ms
Definitions:

CREATE TABLE public.follows (
  id             bigserial NOT NULL,
  users_id_from  bigint NOT NULL DEFAULT 0,
  users_id_to    bigint NOT NULL DEFAULT 0,
  has_accepted   boolean NOT NULL DEFAULT true,
  created_on     timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  CONSTRAINT followings_pkey
    PRIMARY KEY (id)
);

CREATE TABLE public.tips (
  id             bigserial NOT NULL,
  users_id       bigint NOT NULL,
  temp_id        bigint NOT NULL,
  first_seen     numeric(12,2) NOT NULL DEFAULT 0,
  created_on     timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  expire_on_gmt  timestamp without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
  ip_from        inet NOT NULL DEFAULT '0.0.0.0'::inet,
  "type"         smallint NOT NULL DEFAULT 0,
  growth         numeric(8,1) NOT NULL DEFAULT 0.0,
  seen           boolean DEFAULT false,

  CONSTRAINT tips_pkey
    PRIMARY KEY (id)
);

CREATE INDEX tips_idx_users_id01
  ON public.tips
  (users_id);

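I really don't understand why the performance is so bad; the server seems to be doing a JOIN behind the scenes...
Any help is appreciated.

Thanks
Perez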

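EDIT - 2018.10.9
Although an answer has been accepted and it fixed the problem right away, a deeper investigation thanks to Pavel Stehule (see his post below) showed that the real problem was incorrect statistics on the follows table. Running VACUUM ANALYZE solved it, and both queries now run fast.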

2 answers:

Answer 0 (score: 0)

I tried your test case, but I get a very different plan:

postgres=# explain analyze select * from foo where a in (select a from boo where b = 22);
+------------------------------------------------------------------------------------------------------------------------------+
|                                                          QUERY PLAN                                                          |
+------------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=16.19..7066.65 rows=2101 width=8) (actual time=0.444..11.667 rows=2713 loops=1)                           |
|   ->  HashAggregate  (cost=9.43..9.50 rows=7 width=4) (actual time=0.094..0.111 rows=9 loops=1)                              |
|         Group Key: boo.a                                                                                                     |
|         ->  Bitmap Heap Scan on boo  (cost=4.33..9.42 rows=7 width=4) (actual time=0.048..0.071 rows=9 loops=1)              |
|               Recheck Cond: (b = 22)                                                                                         |
|               Heap Blocks: exact=5                                                                                           |
|               ->  Bitmap Index Scan on boo_b_idx  (cost=0.00..4.33 rows=7 width=0) (actual time=0.030..0.030 rows=9 loops=1) |
|                     Index Cond: (b = 22)                                                                                     |
|   ->  Bitmap Heap Scan on foo  (cost=6.75..1005.16 rows=300 width=8) (actual time=0.256..1.143 rows=301 loops=9)             |
|         Recheck Cond: (a = boo.a)                                                                                            |
|         Heap Blocks: exact=2678                                                                                              |
|         ->  Bitmap Index Scan on foo_a_idx  (cost=0.00..6.68 rows=300 width=0) (actual time=0.145..0.145 rows=301 loops=9)   |
|               Index Cond: (a = boo.a)                                                                                        |
| Planning time: 0.971 ms                                                                                                      |
| Execution time: 12.105 ms                                                                                                    |
+------------------------------------------------------------------------------------------------------------------------------+
(15 rows)

Even after penalizing some planner methods, I still get a noticeably better plan:

postgres=# explain analyze select * from foo where a in (select a from boo where b = 22);
+----------------------------------------------------------------------------------------------------------------------------+
|                                                         QUERY PLAN                                                         |
+----------------------------------------------------------------------------------------------------------------------------+
| Nested Loop  (cost=18.03..7894.11 rows=2101 width=8) (actual time=0.433..9.809 rows=2713 loops=1)                          |
|   ->  Unique  (cost=17.60..17.63 rows=7 width=4) (actual time=0.384..0.407 rows=9 loops=1)                                 |
|         ->  Sort  (cost=17.60..17.62 rows=7 width=4) (actual time=0.383..0.388 rows=9 loops=1)                             |
|               Sort Key: boo.a                                                                                              |
|               Sort Method: quicksort  Memory: 25kB                                                                         |
|               ->  Seq Scan on boo  (cost=0.00..17.50 rows=7 width=4) (actual time=0.047..0.358 rows=9 loops=1)             |
|                     Filter: (b = 22)                                                                                       |
|                     Rows Removed by Filter: 991                                                                            |
|   ->  Index Scan using foo_a_idx on foo  (cost=0.43..1122.21 rows=300 width=8) (actual time=0.023..0.874 rows=301 loops=9) |
|         Index Cond: (a = boo.a)                                                                                            |
| Planning time: 0.957 ms                                                                                                    |
| Execution time: 10.399 ms                                                                                                  |
+----------------------------------------------------------------------------------------------------------------------------+
(12 rows)

Tested on PostgreSQL 10.5
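Both plans above handle the IN subquery the same way: first make the subquery's values unique (HashAggregate in the first plan, Sort + Unique in the second), then run one index lookup on foo per distinct value (loops=9). The work therefore grows with the number of distinct values and the rows they match, not with the size of foo. A toy Python sketch of that strategy, assuming a dict stands in for the index on foo.a (nested_loop_semi_join is a hypothetical name):

```python
def nested_loop_semi_join(outer_index, inner_values):
    """Toy model of the plans above (not PostgreSQL code).

    outer_index:  hypothetical dict, join key -> rows of foo with that key
                  (stands in for the index foo_a_idx)
    inner_values: values produced by the subquery, possibly with duplicates
    Returns (rows, number_of_index_probes).
    """
    # HashAggregate / Sort + Unique: each subquery value is kept only once,
    # so a foo row can never be returned twice.
    unique_keys = sorted(set(inner_values))

    # Nested Loop: one index probe per distinct value.
    rows = []
    for key in unique_keys:
        rows.extend(outer_index.get(key, []))
    return rows, len(unique_keys)


foo_index = {1: ["foo row A", "foo row B"], 5: ["foo row C"]}
print(nested_loop_semi_join(foo_index, [5, 1, 5, 7]))
# -> (['foo row A', 'foo row B', 'foo row C'], 3)
```

Deduplicating first is what turns the semi join into an ordinary nested loop: with unique inner values, each probe can simply return every matching foo row.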

After some playing around, I got:

+------------------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                      QUERY PLAN                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Gather  (cost=1018.03..117733.71 rows=2101 width=8) (actual time=113.420..914.035 rows=2713 loops=1)                                                 |
|   Workers Planned: 2                                                                                                                                 |
|   Workers Launched: 2                                                                                                                                |
|   ->  Merge Semi Join  (cost=18.03..116523.61 rows=875 width=8) (actual time=150.675..904.224 rows=904 loops=3)                                      |
|         Merge Cond: (foo.a = boo.a)                                                                                                                  |
|         ->  Parallel Index Scan using foo_a_idx on foo  (cost=0.43..113510.99 rows=1250000 width=8) (actual time=0.136..800.463 rows=919564 loops=3) |
|         ->  Sort  (cost=17.60..17.62 rows=7 width=4) (actual time=0.347..0.357 rows=9 loops=3)                                                       |
|               Sort Key: boo.a                                                                                                                        |
|               Sort Method: quicksort  Memory: 25kB                                                                                                   |
|               ->  Seq Scan on boo  (cost=0.00..17.50 rows=7 width=4) (actual time=0.059..0.286 rows=9 loops=3)                                       |
|                     Filter: (b = 22)                                                                                                                 |
|                     Rows Removed by Filter: 991                                                                                                      |
| Planning time: 0.903 ms                                                                                                                              |
| Execution time: 914.283 ms                                                                                                                           |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
(14 rows)

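This is strange: you are not getting parallelism (probably because of the low cost, although the estimates look fine). Even when I was very hard on the optimizer, the query took at most about 1 second.

Can you run VACUUM FULL on the whole database? Is there no other activity on your I/O?

Follow-up: the problem was related to missing or stale statistics on the follows table. The effect was dramatic because the merge join has an optimization based on comparing the maximum values of the two inputs: once one side is exhausted, it can stop reading the other. When one of those maximums is believed to be much lower than it really is, the planner expects the join to read less and finish earlier than it actually does. That is why the merge join was costed so cheaply.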
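A toy Python model of the effect described above, not PostgreSQL's code. merge_semi_join walks both sorted inputs and stops reading the outer side once the inner side is exhausted, so the number of outer rows read depends on the largest inner value. estimated_fraction_read is a deliberately simplified stand-in (an assumption, not the planner's real formula) for how column statistics feed the estimate: if the statistics report a maximum for follows.users_id_to that is far too low, the estimated share of tips to read is far too low as well. This is consistent with the question's plan, where the Merge Semi Join is costed at 22.07 although a full Index Scan on tips is costed at 205139.43 (an expected read of roughly 0.01% of tips), while 2,353,909 rows were actually read.

```python
from bisect import bisect_right


def merge_semi_join(outer_sorted, inner_sorted):
    """Merge semi join over two ascending lists (toy model).

    Returns (matches, outer_rows_read). Reading of the outer side stops as
    soon as the inner side is exhausted, because no later (larger) outer
    value can match any more.
    """
    matches = []
    i = 0
    rows_read = 0
    for value in outer_sorted:
        if i == len(inner_sorted):
            break                      # inner exhausted: stop reading outer
        rows_read += 1
        while i < len(inner_sorted) and inner_sorted[i] < value:
            i += 1                     # skip inner values that are too small
        if i < len(inner_sorted) and inner_sorted[i] == value:
            matches.append(value)      # semi join: emit the outer row once
    return matches, rows_read


def estimated_fraction_read(outer_sorted, believed_inner_max):
    """Simplified estimate: share of the outer input at or below the maximum
    that the (possibly stale) statistics report for the inner side."""
    return bisect_right(outer_sorted, believed_inner_max) / len(outer_sorted)


tips_users = list(range(1, 1001))   # outer input, already sorted by the index
follows_to = [2, 5, 990]            # inner input after the Sort node
print(merge_semi_join(tips_users, follows_to))    # -> ([2, 5, 990], 991)
print(estimated_fraction_read(tips_users, 10))    # stale max of 10 -> 0.01
print(estimated_fraction_read(tips_users, 990))   # real max -> 0.99
```

In this model the join really reads 991 of 1000 outer rows, while a stale maximum of 10 makes the estimate only 1%. After ANALYZE refreshes the statistics the estimate becomes realistic and the planner stops picking the merge join.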

Answer 1 (score: 0)

I would suggest writing the query as:

select t.id
from tips t
where exists (select 1
              from follows f
              where f.users_id_from = 1 and f.users_id_to = t.users_id
             );

and creating an index on follows(users_id_to, users_id_from), with the two columns in that order.
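The EXISTS subquery compares both follows columns with equality, so an index on (users_id_to, users_id_from) can answer each probe with a single lookup. A minimal Python sketch, assuming a sorted list of (users_id_to, users_id_from) tuples stands in for the B-tree; build_index and follows_exists are hypothetical names:

```python
from bisect import bisect_left


def build_index(follows_rows):
    """Hypothetical index on follows(users_id_to, users_id_from).

    follows_rows holds (users_id_from, users_id_to) pairs; the index keeps the
    pairs with the columns swapped and sorted, like the keys of a B-tree.
    """
    return sorted((to_id, from_id) for from_id, to_id in follows_rows)


def follows_exists(index, users_id_to, users_id_from):
    """EXISTS probe for one tips row: a single lookup on both key columns."""
    key = (users_id_to, users_id_from)
    pos = bisect_left(index, key)
    return pos < len(index) and index[pos] == key


idx = build_index([(1, 2), (1, 3), (7, 2)])
print(follows_exists(idx, 2, 1))  # -> True  (user 1 follows user 2)
print(follows_exists(idx, 2, 5))  # -> False (user 5 does not follow user 2)
```

Each probe is cheap, but a plan that runs it once per tips row still has to visit every row of tips, so whether this helps depends on the plan Postgres actually chooses.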

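As for why Postgres chose that execution plan: Postgres thinks it is the best one. Sometimes the optimizer makes mistakes. Perhaps the statistics on the tables are not up to date.

EDIT:

Hmm. I wonder whether either of these two versions would encourage Postgres to use the index on tips(users_id):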

with f as (
      select users_id_to
      from follows
      where users_id_from = 1
     )
select t.id
from tips t
where t.users_id in (select f.users_id_to from f);

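This materializes the subquery first (in PostgreSQL 10 a WITH query is always materialized and acts as an optimization fence), which may encourage Postgres to then use the index.

The second one is a simple join: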

select t.id
from tips t join
     follows f
     on f.users_id_to = t.users_id
where f.users_id_from = 1
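The join is only equivalent to the IN and EXISTS versions if follows never holds the same (users_id_from, users_id_to) pair twice: the DDL in the question only has a primary key on id, so nothing prevents duplicates, and each duplicate would repeat the matching tips rows. A toy Python illustration, with lists of tuples standing in for the tables (semi_join and inner_join are hypothetical names):

```python
def semi_join(tips, follows, from_id):
    """IN / EXISTS semantics: each tips row is returned at most once."""
    targets = {to_id for f_from, to_id in follows if f_from == from_id}
    return [tip_id for tip_id, users_id in tips if users_id in targets]


def inner_join(tips, follows, from_id):
    """JOIN semantics: one output row per matching (tips, follows) pair."""
    return [tip_id
            for tip_id, users_id in tips
            for f_from, to_id in follows
            if f_from == from_id and to_id == users_id]


tips = [(10, 2), (11, 3), (12, 4)]   # (id, users_id)
follows = [(1, 2), (1, 2), (1, 3)]   # (users_id_from, users_id_to); (1, 2) duplicated
print(semi_join(tips, follows, 1))   # -> [10, 11]
print(inner_join(tips, follows, 1))  # -> [10, 10, 11]
```

If duplicate follows rows are possible, the IN or EXISTS form keeps the original result without any extra deduplication.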