PostgreSQL EXPLAIN ANALYZE: relationship between actual time and estimates

Time: 2014-11-30 07:30:34

Tags: postgresql

https://wiki.postgresql.org/wiki/Introduction_to_VACUUM,_ANALYZE,_EXPLAIN,_and_COUNT

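I'm currently reading this page to learn about PostgreSQL's EXPLAIN ANALYZE, and I'm trying to understand the relationship between the estimated cost and the actual time.

A simple example given on this page is the following: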

->  Nested Loop  (cost=5.64..14.71 rows=1 width=140) (actual time=18.983..19.481 rows=4 loops=1)
               ->  Hash Join  (cost=5.64..8.82 rows=1 width=72) (actual time=18.876..19.212 rows=4 loops=1)
               ->  Index Scan using pg_class_oid_index on pg_class i  (cost=0.00..5.88 rows=1 width=72) (actual time=0.051..0.055 rows=1 loops=4)

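It says: "If you do the math, you'll see that 0.055 * 4 accounts for most of the difference between the total time of the hash join and the total time of the nested loop (the remainder is likely the overhead of measuring all of this)."

I'm not sure what "difference" means here, and I can't find any difference that comes close to 0.055 * 4. Am I being stupid and just overlooking something trivial?

By the way, I'm actually writing a lab report on databases, so in general, if I'm asked to write a short comment about estimated versus actual time based on some concrete results, what could I say?

Here is the plan whose results I need to write up: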

QUERY PLAN                                                               
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=39911.52..299300.41 rows=1 width=17) (actual time=4660.217..4952.328 rows=1 loops=1)
   Join Filter: (casts.mid = movie.id)
   Rows Removed by Join Filter: 2251735
   ->  Seq Scan on movie  (cost=0.00..29721.64 rows=5542 width=21) (actual time=0.637..316.651 rows=4201 loops=1)
         Filter: (year > 2010)
         Rows Removed by Filter: 1533210
   ->  Materialize  (cost=39911.52..269080.01 rows=6 width=4) (actual time=0.307..1.014 rows=536 loops=4201)
         ->  Hash Join  (cost=39911.52..269079.98 rows=6 width=4) (actual time=1288.827..4089.872 rows=536 loops=1)
               Hash Cond: (casts.pid = actor.id)
               ->  Seq Scan on casts  (cost=0.00..186246.47 rows=11445847 width=8) (actual time=0.293..1487.138 rows=11445847 loops=1)
               ->  Hash  (cost=39911.51..39911.51 rows=1 width=4) (actual time=414.130..414.130 rows=1 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  Seq Scan on actor  (cost=0.00..39911.51 rows=1 width=4) (actual time=100.175..414.125 rows=1 loops=1)
                           Filter: (((fname)::text = 'Tom'::text) AND ((lname)::text = 'Hanks'::text))
                           Rows Removed by Filter: 1865033
 Total runtime: 4952.822 ms

1 answer:

Answer 0 (score: 2)

Look at the actual times:

->  Nested Loop  ........ (actual time=18.983..19.481 rows=4 loops=1)
..... 
..... 
->  Hash Join  ....... (actual time=18.876..19.212 rows=4 loops=1)
    ->  Index Scan ......... (actual time=0.051..0.055 rows=1 loops=4)


4 (loops) * 0.055 = 0.22

19.212 + 0.22 = 19.432 ==> close to 19.481 (0.049 unaccounted for)
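
The arithmetic above can be reproduced in a few lines of Python. This is only a sketch: the numbers are copied from the plan excerpt, and the variable names are made up for illustration. `Decimal` is used so that the millisecond values subtract exactly instead of picking up floating-point noise.

```python
from decimal import Decimal

# Upper bounds of "actual time" (in ms), copied from the plan excerpt.
nested_loop_total = Decimal("19.481")
hash_join_total = Decimal("19.212")

# The index scan's time is reported per loop, so multiply by loops.
index_scan_per_loop = Decimal("0.055")
index_scan_loops = 4
index_scan_total = index_scan_per_loop * index_scan_loops

# Time the nested loop took beyond its hash join child.
difference = nested_loop_total - hash_join_total

# The part of that difference the index scan does not explain.
unexplained = difference - index_scan_total

print(f"index scan total: {index_scan_total} ms")
print(f"difference:       {difference} ms")
print(f"unexplained:      {unexplained} ms")
```

So the "difference" in the quoted sentence is 19.481 - 19.212 = 0.269 ms. The four index scan loops explain 0.220 ms of it, and the remaining 0.049 ms is presumably measurement overhead.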


Edit


I think adding an index on actor (fname, lname),
or even on just the single column actor (lname), could speed up this query considerably.

Look at this:

 ->  Seq Scan on actor  (cost=0.00..39911.51 rows=1 width=4) (actual time=100.175..414.125 rows=1 loops=1)
       Filter: (((fname)::text = 'Tom'::text) AND ((lname)::text = 'Hanks'::text))
       Rows Removed by Filter: 1865033

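PostgreSQL performs a sequential scan on the actor table and filters out 1865033 rows to find just 1 row. The scan's actual time is 100.175..414.125 ms: the first row comes back after about 100 ms and the scan finishes after about 414 ms. With an index, that one row could be found in a few milliseconds.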
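
To see where the time goes in the plan from the question, the same loops arithmetic can be applied to each node. The sketch below is illustrative only: the figures are copied from the plan, `total_ms` is a made-up helper name, and the results are approximate because a node's time is reported per loop and a parent's time includes its children.

```python
def total_ms(per_loop_ms: float, loops: int) -> float:
    """Approximate total time of a node: per-loop time multiplied by loops."""
    return per_loop_ms * loops


total_runtime = 4952.822  # "Total runtime" line of the plan, in ms

# (upper bound of "actual time" in ms, loops), copied from the plan.
nodes = {
    "Seq Scan on movie": (316.651, 1),
    "Materialize": (1.014, 4201),
    "Hash Join": (4089.872, 1),
    "Seq Scan on casts": (1487.138, 1),
    "Seq Scan on actor": (414.125, 1),
}

for name, (per_loop, loops) in nodes.items():
    t = total_ms(per_loop, loops)
    print(f"{name:<18} {t:9.1f} ms  {t / total_runtime:6.1%} of total runtime")
```

The shares overlap, so they do not add up to 100%: Materialize includes the Hash Join that runs on its first loop, and the Hash Join in turn includes both sequential scans below it. On these figures the scan on actor comes to roughly 8% of the total runtime.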