Question

那里有一些问题说postgres没有使用order by，但我的情况是错误地使用了。

无索引排序 - 结果缓存后热运行。需要8.48秒

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246372.98..246622.98 rows=100000 width=72) (actual time=8451.119..8479.138 rows=100000 loops=1)
   Buffers: shared hit=16134 read=35121
   ->  Sort  (cost=246372.98..251348.03 rows=1990021 width=72) (actual time=8451.117..8467.403 rows=100000 loops=1)
         Sort Key: userid
         Sort Method: top-N heapsort  Memory: 20207kB
         Buffers: shared hit=16134 read=35121
         ->  Seq Scan on users  (cost=0.00..71155.21 rows=1990021 width=72) (actual time=25.448..7782.830 rows=1995958 loops=1)
               Buffers: shared hit=16134 read=35121
 Planning time: 40.542 ms
 Execution time: 8487.556 ms
(10 rows)

使用userid列上的索引进行排序。使用更多磁盘I / O并占用高达6.2分钟

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                                     QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..12771.83 rows=100000 width=72) (actual time=35.498..372437.748 rows=100000 loops=1)
   Buffers: shared hit=60846 read=39425
   ->  Index Scan using users_userid_idx on users  (cost=0.43..255288.96 rows=1998907 width=72) (actual time=35.496..372372.192 rows=100000 loops=1)
         Buffers: shared hit=60846 read=39425
 Planning time: 0.160 ms
 Execution time: 372476.536 ms
(6 rows)

很少有事情需要注意

在运行两个查询之前，我运行了真空分析。
两者都是热运行，即我在运行3-4次后取出它们
有足够的工作mem，它使用前N个堆。尽管问题是没有索引的排序更快。

我的问题不是改善秩序，而是要理解规划师错误估计的原因。在写这个问题的那一刻，我在postgres 9.4上运行了我的Mac OSx上的这些查询。我没有任何其他具有不同操作系统的机器此刻要测试，也许很快就会生病。

其他人是否可以确认这是否是规划师的错误，或者我的机器有问题。

Answer 1

我对实际发生的事情感到非常难过。在我完成以下步骤之后，这是新的统计数据。

重新启动我的Mac
将共享缓冲区更改为256 MB（以前为128 MB）
重新启动postgres

在我这样做之后，这是新的统计数据。

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..12788.49 rows=100000 width=72) (actual time=0.031..78.785 rows=100000 loops=1)
   Buffers: shared hit=100271
   ->  Index Scan using users_userid_idx on users  (cost=0.43..255244.73 rows=1995958 width=72) (actual time=0.030..65.937 rows=100000 loops=1)
         Buffers: shared hit=100271
 Planning time: 0.119 ms
 Execution time: 84.985 ms
(6 rows)

唯一的变化是没有磁盘I / O，因为所有内容都被缓存，可能是因为增加了共享缓冲区。但实际时间变化超出了逻辑。

没有指数的正常的前N个堆也有所改善。

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246955.09..247205.09 rows=100000 width=72) (actual time=707.350..734.954 rows=100000 loops=1)
   Buffers: shared hit=26071 read=25184
   ->  Sort  (cost=246955.09..251944.99 rows=1995958 width=72) (actual time=707.348..723.127 rows=100000 loops=1)
         Sort Key: userid
         Sort Method: top-N heapsort  Memory: 20207kB
         Buffers: shared hit=26071 read=25184
         ->  Seq Scan on users  (cost=0.00..71214.58 rows=1995958 width=72) (actual time=9.922..270.684 rows=1995958 loops=1)
               Buffers: shared hit=26071 read=25184
 Planning time: 0.090 ms
 Execution time: 743.788 ms
(10 rows)

将共享缓冲区更改回128 MB，结果仍然很好。

explain (analyze,buffers) select * from users order by userid limit 100000;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..12788.49 rows=100000 width=72) (actual time=0.098..232.314 rows=100000 loops=1)
   Buffers: shared hit=61313 read=38958
   ->  Index Scan using users_userid_idx on users  (cost=0.43..255244.73 rows=1995958 width=72) (actual time=0.096..218.272 rows=100000 loops=1)
         Buffers: shared hit=61313 read=38958
 Planning time: 0.131 ms
 Execution time: 238.861 ms
(6 rows)


explain (analyze,buffers) select * from users order by userid limit 100000;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=246955.09..247205.09 rows=100000 width=72) (actual time=722.003..749.696 rows=100000 loops=1)
   Buffers: shared hit=16192 read=35063
   ->  Sort  (cost=246955.09..251944.99 rows=1995958 width=72) (actual time=722.001..737.715 rows=100000 loops=1)
         Sort Key: userid
         Sort Method: top-N heapsort  Memory: 20207kB
         Buffers: shared hit=16192 read=35063
         ->  Seq Scan on users  (cost=0.00..71214.58 rows=1995958 width=72) (actual time=8.584..294.605 rows=1995958 loops=1)
               Buffers: shared hit=16192 read=35063
 Planning time: 0.070 ms
 Execution time: 757.495 ms
(10 rows)

我听说有人说不要在Mac /台式机上取得计时结果，但这完全是疯了。

Postgres命令错误地使用索引

1 个答案: