Postgres query in production causes abnormally high disk read I/O

Time: 2018-02-22 05:14:06

Tags: postgresql

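I am using Ubuntu 16.04 with PostgreSQL 9.5 and Django 1.11.

My website suffers from extremely long AJAX calls (over 30 seconds in some cases). The same AJAX calls take about 500 ms in development.

The problem is related to disk read I/O. Running a single query in production drives disk read I/O up to 25 MB/s; the same query in development causes less than 0.01 MB/s of disk read I/O. The code and the query are identical in production and development.

So something about Postgres in production is causing abnormally high disk read I/O. What could it be?

Here is a sample query that takes about 25 seconds in production and only 500 ms in development: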

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) AS "__count" FROM "map_listing" 
WHERE ("map_listing"."lo" <  -79.32516245458987 AND "map_listing"."la" > 43.640279060122346
AND "map_listing"."lo" >  -79.60531382177737 AND "map_listing"."transaction_type" = 'Sale'
AND "map_listing"."la" < 43.774544561921296 
AND NOT ("map_listing"."status" = 'Sld' AND "map_listing"."sold_date" < '2018-01-21'::date
AND "map_listing"."sold_date" IS NOT NULL)
AND NOT (("map_listing"."status" = 'Ter' OR "map_listing"."status" = 'Exp'))
AND NOT (("map_listing"."property_type" = 'Parking Space' OR "map_listing"."property_type" = 'Locker')));

Result of EXPLAIN (ANALYZE, BUFFERS) for the statement above (production):

 Aggregate  (cost=89924.55..89924.56 rows=1 width=0) (actual time=27318.859..27318.860 rows=1 loops=1)
   Buffers: shared read=73424
   ->  Bitmap Heap Scan on map_listing  (cost=4873.96..89836.85 rows=35079 width=0) (actual time=6061.214..27315.183 rows=3228 loops=1)
         Recheck Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
         Rows Removed by Index Recheck: 86733
         Filter: ((lo < '-79.32516245458987'::numeric) AND (lo > '-79.60531382177737'::numeric) AND ((status)::text <> 'Ter'::text) AND ((status)::text <> 'Exp'::text) AND ((property_type)::text <> 'Parking Space'::text) AND ((property_type)::text <> 'Locker'::text) AND ((transaction_type)::text = 'Sale'::text) AND (((status)::text <> 'Sld'::text) OR (sold_date >= '2018-01-21'::date) OR (sold_date IS NULL)))
         Rows Removed by Filter: 190108
         Heap Blocks: exact=46091 lossy=26592
         Buffers: shared read=73424
         ->  Bitmap Index Scan on map_listing_la_88ca396c  (cost=0.00..4865.19 rows=192477 width=0) (actual time=156.964..156.964 rows=194434 loops=1)
               Index Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
               Buffers: shared read=741
 Planning time: 0.546 ms
 Execution time: 27318.926 ms
(14 rows)

Result of EXPLAIN (ANALYZE, BUFFERS) (development):

 Aggregate  (cost=95326.23..95326.24 rows=1 width=8) (actual time=495.373..495.373 rows=1 loops=1)
   Buffers: shared read=77281
   ->  Bitmap Heap Scan on map_listing  (cost=5211.98..95225.57 rows=40265 width=0) (actual time=80.929..495.140 rows=4565 loops=1)
         Recheck Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
         Rows Removed by Index Recheck: 85958
         Filter: ((lo < '-79.32516245458987'::numeric) AND (lo > '-79.60531382177737'::numeric) AND ((status)::text <> 'Ter'::text) AND ((status)::text <> 'Exp'::text) AND ((property_type)::text <> 'Parking Space'::text) AND ((property_type)::text <> 'Locker'::text) AND ((transaction_type)::text = 'Sale'::text) AND (((status)::text <> 'Sld'::text) OR (sold_date >= '2018-01-21'::date) OR (sold_date IS NULL)))
         Rows Removed by Filter: 198033
         Heap Blocks: exact=49858 lossy=26639
         Buffers: shared read=77281
         ->  Bitmap Index Scan on map_listing_la_88ca396c  (cost=0.00..5201.91 rows=205749 width=0) (actual time=73.070..73.070 rows=205569 loops=1)
               Index Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
               Buffers: shared read=784
 Planning time: 0.962 ms
 Execution time: 495.822 ms
(14 rows)

2 answers:

Answer 0: (score: 2)

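None of the blocks this query needs were found in shared buffers: the plan reports all 73424 of them as shared read, so each one had to be requested from the operating system. That is about 574 MB, which produces substantial I/O load whenever the table is not cached.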
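As a rough cross-check, the 73424 blocks reported in the production plan can be converted to a size and compared with the configured cache and the size of the table. This is only a sketch using built-in functions; it assumes the table lives in the current schema under the name map_listing, as in the question.

```sql
-- Convert the block count from the plan into a human-readable size.
-- With the default 8 kB block size this is the roughly 574 MB mentioned above.
SELECT pg_size_pretty(73424 * current_setting('block_size')::bigint) AS data_read;

-- How much memory PostgreSQL itself may use for caching table and index blocks.
SHOW shared_buffers;

-- Size of the table's heap and of all its indexes, for comparison.
SELECT pg_size_pretty(pg_relation_size('map_listing')) AS heap_size,
       pg_size_pretty(pg_indexes_size('map_listing'))  AS index_size;
```

If the heap is much larger than shared_buffers plus whatever the operating system can keep in its page cache, repeated runs of the query will keep going back to disk.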

But there are two things that can be improved.

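  • You have lossy block matches in the heap scan. That means that work_mem is not big enough to hold a bitmap with one bit per table row, so 26592 bits each map a whole table block instead. All rows in those blocks have to be rechecked, and 86733 rows are discarded, most of them false positives from the lossy block matches.

    If you increase work_mem, a bitmap with one bit per table row will fit into memory and this number will shrink, reducing the work during the heap scan.

  • 190108 rows are discarded because they do not match the additional filter condition in the bitmap heap scan. This is probably where most of the time is spent. If you can reduce that number, you win.

    The ideal indexes for this query would be: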

    CREATE INDEX ON map_listing(transaction_type, la);
    CREATE INDEX ON map_listing(transaction_type, lo);
    

    If transaction_type is not very selective (i.e. most rows have the value Sale), you can omit that column.
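Both suggestions can be tried out without changing the server configuration. The following is a minimal sketch, not a tuned recipe: the 64MB value is an arbitrary illustration, and the query is a shortened version of the one in the question that keeps only the la, lo and transaction_type conditions.

```sql
-- How selective is transaction_type? The figures come from the last ANALYZE
-- of the table, so they are estimates.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'map_listing'
  AND attname = 'transaction_type';

-- Retry the query with more work_mem, for this transaction only.
-- Assumption: 64MB is an illustrative starting value, not a recommendation.
BEGIN;
SET LOCAL work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM map_listing
WHERE la > 43.640279060122346 AND la < 43.774544561921296
  AND lo > -79.60531382177737 AND lo < -79.32516245458987
  AND transaction_type = 'Sale';
ROLLBACK;
```

If the Heap Blocks line of the new plan no longer reports any lossy blocks, the bitmap fit into memory and Rows Removed by Index Recheck should drop accordingly.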

EDIT:

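Examination of vmstat and iostat shows that both your CPU and your I/O subsystem are massively overloaded: all the CPU resources are spent on I/O wait and VM steal time. You need a better I/O system and a host system that has more free CPU resources. Increasing RAM might alleviate the I/O problem, but only for the disk reads.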
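The I/O wait can also be made visible from inside PostgreSQL. The sketch below assumes a superuser session, because track_io_timing cannot be changed by ordinary roles; the query is again a shortened stand-in for the one in the question.

```sql
-- Assumption: run as a superuser. Timing every block read adds some overhead,
-- so the setting is switched back afterwards.
SET track_io_timing = on;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM map_listing
WHERE la > 43.640279060122346 AND la < 43.774544561921296
  AND transaction_type = 'Sale';

RESET track_io_timing;
```

With the setting enabled, plan nodes that read blocks carry an additional I/O Timings line giving the milliseconds spent waiting for reads. If that figure accounts for most of the execution time, the query is waiting on storage rather than doing CPU work.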

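Answer 1: (score: 1)

(I don't have enough reputation to comment yet.)

I am currently having a problem similar to Jack's. My queries got slower after I created the indexes, and my adjustments to work_mem and shared_buffers did not bring any improvement.

When you say that RAM was the problem, what did you do to fix it? My server has 32 GB of RAM, and I have even tried setting work_mem = 16GB.

iotop reads: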

DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
86.28 M/s    0.00 B/s  0.00 %   87.78 %  postgres

(Edit: link to my question on gis.stackexchange)