Question

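I'm using Ubuntu 16.04 with PostgreSQL 9.5 and Django 1.11.

My website suffers from extremely long AJAX calls (over 30 seconds in some cases). The same AJAX calls take about 500 ms in development.

The problem is related to disk read I/O. Running a single query in production drives disk read I/O up to 25 MB/s; the same query in development results in less than 0.01 MB/s of disk read I/O. The code and the query are identical in production and development.

So something about Postgres in production is causing abnormally high disk read I/O. What could it be?

Here is an example query that takes about 25 seconds in production but only 500 ms in development: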

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) AS "__count" FROM "map_listing" 
WHERE ("map_listing"."lo" <  -79.32516245458987 AND "map_listing"."la" > 43.640279060122346
AND "map_listing"."lo" >  -79.60531382177737 AND "map_listing"."transaction_type" = 'Sale'
AND "map_listing"."la" < 43.774544561921296 
AND NOT ("map_listing"."status" = 'Sld' AND "map_listing"."sold_date" < '2018-01-21'::date
AND "map_listing"."sold_date" IS NOT NULL)
AND NOT (("map_listing"."status" = 'Ter' OR "map_listing"."status" = 'Exp'))
AND NOT (("map_listing"."property_type" = 'Parking Space' OR "map_listing"."property_type" = 'Locker')));

Result of EXPLAIN (ANALYZE, BUFFERS) for the statement above (production):

 Aggregate  (cost=89924.55..89924.56 rows=1 width=0) (actual time=27318.859..27318.860 rows=1 loops=1)
   Buffers: shared read=73424
   ->  Bitmap Heap Scan on map_listing  (cost=4873.96..89836.85 rows=35079 width=0) (actual time=6061.214..27315.183 rows=3228 loops=1)
         Recheck Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
         Rows Removed by Index Recheck: 86733
         Filter: ((lo < '-79.32516245458987'::numeric) AND (lo > '-79.60531382177737'::numeric) AND ((status)::text <> 'Ter'::text) AND ((status)::text <> 'Exp'::text) AND ((property_type)::text <> 'Parking Space'::text) AND ((property_type)::text <> 'Locker'::text) AND ((transaction_type)::text = 'Sale'::text) AND (((status)::text <> 'Sld'::text) OR (sold_date >= '2018-01-21'::date) OR (sold_date IS NULL)))
         Rows Removed by Filter: 190108
         Heap Blocks: exact=46091 lossy=26592
         Buffers: shared read=73424
         ->  Bitmap Index Scan on map_listing_la_88ca396c  (cost=0.00..4865.19 rows=192477 width=0) (actual time=156.964..156.964 rows=194434 loops=1)
               Index Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
               Buffers: shared read=741
 Planning time: 0.546 ms
 Execution time: 27318.926 ms
(14 rows)

Result of EXPLAIN (ANALYZE, BUFFERS) (development):

 Aggregate  (cost=95326.23..95326.24 rows=1 width=8) (actual time=495.373..495.373 rows=1 loops=1)
   Buffers: shared read=77281
   ->  Bitmap Heap Scan on map_listing  (cost=5211.98..95225.57 rows=40265 width=0) (actual time=80.929..495.140 rows=4565 loops=1)
         Recheck Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
         Rows Removed by Index Recheck: 85958
         Filter: ((lo < '-79.32516245458987'::numeric) AND (lo > '-79.60531382177737'::numeric) AND ((status)::text <> 'Ter'::text) AND ((status)::text <> 'Exp'::text) AND ((property_type)::text <> 'Parking Space'::text) AND ((property_type)::text <> 'Locker'::text) AND ((transaction_type)::text = 'Sale'::text) AND (((status)::text <> 'Sld'::text) OR (sold_date >= '2018-01-21'::date) OR (sold_date IS NULL)))
         Rows Removed by Filter: 198033
         Heap Blocks: exact=49858 lossy=26639
         Buffers: shared read=77281
         ->  Bitmap Index Scan on map_listing_la_88ca396c  (cost=0.00..5201.91 rows=205749 width=0) (actual time=73.070..73.070 rows=205569 loops=1)
               Index Cond: ((la > 43.640279060122346) AND (la < 43.774544561921296))
               Buffers: shared read=784
 Planning time: 0.962 ms
 Execution time: 495.822 ms
(14 rows)

Answer 1

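The plans report every block as "shared read", so none of them were found in shared buffers; PostgreSQL had to request all of them from the operating system, which serves them either from its page cache or from disk. Since the query reads 73424 blocks (about 574 MB), it produces a substantial I/O load whenever the table is not cached.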
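The 574 MB figure follows from the block count in the plan. A minimal sketch of the arithmetic, assuming the server uses the default 8 kB block size (the query reads the actual value rather than hard-coding it):

```sql
-- Convert the 73424 blocks reported by EXPLAIN into a human-readable size.
-- block_size is 8192 bytes unless the server was compiled with another value.
SELECT current_setting('block_size')::bigint AS block_size_bytes,
       pg_size_pretty(73424 * current_setting('block_size')::bigint) AS data_read;
```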

There are two things that could be improved, though.

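First, there are lossy block matches in the heap scan. That means work_mem is too small to hold a bitmap with one bit per table row, so 26592 bits each map an entire table block instead. All rows in those blocks have to be rechecked, and 86733 rows are discarded, most of them false positives from the lossy block matches.

If you increase work_mem, a bitmap with one bit per table row will fit in memory, and this number will shrink, reducing the work during the heap scan.

Second, 190108 rows are discarded because they do not match the additional filter condition in the bitmap heap scan. This is probably where most of the time is spent. If you can reduce that number, you win.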
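One way to test the work_mem point is to raise the setting for a single session and re-run the query from the question. This is only a sketch: the 64MB value is an illustrative guess, not a recommendation from the thread, and the right value depends on the table and on available RAM.

```sql
-- Raise work_mem for this session only (64MB is an assumed, illustrative value).
SET work_mem = '64MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*) AS "__count" FROM "map_listing"
WHERE ("map_listing"."lo" <  -79.32516245458987 AND "map_listing"."la" > 43.640279060122346
AND "map_listing"."lo" >  -79.60531382177737 AND "map_listing"."transaction_type" = 'Sale'
AND "map_listing"."la" < 43.774544561921296
AND NOT ("map_listing"."status" = 'Sld' AND "map_listing"."sold_date" < '2018-01-21'::date
AND "map_listing"."sold_date" IS NOT NULL)
AND NOT (("map_listing"."status" = 'Ter' OR "map_listing"."status" = 'Exp'))
AND NOT (("map_listing"."property_type" = 'Parking Space' OR "map_listing"."property_type" = 'Locker')));

-- Return to the configured default afterwards.
RESET work_mem;
```

If the bitmap now fits in memory, the "Heap Blocks" line should report only exact blocks, and "Rows Removed by Index Recheck" should shrink or disappear.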

The ideal indexes for this query would be:
```
CREATE INDEX ON map_listing(transaction_type, la);
CREATE INDEX ON map_listing(transaction_type, lo);
```
If transaction_type is not very selective (that is, most rows have the value Sale), you can omit that column.
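Whether transaction_type is selective enough to earn its place in the index can be checked directly. A minimal sketch, using only the table and column names that appear in the question:

```sql
-- Share of rows per transaction_type. If 'Sale' covers most of the table,
-- the column adds little as a leading index column.
SELECT transaction_type,
       count(*) AS row_count,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
FROM map_listing
GROUP BY transaction_type
ORDER BY row_count DESC;
```

Note that this query scans the whole table, so on the overloaded production host it may be preferable to run it against the development copy.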

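Edit

Examination of the vmstat and iostat output shows that both the CPU and the I/O subsystem are massively overloaded: all CPU resources are spent on I/O wait and VM steal time. You need a better I/O system and a host system with more free CPU resources. Increasing RAM might alleviate the I/O problem, but only for disk reads.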
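To judge how much more RAM could help with the reads, it may be useful to compare the size of the table and its indexes with the configured buffer cache, and to look at the table's cumulative read statistics. A sketch using standard PostgreSQL size functions and the pg_statio_user_tables view:

```sql
-- Size of the table and its indexes next to the configured shared_buffers.
SELECT pg_size_pretty(pg_table_size('map_listing'))   AS table_size,
       pg_size_pretty(pg_indexes_size('map_listing')) AS index_size,
       current_setting('shared_buffers')              AS shared_buffers;

-- Blocks requested from the OS (read) vs. found in shared buffers (hit),
-- accumulated since the statistics were last reset.
SELECT heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit
FROM pg_statio_user_tables
WHERE relname = 'map_listing';
```

The "read" counters include blocks served from the operating system's page cache, so they are an upper bound on physical disk reads rather than an exact measure.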

Answer 2

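(I don't have enough reputation to comment yet.)

I'm currently having a problem similar to Jack's. My queries slowed down after I created indexes, and my tweaks to work_mem and shared_buffers have not brought any improvement.

When you say RAM was the problem, what did you do to fix it? My server has 32 GB of RAM, and I have even tried setting work_mem = 16GB.

iotop reads: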

DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
86.28 M/s    0.00 B/s  0.00 %   87.78 %  postgres

(Edit: link to my question on gis.stackexchange)
