Question

I have a Postgres database on my MacBook Pro. Here are some basic queries and their execution times.

levi=# select count(1) from publishers;
  count   
----------
 19750023
(1 row)

Time: 5724.240 ms

levi=# select count(1) from publishers where publisher_id is null;
 count 
-------
     0
(1 row)

Time: 4056.290 ms

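I installed a second Postgres database on Ubuntu on AWS, with the same tables, the same columns, the same indexes, and the same number of rows in each table. The same queries on the Ubuntu server never return, even after several hours. There are no errors either. The server has 16 GB of memory and 100 GB of disk space. data_directory is set in the configuration file to use this 100 GB of storage.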
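
Since the query neither returns nor raises an error, one way to see whether it is still executing is to look at pg_stat_activity from a second session. This is only a minimal sketch, assuming PostgreSQL 9.2 or later (older releases name these columns procpid and current_query instead):

```sql
-- Run from a second psql session while the count query is still executing.
-- Lists every non-idle backend with how long its current query has been running.
SELECT pid,
       state,
       now() - query_start AS running_for,
       query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY query_start;
```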

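Both databases are designated as development; no applications or users are using them other than me, running queries as the database owner. Both databases (Mac and Ubuntu) were loaded using the COPY command and CSV files. Some other auxiliary tables were loaded with INSERT AS SELECT statements, which completed on Ubuntu in the expected time (similar to the elapsed time I saw on the Mac).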

On Ubuntu, I changed only these parameters, to match the settings of the equivalent Postgres DB on the Mac:

effective_cache_size: from 128 MB to  4 GB
maintenance_work_mem: from  16 MB to 64 MB
work_mem:             from   1 MB to  4 MB
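
To confirm that these three values are actually in effect on each server (an edit to the configuration file only applies after a reload or restart), a query along these lines could be run on both machines. It is a sketch that relies only on the standard pg_settings view and the parameter names listed above:

```sql
-- Show the effective value, unit, and origin of the three changed parameters.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('effective_cache_size', 'maintenance_work_mem', 'work_mem')
ORDER BY name;
```

The source column shows whether a value comes from the configuration file or is still the built-in default.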

What is wrong here?

Edit 1: EXPLAIN

explain select count(1) from publishers;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=1292192.43..1292192.44 rows=1 width=0)
   ->  Seq Scan on publishers  (cost=0.00..1146466.94 rows=58290194 width=0)
(2 rows)

explain select count(1) from publishers where publisher_id is null;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Aggregate  (cost=1292192.43..1292192.44 rows=1 width=0)
   ->  Seq Scan on publishers  (cost=0.00..1146466.94 rows=58290194 width=0)
         Filter: (publisher_id IS NULL)
(3 rows)

explain select count(1) from wokas where author_id is null;
                               QUERY PLAN                               
------------------------------------------------------------------------
 Aggregate  (cost=1348708.43..1348708.44 rows=1 width=0)
   ->  Seq Scan on wokas  (cost=0.00..1248634.54 rows=40029554 width=0)
         Filter: (author_id IS NULL)
(3 rows)

postgres=# explain select count(1) from authors;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Aggregate  (cost=965641.11..965641.12 rows=1 width=0)
   ->  Seq Scan on authors  (cost=0.00..861030.89 rows=41844089 width=0)
(2 rows)

postgres=# explain select count(1) from authors where author_id is null;
                               QUERY PLAN                                
-------------------------------------------------------------------------
 Aggregate  (cost=965715.30..965715.31 rows=1 width=0)
   ->  Seq Scan on authors  (cost=0.00..861097.04 rows=41847304 width=0)
         Filter: (author_id IS NULL)
(3 rows)
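
The plans above contain planner estimates only. The plan for publishers estimates 58,290,194 rows, while the count on the Mac returned 19,750,023, so it may be worth comparing the stored estimates and on-disk sizes on both servers. A minimal sketch, assuming the three table names from the plans above and the standard pg_class catalog:

```sql
-- Planner row estimate and total on-disk size (table plus indexes and TOAST)
-- for the tables that appear in the EXPLAIN output above.
SELECT relname,
       reltuples::bigint AS estimated_rows,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relname IN ('publishers', 'wokas', 'authors')
  AND relkind = 'r';
```

If the sizes differ widely between the Mac and the Ubuntu server for the same row count, that would be one thing to investigate; the query itself does not prove a cause.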

Answer 1

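Evidently, installing Postgres on Ubuntu on AWS and working with large amounts of data is a very different situation, and more difficult than anticipated.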

Postgres database on Ubuntu (AWS) - queries do not return for hours
