I have a Postgres database on my MacBook Pro. Below are some basic queries and their execution times.
levi=# select count(1) from publishers;
count
----------
19750023
(1 row)
Time: 5724.240 ms
levi=# select count(1) from publishers where publisher_id is null;
count
-------
0
(1 row)
Time: 4056.290 ms
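I installed a second Postgres database on Ubuntu on AWS: same tables, same columns, same indexes, and the same number of rows in each table. The same queries on the Ubuntu server never return, even after several hours, and no error is reported. The server has 16 GB of memory and 100 GB of disk space. data_directory in the configuration file is set to use this 100 GB volume.
Both databases are for development only. No applications or other users touch them; I am the only one running queries, as the database owner. Both databases (Mac and Ubuntu) were loaded with the COPY command from CSV files. Some auxiliary tables were loaded with INSERT ... SELECT statements, and on Ubuntu those completed in the expected time (similar to what I saw on the Mac).
On Ubuntu, I changed only these parameters, to match the settings of the corresponding Postgres database on the Mac: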
effective_cache_size: from 128 MB to 4 GB
maintenance_work_mem: from 16 MB to 64 MB
work_mem: from 1 MB to 4 MB
What is the problem here?
explain select count(1) from publishers;
QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (cost=1292192.43..1292192.44 rows=1 width=0)
-> Seq Scan on publishers (cost=0.00..1146466.94 rows=58290194 width=0)
(2 rows)
explain select count(1) from publishers where publisher_id is null;
QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (cost=1292192.43..1292192.44 rows=1 width=0)
-> Seq Scan on publishers (cost=0.00..1146466.94 rows=58290194 width=0)
Filter: (publisher_id IS NULL)
(3 rows)
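One detail in the publishers plan is worth a quick check: the planner's row estimate is far above the count returned on the Mac. The sketch below only compares the two numbers quoted in this post. It assumes the plan was produced on the Ubuntu server, which the post does not state, and the variable names are illustrative.

```python
# Both numbers are copied from the post; nothing here queries a database.
mac_actual_rows = 19_750_023         # result of count(1) on the Mac
planner_estimated_rows = 58_290_194  # rows= on the Seq Scan node in the plan

# How many times larger the planner's estimate is than the known row count.
ratio = planner_estimated_rows / mac_actual_rows
print(f"planner estimate / actual count = {ratio:.2f}")
```

A ratio close to 3 does not prove anything by itself. It could come from stale planner statistics, or it could mean the table on that server holds more rows or dead tuples than expected, for example if a COPY was run more than once. Running ANALYZE on the table and looking at the estimate again would help tell these apart.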
explain select count(1) from wokas where author_id is null;
QUERY PLAN
------------------------------------------------------------------------
Aggregate (cost=1348708.43..1348708.44 rows=1 width=0)
-> Seq Scan on wokas (cost=0.00..1248634.54 rows=40029554 width=0)
Filter: (author_id IS NULL)
(3 rows)
postgres=# explain select count(1) from authors;
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=965641.11..965641.12 rows=1 width=0)
-> Seq Scan on authors (cost=0.00..861030.89 rows=41844089 width=0)
(2 rows)
postgres=# explain select count(1) from authors where author_id is null;
QUERY PLAN
-------------------------------------------------------------------------
Aggregate (cost=965715.30..965715.31 rows=1 width=0)
-> Seq Scan on authors (cost=0.00..861097.04 rows=41847304 width=0)
Filter: (author_id IS NULL)
(3 rows)
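The costs in these plans also give a rough idea of how large each table is on disk. PostgreSQL documents the cost of a sequential scan as pages read times seq_page_cost plus rows scanned times cpu_tuple_cost, so the page count can be recovered from the plan. This is a minimal sketch that assumes the server still uses the default values (seq_page_cost = 1.0, cpu_tuple_cost = 0.01) and the default 8 kB block size; none of these are confirmed in the post, and the helper name estimated_pages is made up for illustration.

```python
# Assumed planner defaults; the post does not say whether they were changed.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
BLOCK_SIZE = 8192  # bytes per page, the PostgreSQL default

# (total cost, estimated rows) of each unfiltered or filtered Seq Scan node,
# copied from the plans in the post.
seq_scans = {
    "publishers": (1146466.94, 58290194),
    "wokas": (1248634.54, 40029554),
    "authors": (861030.89, 41844089),
}


def estimated_pages(total_cost, rows):
    """Invert cost = pages * seq_page_cost + rows * cpu_tuple_cost."""
    return round((total_cost - rows * CPU_TUPLE_COST) / SEQ_PAGE_COST)


for table, (cost, rows) in seq_scans.items():
    pages = estimated_pages(cost, rows)
    gib = pages * BLOCK_SIZE / 2**30
    print(f"{table}: {pages} pages, about {gib:.1f} GiB")
```

Under these assumptions the three tables come out at roughly 4.3, 6.5, and 3.4 GiB. Tables of that size would not normally take hours to scan, so it may be more useful to look at the I/O throughput of the data volume, or at pg_stat_activity to see whether the query is waiting on a lock, than at the query itself.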
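Answer 0: (score: 0)
Apparently, installing Postgres and handling large volumes of data on Ubuntu on AWS is quite different from, and harder than, what I had anticipated.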