Postgres out of memory

Time: 2018-06-18 20:22:34

Tags: postgresql

I'm setting up a new install of Postgres 10.4, and during load testing I keep getting out-of-memory errors.

Configuration

$ cat /etc/security/limits.conf
postgres   hard    memlock 508559360
postgres   soft    memlock 508559360


$ cat /etc/sysctl.conf
vm.nr_hugepages = 248320
vm.hugetlb_shm_group = 118
vm.overcommit_memory=2
vm.swappiness=1
vm.vfs_cache_pressure=50
kernel.sem = 250 32000 32 128

System specs

$ cat /proc/cpuinfo | grep "core id" | wc -l
32

Each CPU is an Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz. There are 16 physical cores and 32 logical cores, as shown above.

$ free -m
             total        used        free      shared  buff/cache   available
Mem:         510969       498048      2434      3098    10485        8343
Swap:        1071         1071        0

Note: we set memlock to about 485 GB, dedicated directly to Postgres, which is why the "used" column above is so high.
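To see where the 485 GB figure comes from, the short Python sketch below redoes the arithmetic using only numbers shown in this post: the limits.conf memlock value, vm.nr_hugepages, and Hugepagesize and MemTotal from the /proc/meminfo output below. It assumes memlock is given in kB, as limits.conf specifies; the helper name `hugepage_pool_kb` is purely illustrative.

```python
# Values copied from limits.conf, sysctl.conf and /proc/meminfo in this post.
MEMLOCK_KB = 508_559_360      # limits.conf memlock (kB)
NR_HUGEPAGES = 248_320        # vm.nr_hugepages
HUGEPAGE_KB = 2_048           # Hugepagesize from /proc/meminfo
MEM_TOTAL_KB = 523_232_496    # MemTotal from /proc/meminfo

KB_PER_GIB = 1024 * 1024


def hugepage_pool_kb(nr_pages: int, page_kb: int) -> int:
    """Hypothetical helper: size of the persistent huge page pool in kB."""
    return nr_pages * page_kb


pool_kb = hugepage_pool_kb(NR_HUGEPAGES, HUGEPAGE_KB)
# RAM outside the pool is all that remains for normal (4 kB page) allocations.
left_kb = MEM_TOTAL_KB - pool_kb

print(f"huge page pool: {pool_kb} kB = {pool_kb / KB_PER_GIB:.1f} GiB")
print(f"memlock limit matches pool: {pool_kb == MEMLOCK_KB}")
print(f"left for everything else: {left_kb} kB = {left_kb / KB_PER_GIB:.1f} GiB")
```

Run as-is, it reports a pool of exactly 485.0 GiB (the same 508559360 kB as the memlock limit) and about 14.0 GiB of the roughly 499 GiB of RAM left over, because pages in the persistent huge page pool are not used for ordinary allocations.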

$ df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           250G   64K  250G   1% /dev/shm

$ cat /proc/meminfo 
MemTotal:       523232496 kB
MemFree:         1216648 kB
MemAvailable:    8348808 kB
Buffers:           65220 kB
Cached:         10211780 kB
SwapCached:         7620 kB
Active:          5556092 kB
Inactive:        5432428 kB
Active(anon):    2208064 kB
Inactive(anon):  1593084 kB
Active(file):    3348028 kB
Inactive(file):  3839344 kB
Unevictable:       24128 kB
Mlocked:           24128 kB
SwapTotal:       1097724 kB
SwapFree:              0 kB
Dirty:                52 kB
Writeback:             0 kB
AnonPages:        728044 kB
Mapped:            62904 kB
Shmem:           3085592 kB
Slab:            1454276 kB
SReclaimable:    1345008 kB
SUnreclaim:       109268 kB
KernelStack:       11984 kB
PageTables:        22856 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    15770860 kB
Committed_AS:    7920136 kB
VmallocTotal:   34359738367 kB
VmallocUsed:           0 kB
VmallocChunk:          0 kB
HardwareCorrupted:     0 kB
AnonHugePages:     71680 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:   248320
HugePages_Free:    245883
HugePages_Rsvd:     1800
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      329600 kB
DirectMap2M:    534444032 kB

Postgres info

SELECT version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit

max_connections=600   
shared_buffers= '8192 MB'
work_mem = '20 MB'
maintenance_work_mem = 2GB
max_parallel_workers = 8
wal_buffers = 16MB
max_wal_size = 20GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = '364 GB'
default_statistics_target = 1000
log_timezone = 'US/Eastern'
track_activities = on
track_counts = on
track_io_timing = on
stats_temp_directory = 'pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'US/Eastern'
default_text_search_config = 'pg_catalog.english'
transform_null_equals = on
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 16384
track_functions = all
track_io_timing = true
pg_stat_statements.track = all
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '3s'
auto_explain.log_nested_statements='on'
auto_explain.log_analyze=true

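Note: I started with work_mem as high as 250 MB and gradually lowered it to 20 MB, but I still get the errors. I have also verified that there are never more than 120 connections. We run PgBouncer in session mode in front of the instance.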
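One way to see how much of the huge page pool Postgres actually uses is to combine the HugePages_* counters from the /proc/meminfo output above with shared_buffers. Per the kernel's hugetlbpage documentation, pages in use are HugePages_Total - HugePages_Free, and HugePages_Rsvd counts pages that have been promised but not yet faulted in. The sketch assumes Postgres is the only consumer of the pool; `pages_committed` is a hypothetical helper name.

```python
# Figures from /proc/meminfo and postgresql.conf in this post.
HUGEPAGES_TOTAL = 248_320
HUGEPAGES_FREE = 245_883
HUGEPAGES_RSVD = 1_800
HUGEPAGE_KB = 2_048
SHARED_BUFFERS_MB = 8_192


def pages_committed(total: int, free: int, rsvd: int) -> int:
    """Hypothetical helper: pages already faulted in plus pages reserved but not yet faulted."""
    return (total - free) + rsvd


committed = pages_committed(HUGEPAGES_TOTAL, HUGEPAGES_FREE, HUGEPAGES_RSVD)
# MB -> kB -> number of 2 MB huge pages.
needed_for_buffers = SHARED_BUFFERS_MB * 1024 // HUGEPAGE_KB
# Assumes nothing besides Postgres maps huge pages.
idle = HUGEPAGES_TOTAL - committed
idle_gib = idle * HUGEPAGE_KB / (1024 * 1024)

print(f"pages committed by Postgres: {committed}")
print(f"pages shared_buffers alone needs: {needed_for_buffers}")
print(f"idle pages: {idle} (~{idle_gib:.1f} GiB)")
```

With these values it reports 4237 committed pages, slightly more than the 4096 pages an 8192 MB shared_buffers needs (the remainder presumably covers other shared memory such as wal_buffers), and 244083 idle pages, about 476.7 GiB that is unavailable for ordinary allocations.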

The errors are too large to fit in a Stack Overflow post, but they are linked here:

https://codepad.co/snippet/derPU4E8

The errors that stand out are:

2018-06-18 15:02:22 EDT,28197,mydb,ERROR:  could not resize shared memory segment "/PostgreSQL.1552129380" to 192088 bytes: No space left on device

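I don't understand which device it is talking about. Nothing I see looks even close to running out of memory, let alone for the tiny 192088 bytes it mentions.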

2018-06-18 15:02:22 EDT,19708,mydb,ERROR:  out of memory
2018-06-18 15:02:22 EDT,19708,mydb,DETAIL:  Failed on request of size 7232.


2018-06-18 15:02:22 EDT,16688,,LOG:  could not fork worker process: Cannot allocate memory
2018-06-18 15:02:22 EDT,4555,,ERROR:  out of memory
2018-06-18 15:02:22 EDT,4555,,DETAIL:  Failed on request of size 78336.
2018-06-18 15:02:22 EDT,4552,,LOG:  could not open directory "/usr/lib/postgresql/10/share/timezone": Cannot allocate memory
2018-06-18 15:02:22 EDT,19935,mydb,ERROR:  could not load library "/usr/lib/postgresql/10/lib/auto_explain.so": /usr/lib/postgresql/10/lib/auto_explain.so: failed to map segment from shared object


2018-06-18 15:02:22 EDT,28193,mydb,ERROR:  out of memory
2018-06-18 15:02:22 EDT,28193,mydb,DETAIL:  Failed on request of size 8192.
2018-06-18 15:02:22 EDT,26927,mydb,ERROR:  could not resize shared memory segment "/PostgreSQL.1101931262" to 192088 bytes: No space left on device

Question: How can I debug this problem, and more importantly, how can I fix it?
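As a starting point for debugging, and assuming vm.overcommit_memory=2 stays in place, it can help to watch the gap between CommitLimit and Committed_AS in /proc/meminfo during the load test. In strict mode the kernel refuses new commitments (for example from malloc, fork or private writable mappings) that would push Committed_AS past CommitLimit, which would be consistent with the "Cannot allocate memory" errors above. The sketch below is hypothetical (`parse_meminfo` and `commit_headroom_kb` are illustrative names) and only reads the live /proc/meminfo when run on Linux.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Hypothetical helper: parse /proc/meminfo-style text into {field: number}.

    The trailing "kB" unit is dropped; HugePages_* counts are kept as page counts.
    """
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields: dict[str, int]) -> int:
    """Hypothetical helper: kB left before strict overcommit (mode 2) refuses allocations."""
    return fields["CommitLimit"] - fields["Committed_AS"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # Linux only
        info = parse_meminfo(meminfo.read_text())
        print(f"CommitLimit:  {info['CommitLimit']} kB")
        print(f"Committed_AS: {info['Committed_AS']} kB")
        print(f"headroom:     {commit_headroom_kb(info)} kB")
```

Applied to the snapshot in the question, it gives 15770860 - 7920136 = 7850724 kB (about 7.5 GiB) of headroom. Since that snapshot was not necessarily taken when the errors occurred, sampling it repeatedly under load is likely to be more informative.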

2 answers:

Answer 0 (score: 0)

Check whether your queries are so complex that they consume a lot of temporary tablespace.

Answer 1 (score: 0)

I followed Laurenz's advice and no longer run out of memory:

vm.overcommit_ratio = 100

I also removed the memlock and huge page definitions from sysctl.conf and limits.conf.
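The effect of these changes can be estimated with the kernel's documented formula for the strict-mode limit, CommitLimit = (total RAM - huge page pool) * overcommit_ratio / 100 + swap. The Python sketch below plugs in MemTotal, SwapTotal and Hugepagesize from the question. It assumes vm.overcommit_memory=2 was kept (the answer does not say so explicitly), and `commit_limit_kb` is an illustrative helper name.

```python
# Values from the question's /proc/meminfo output.
MEM_TOTAL_KB = 523_232_496
SWAP_TOTAL_KB = 1_097_724
HUGEPAGE_KB = 2_048


def commit_limit_kb(mem_total_kb: int, nr_hugepages: int, ratio: int, swap_kb: int) -> int:
    """Hypothetical helper: CommitLimit under vm.overcommit_memory = 2.

    The huge page pool is excluded before applying vm.overcommit_ratio.
    """
    usable_kb = mem_total_kb - nr_hugepages * HUGEPAGE_KB
    return usable_kb * ratio // 100 + swap_kb


scenarios = {
    "hugepages, ratio 50 (default)": (248_320, 50),
    "hugepages, ratio 100": (248_320, 100),
    "no hugepages, ratio 100": (0, 100),
}
for name, (pages, ratio) in scenarios.items():
    limit = commit_limit_kb(MEM_TOTAL_KB, pages, ratio, SWAP_TOTAL_KB)
    print(f"{name}: {limit} kB (~{limit / 1024**2:.0f} GiB)")
```

The "hugepages, ratio 100" case reproduces exactly the CommitLimit of 15770860 kB in the question's /proc/meminfo output. The kernel default ratio of 50 would give 8434292 kB (about 8 GiB), and removing the huge page pool raises the limit to 524330220 kB, about 500 GiB.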