I am setting up a fresh install of postgres 10.4, and during load testing I keep getting out-of-memory errors.
Config:
$ cat /etc/security/limits.conf
postgres hard memlock 508559360
postgres soft memlock 508559360
$ cat /etc/sysctl.conf
vm.nr_hugepages = 248320
vm.hugetlb_shm_group = 118
vm.overcommit_memory=2
vm.swappiness=1
vm.vfs_cache_pressure=50
kernel.sem = 250 32000 32 128
System specs:
$ cat /proc/cpuinfo | grep "core id" | wc -l
32
Each CPU is an Intel(R) Xeon(R) CPU E5-2695 v4 @ 2.10GHz: 16 physical cores, 32 logical, as shown above.
$ free -m
              total        used        free      shared  buff/cache   available
Mem:         510969      498048        2434        3098       10485        8343
Swap:          1071        1071           0
Note: We set memlock to roughly 485 GB, dedicated directly to postgres, which shows up in the high "used" column.
$ df -h /dev/shm
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           250G   64K  250G   1% /dev/shm
$ cat /proc/meminfo
MemTotal: 523232496 kB
MemFree: 1216648 kB
MemAvailable: 8348808 kB
Buffers: 65220 kB
Cached: 10211780 kB
SwapCached: 7620 kB
Active: 5556092 kB
Inactive: 5432428 kB
Active(anon): 2208064 kB
Inactive(anon): 1593084 kB
Active(file): 3348028 kB
Inactive(file): 3839344 kB
Unevictable: 24128 kB
Mlocked: 24128 kB
SwapTotal: 1097724 kB
SwapFree: 0 kB
Dirty: 52 kB
Writeback: 0 kB
AnonPages: 728044 kB
Mapped: 62904 kB
Shmem: 3085592 kB
Slab: 1454276 kB
SReclaimable: 1345008 kB
SUnreclaim: 109268 kB
KernelStack: 11984 kB
PageTables: 22856 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 15770860 kB
Committed_AS: 7920136 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
HardwareCorrupted: 0 kB
AnonHugePages: 71680 kB
CmaTotal: 0 kB
CmaFree: 0 kB
HugePages_Total: 248320
HugePages_Free: 245883
HugePages_Rsvd: 1800
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 329600 kB
DirectMap2M: 534444032 kB
Postgres info:
SELECT version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609, 64-bit
max_connections=600
shared_buffers= '8192 MB'
work_mem = '20 MB'
maintenance_work_mem = 2GB
max_parallel_workers = 8
wal_buffers = 16MB
max_wal_size = 20GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
effective_cache_size = '364 GB'
default_statistics_target = 1000
log_timezone = 'US/Eastern'
track_activities = on
track_counts = on
track_io_timing = on
stats_temp_directory = 'pg_stat_tmp'
datestyle = 'iso, mdy'
timezone = 'US/Eastern'
default_text_search_config = 'pg_catalog.english'
transform_null_equals = on
shared_preload_libraries = 'pg_stat_statements'
track_activity_query_size = 16384
track_functions = all
track_io_timing = true
pg_stat_statements.track = all
session_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '3s'
auto_explain.log_nested_statements='on'
auto_explain.log_analyze=true
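Note: I started with work_mem set as high as 250 MB and gradually lowered it to 20 MB, and I still receive the errors. I have also verified that the number of connections never exceeds 120. We use PGBouncer in session mode in front of the instance.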
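For a rough sense of scale, the work_mem figures above can be turned into a back-of-the-envelope budget. This is only a sketch: the helper `work_mem_budget_mb` and its `allocations_per_backend` parameter are hypothetical names introduced here, and the model assumes each backend uses a fixed number of work_mem-sized allocations. In practice a single query can use work_mem several times (once per sort or hash operation), and work_mem is only one consumer of committed memory, so treat the output as an order-of-magnitude check against the CommitLimit reported in /proc/meminfo.

```python
def work_mem_budget_mb(backends, work_mem_mb, allocations_per_backend=1):
    """Rough upper bound, in MB, on memory claimed through work_mem.

    allocations_per_backend is a hypothetical knob: a query may use
    work_mem once per sort or hash operation, so values above 1 are realistic.
    """
    return backends * work_mem_mb * allocations_per_backend


# Figures taken from the question: at most 120 connections were observed,
# max_connections = 600, and work_mem was lowered from 250 MB to 20 MB.
scenarios = {
    "120 backends at 250 MB": work_mem_budget_mb(120, 250),
    "120 backends at 20 MB": work_mem_budget_mb(120, 20),
    "600 backends at 20 MB": work_mem_budget_mb(600, 20),
}

# CommitLimit from /proc/meminfo (15770860 kB), converted to whole MB.
COMMIT_LIMIT_MB = 15770860 // 1024

for label, budget_mb in scenarios.items():
    relation = "above" if budget_mb > COMMIT_LIMIT_MB else "below"
    print(f"{label}: {budget_mb} MB ({relation} CommitLimit of {COMMIT_LIMIT_MB} MB)")
```

Raising `allocations_per_backend` to 2 or more shows how quickly even the 20 MB setting approaches the limit once queries contain several sorts or hashes.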
The errors are too large to post on Stack Overflow, so they are linked here:
https://codepad.co/snippet/derPU4E8
The errors that stand out are:
2018-06-18 15:02:22 EDT,28197,mydb,ERROR: could not resize shared memory segment "/PostgreSQL.1552129380" to 192088 bytes: No space left on device
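I don't understand which device it is talking about. Nothing I can see is anywhere close to out of memory, let alone for the tiny 192088 bytes it mentions.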
2018-06-18 15:02:22 EDT,19708,mydb,ERROR: out of memory
2018-06-18 15:02:22 EDT,19708,mydb,DETAIL: Failed on request of size 7232.
2018-06-18 15:02:22 EDT,16688,,LOG: could not fork worker process: Cannot allocate memory
2018-06-18 15:02:22 EDT,4555,,ERROR: out of memory
2018-06-18 15:02:22 EDT,4555,,DETAIL: Failed on request of size 78336.
2018-06-18 15:02:22 EDT,4552,,LOG: could not open directory "/usr/lib/postgresql/10/share/timezone": Cannot allocate memory
2018-06-18 15:02:22 EDT,19935,mydb,ERROR: could not load library "/usr/lib/postgresql/10/lib/auto_explain.so": /usr/lib/postgresql/10/lib/auto_explain.so: failed to map segment from shared object
2018-06-18 15:02:22 EDT,28193,mydb,ERROR: out of memory
2018-06-18 15:02:22 EDT,28193,mydb,DETAIL: Failed on request of size 8192.
2018-06-18 15:02:22 EDT,26927,mydb,ERROR: could not resize shared memory segment "/PostgreSQL.1101931262" to 192088 bytes: No space left on device
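Question: How do I debug this, and more importantly, how do I resolve it?
Answer 0 (score: 0)
Check whether your queries are so complex that they consume a large amount of temporary tablespace.
Answer 1 (score: 0)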
I followed Laurenz's advice and set the following, and I no longer run out of memory:
vm.overcommit_ratio = 100
In addition, I removed the memlock and huge pages definitions from sysctl / limits.
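The arithmetic behind these two changes can be sketched with the numbers from the question. With vm.overcommit_memory=2, the Linux kernel documents the commit limit as (total RAM minus memory reserved for huge pages) times overcommit_ratio / 100, plus swap. The function name `commit_limit_kb` below is made up for illustration, and the sketch works in kB rather than pages, which is an assumption that holds here because every input is a multiple of the page size.

```python
def commit_limit_kb(mem_total_kb, hugepages_total, hugepage_size_kb,
                    swap_total_kb, overcommit_ratio):
    """CommitLimit in kB under vm.overcommit_memory=2 (illustrative helper).

    Memory reserved for huge pages is subtracted before the ratio is applied.
    """
    hugetlb_kb = hugepages_total * hugepage_size_kb
    return (mem_total_kb - hugetlb_kb) * overcommit_ratio // 100 + swap_total_kb


# Values from /proc/meminfo in the question.
MEM_TOTAL_KB = 523232496
HUGEPAGES_TOTAL = 248320
HUGEPAGE_SIZE_KB = 2048
SWAP_TOTAL_KB = 1097724

# 248320 pages * 2048 kB = 508559360 kB, the same figure as the memlock limit.
reserved_kb = HUGEPAGES_TOTAL * HUGEPAGE_SIZE_KB
print(f"reserved for huge pages: {reserved_kb} kB")

for ratio in (50, 100):
    with_huge = commit_limit_kb(MEM_TOTAL_KB, HUGEPAGES_TOTAL,
                                HUGEPAGE_SIZE_KB, SWAP_TOTAL_KB, ratio)
    without_huge = commit_limit_kb(MEM_TOTAL_KB, 0,
                                   HUGEPAGE_SIZE_KB, SWAP_TOTAL_KB, ratio)
    print(f"ratio {ratio}: {with_huge} kB with huge pages, "
          f"{without_huge} kB without")
```

At a ratio of 100 with the huge pages still reserved, the formula gives 15770860 kB, which matches the CommitLimit line in the posted /proc/meminfo. In other words, roughly 15 GB was available to ordinary allocations on a 512 GB machine, while almost all of the reserved huge pages sat unused (HugePages_Free: 245883 of 248320). Dropping the huge page reservation raises the limit to about 500 GB.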