I have a busy server running Percona 5.6.34-79.1-1.xenial on Ubuntu 16.04. It works well, but every few weeks mysqld gets killed by the out-of-memory (OOM) killer, and I cannot find the cause.
root@master02:~# grep Out /var/log/syslog
Apr 6 13:37:03 master02 kernel: [17420955.874564] Out of memory: Kill process 36138 (mysqld) score 659 or sacrifice child
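So it was killed at 13:37:03.
However, just 2 seconds earlier the system was using about 110 GB of RAM (it has about 160 GB in total), with about 55 GB still available, almost all of it buff/cache: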
root@master02:~# cat /root/logs/free/free-2017-04-06-13\:36\:01.log
total used free shared buff/cache available
Mem: 165050752 109560372 593508 189240 54896872 54434632
Swap: 0 0 0
root@master02:~# cat /root/logs/free/free-2017-04-06-13\:37\:01.log
total used free shared buff/cache available
Mem: 165050752 109582416 602704 189624 54865632 54412072
Swap: 0 0 0
root@master02:~# cat /root/logs/free/free-2017-04-06-13\:38\:01.log
total used free shared buff/cache available
Mem: 165050752 17982728 92226488 189200 54841536 146007904
Swap: 0 0 0
my.cnf sets "innodb-buffer-pool-size = 130G". My guess is that mysqld allocated an extra 50 GB within those 2 seconds and was killed for it (although I may well be wrong).
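As a rough sanity check of that guess, the "used" column of the three free snapshots above can be converted to GiB and compared. This is only a sketch in Python: the values are copied from the logs above, and because the snapshots are taken once a minute, a spike inside the last two seconds would not show up in them.

```python
KIB_PER_GIB = 1024 * 1024

# "used" column (in KiB) from the three free snapshots quoted above.
used_kib = {
    "13:36:01": 109560372,
    "13:37:01": 109582416,
    "13:38:01": 17982728,
}


def to_gib(kib):
    """Convert a KiB count, as printed by free, to GiB."""
    return kib / KIB_PER_GIB


# Growth of "used" during the last full minute before the kill.
growth_kib = used_kib["13:37:01"] - used_kib["13:36:01"]

for stamp, kib in used_kib.items():
    print(f"{stamp}  used = {to_gib(kib):6.1f} GiB")
print(f"growth between 13:36:01 and 13:37:01: {growth_kib / 1024:.1f} MiB")
```

The sampled numbers show "used" growing by only a few tens of MiB in the minute before the kill, so the snapshots themselves neither confirm nor rule out a sudden 50 GB allocation.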
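Here is the dmesg output showing the full OOM report. Is this a memory-allocation problem that some tunable could fix? I would appreciate any hints.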
[17420955.874279] mysqld invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[17420955.874282] mysqld cpuset=/ mems_allowed=0-1
[17420955.874287] CPU: 4 PID: 36138 Comm: mysqld Not tainted 4.4.0-59-generic #80-Ubuntu
[17420955.874288] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[17420955.874290] 0000000000000286 00000000dba8c85b ffff88142f4bbaf0 ffffffff813f7583
[17420955.874292] ffff88142f4bbcc8 ffff8827ac8db800 ffff88142f4bbb60 ffffffff8120ad5e
[17420955.874295] ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[17420955.874297] Call Trace:
[17420955.874304] [<ffffffff813f7583>] dump_stack+0x63/0x90
[17420955.874307] [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[17420955.874311] [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[17420955.874312] [<ffffffff81192b49>] out_of_memory+0x219/0x460
[17420955.874315] [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[17420955.874317] [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[17420955.874319] [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[17420955.874323] [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[17420955.874326] [<ffffffff811c164d>] ? handle_mm_fault+0xcbd/0x1820
[17420955.874328] [<ffffffff810805a0>] _do_fork+0x80/0x360
[17420955.874329] [<ffffffff81080929>] SyS_clone+0x19/0x20
[17420955.874333] [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[17420955.874343] Mem-Info:
[17420955.874354] active_anon:27160197 inactive_anon:28926 isolated_anon:0
active_file:5497699 inactive_file:7563747 isolated_file:0
unevictable:914 dirty:2486 writeback:0 unstable:0
slab_reclaimable:556865 slab_unreclaimable:45056
mapped:20876 shmem:47414 pagetables:71927 bounce:0
free:154548 free_pcp:64 free_cma:0
[17420955.874357] Node 0 DMA free:15904kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[17420955.874361] lowmem_reserve[]: 0 3706 80497 80497 80497
[17420955.874370] Node 0 DMA32 free:311248kB min:2072kB low:2588kB high:3108kB active_anon:3411180kB inactive_anon:1348kB active_file:4kB inactive_file:8kB unevictable:280kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:280kB dirty:0kB writeback:0kB mapped:524kB shmem:2308kB slab_reclaimable:65844kB slab_unreclaimable:14292kB kernel_stack:1760kB pagetables:13784kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[17420955.874373] lowmem_reserve[]: 0 0 76791 76791 76791
[17420955.874376] Node 0 Normal free:143060kB min:42940kB low:53672kB high:64408kB active_anon:62267420kB inactive_anon:31360kB active_file:7207656kB inactive_file:7550284kB unevictable:3296kB isolated(anon):0kB isolated(file):0kB present:79953920kB managed:78634400kB mlocked:3296kB dirty:3116kB writeback:0kB mapped:18940kB shmem:49312kB slab_reclaimable:838024kB slab_unreclaimable:78600kB kernel_stack:51840kB pagetables:159192kB unstable:0kB bounce:0kB free_pcp:136kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? no
[17420955.874380] lowmem_reserve[]: 0 0 0 0 0
[17420955.874383] Node 1 Normal free:147980kB min:45088kB low:56360kB high:67632kB active_anon:42962188kB inactive_anon:82996kB active_file:14783136kB inactive_file:22704696kB unevictable:80kB isolated(anon):0kB isolated(file):0kB present:83886080kB managed:82565296kB mlocked:80kB dirty:6828kB writeback:0kB mapped:64040kB shmem:138036kB slab_reclaimable:1323592kB slab_unreclaimable:87332kB kernel_stack:48624kB pagetables:114732kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[17420955.874387] lowmem_reserve[]: 0 0 0 0 0
[17420955.874390] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[17420955.874398] Node 0 DMA32: 13480*4kB (UME) 6994*8kB (UME) 2710*16kB (UME) 714*32kB (UMEH) 354*64kB (UMEH) 225*128kB (UMEH) 107*256kB (UMEH) 52*512kB (UME) 29*1024kB (UMEH) 0*2048kB 0*4096kB = 311248kB
[17420955.874408] Node 0 Normal: 29953*4kB (UMEH) 2985*8kB (UMEH) 1*16kB (H) 0*32kB 2*64kB (H) 2*128kB (H) 2*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 145628kB
[17420955.874416] Node 1 Normal: 36438*4kB (UME) 412*8kB (UMEH) 2*16kB (H) 2*32kB (H) 0*64kB 1*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 150552kB
[17420955.874425] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[17420955.874427] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17420955.874428] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[17420955.874429] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17420955.874430] 13109543 total pagecache pages
[17420955.874431] 0 pages in swap cache
[17420955.874440] Swap cache stats: add 0, delete 0, find 0/0
[17420955.874441] Free swap = 0kB
[17420955.874442] Total swap = 0kB
[17420955.874443] 41942941 pages RAM
[17420955.874444] 0 pages HighMem/MovableOnly
[17420955.874444] 680253 pages reserved
[17420955.874445] 0 pages cma reserved
[17420955.874446] 0 pages hwpoisoned
[17420955.874447] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[17420955.874457] [ 677] 0 677 38915 23334 80 3 0 0 systemd-journal
[17420955.874461] [ 698] 0 698 25742 47 17 3 0 0 lvmetad
[17420955.874469] [ 729] 0 729 11545 1219 23 3 0 -1000 systemd-udevd
[17420955.874472] [ 1112] 109 1112 25081 492 19 3 0 0 systemd-timesyn
[17420955.874474] [ 1333] 0 1333 4030 637 12 3 0 0 dhclient
[17420955.874475] [ 1541] 0 1541 6932 547 19 3 0 0 cron
[17420955.874477] [ 1552] 0 1552 365572 5403 79 7 0 0 snapd
[17420955.874479] [ 1559] 102 1559 10725 667 26 3 0 -900 dbus-daemon
[17420955.874480] [ 1570] 0 1570 77473 728 21 3 0 0 lxcfs
[17420955.874482] [ 1576] 101 1576 64099 1008 27 3 0 0 rsyslogd
[17420955.874484] [ 1578] 106 1578 1884 403 9 3 0 0 vnstatd
[17420955.874486] [ 1580] 0 1580 1100 322 8 3 0 0 acpid
[17420955.874488] [ 1582] 0 1582 68674 1100 37 3 0 0 accounts-daemon
[17420955.874489] [ 1584] 0 1584 6511 378 18 3 0 0 atd
[17420955.874491] [ 1593] 0 1593 16380 700 36 3 0 -1000 sshd
[17420955.874493] [ 1595] 0 1595 7157 635 18 3 0 0 systemd-logind
[17420955.874495] [ 1599] 0 1599 7470 139 19 3 0 0 cgmanager
[17420955.874497] [ 1618] 0 1618 3344 116 11 3 0 0 mdadm
[17420955.874498] [ 1622] 0 1622 1306 441 8 3 0 0 iscsid
[17420955.874500] [ 1623] 0 1623 1431 916 8 3 0 -17 iscsid
[17420955.874511] [ 1717] 0 1717 3619 388 12 3 0 0 agetty
[17420955.874513] [ 1718] 0 1718 3665 343 12 3 0 0 agetty
[17420955.874515] [ 1731] 0 1731 5025 653 15 3 0 0 irqbalance
[17420955.874517] [ 1744] 0 1744 69272 719 39 3 0 0 polkitd
[17420955.874519] [ 2484] 1001 2484 11312 218 27 3 0 0 systemd
[17420955.874520] [ 2488] 1001 2488 15805 475 34 3 0 0 (sd-pam)
[17420955.874523] [ 6289] 0 6289 7718 1364 19 3 0 0 tmux
[17420955.874531] [ 6290] 0 6290 5381 924 15 3 0 0 bash
[17420955.874533] [ 6306] 0 6306 2158 395 10 3 0 0 mysqld_safe
[17420955.874537] [36138] 107 36138 43942177 27121187 70971 157 0 0 mysqld
[17420955.874539] [78610] 108 78610 5992 577 16 3 0 0 nrpe
[17420955.874542] [19441] 0 19441 24876 1740 54 3 0 0 sshd
[17420955.874543] [19447] 1008 19447 11312 1147 27 3 0 0 systemd
[17420955.874555] [19449] 1008 19449 15817 487 34 3 0 0 (sd-pam)
[17420955.874557] [19575] 1008 19575 24876 834 51 3 0 0 sshd
[17420955.874558] [19576] 0 19576 14970 933 34 3 0 0 sudo
[17420955.874560] [19577] 0 19577 5388 1343 16 3 0 0 bash
[17420955.874561] [26883] 0 26883 8430 5113 22 5 0 0 mysqld_exporter
[17420955.874564] Out of memory: Kill process 36138 (mysqld) score 659 or sacrifice child
[17420955.890336] Killed process 36138 (mysqld) total-vm:175768708kB, anon-rss:108470860kB, file-rss:13888kB
Here is the my.cnf file:
[mysql]
# CLIENT #
port = 3306
socket = /var/run/mysqld/mysqld.sock
[mysqld]
# GENERAL #
user = mysql
default-storage-engine = InnoDB
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
# MyISAM #
key-buffer-size = 32M
myisam-recover = FORCE,BACKUP
# SAFETY #
max-allowed-packet = 16M
max-connect-errors = 1000000
# DATA STORAGE #
datadir = /var/lib/mysql/
# BINARY LOGGING #
log-bin = /var/lib/mysql-binlogs/mysql-bin
expire-logs-days = 7
sync-binlog = 1
binlog-format = MIXED
# REPLICATION #
server-id = 1
auto_increment_offset = 1
# total number of master servers
auto_increment_increment = 2
log-slave-updates = 1
relay-log = /var/lib/mysql-binlogs/relay-bin
slave-net-timeout = 60
# CACHES AND LIMITS #
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
# INNODB #
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table = 1
innodb-buffer-pool-size = 125G
innodb_large_prefix = 1
innodb_file_format = Barracuda
# LOGGING #
log-error = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 0
slow-query-log-file = /var/log/mysql/mysql-slow.log
# OTHER SETUP #
character_set_server = utf8mb4
collation-server = utf8mb4_unicode_ci
init-connect = 'SET NAMES utf8mb4'
skip-name-resolve = 1
max_connections = 12288
wait_timeout = 120
connect_timeout = 30
interactive_timeout = 120
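Answer 0 (score: 1)
I assume you are using it together with PHP and Apache. You will most likely find the cause of the problem in the log files of one of those applications.
Note the time of the failure in the MySQL log, then look through the other log files for the same period; you should find clues that eventually lead you to the answer.
I should also point out that when the system runs out of memory, the OOM killer chooses which process to kill, and the process it kills is not necessarily the one that caused the problem.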
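To see which processes were actually holding memory when the OOM killer fired, the per-process table in the kernel's OOM report can be ranked by rss. The following is a minimal Python sketch, not a definitive tool: the regular expression is written against the table layout shown in the question, the helper names are made up for illustration, and the 4 KiB page size is an assumption (the usual x86-64 default).

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the usual x86-64 default
KIB_PER_GIB = 1024 * 1024

# Matches one row of the OOM report's process table:
# [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+\d+\s+\d+\s+(?P<total_vm>\d+)\s+(?P<rss>\d+)"
    r"\s+\d+\s+\d+\s+\d+\s+-?\d+\s+(?P<name>\S+)\s*$"
)


def top_by_rss(lines, n=3):
    """Return the n largest processes as (name, pid, rss in GiB) tuples."""
    rows = []
    for line in lines:
        match = ROW.search(line)
        if match:
            rss_gib = int(match["rss"]) * PAGE_KIB / KIB_PER_GIB
            rows.append((match["name"], int(match["pid"]), rss_gib))
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# A few rows copied from the dmesg output in the question.
sample = [
    "[17420955.874457] [ 677] 0 677 38915 23334 80 3 0 0 systemd-journal",
    "[17420955.874477] [ 1552] 0 1552 365572 5403 79 7 0 0 snapd",
    "[17420955.874537] [36138] 107 36138 43942177 27121187 70971 157 0 0 mysqld",
    "[17420955.874561] [26883] 0 26883 8430 5113 22 5 0 0 mysqld_exporter",
]

for name, pid, rss_gib in top_by_rss(sample):
    print(f"{name:<16} pid={pid:<6} rss={rss_gib:8.2f} GiB")
```

Run against the full dmesg output, the same function lists the largest resident processes at the moment of the kill, which makes it easier to tell whether some other process had grown or whether mysqld itself held most of the memory.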
Answer 1 (score: 1)
max_connections = 12288
This is an eyebrow-raising number. Why so high? Do you use that many connections? You can monitor SHOW GLOBAL STATUS LIKE 'max_used_connections' to find out. Unfortunately this number is reset when mysqld restarts, so you'll have to monitor it externally.
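One way to record the value externally is a small script run from cron that appends a timestamped sample to a log file. This is only a sketch: it assumes the mysql command-line client is installed and can log in without prompting (for example through a ~/.my.cnf file), and the function names and the log path in the usage note are hypothetical.

```python
import subprocess
import time

QUERY = "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"


def parse_status_line(line):
    """Parse one tab-separated 'name<TAB>value' row of mysql batch output."""
    name, value = line.strip().split("\t")
    return name, int(value)


def read_max_used_connections():
    """Ask the server for the counter through the mysql client."""
    # -N suppresses the column header, -B selects tab-separated batch output.
    # Assumption: credentials come from the client's own configuration.
    result = subprocess.run(
        ["mysql", "-N", "-B", "-e", QUERY],
        check=True, capture_output=True, text=True,
    )
    return parse_status_line(result.stdout)[1]


def append_sample(path, value, now=None):
    """Append 'YYYY-MM-DD HH:MM:SS value' to the log file at path."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as handle:
        handle.write(f"{stamp} {value}\n")
```

A cron job could then call append_sample("/root/logs/max_used_connections.log", read_max_used_connections()) once a minute, so the last value before a restart survives in the log.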
There are some buffers that are allocated per connection. So if you ever do get a spike in the number of connections, it could consume a lot of memory suddenly. This is uncommon, but possible.
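To get a feel for the scale, here is a crude upper bound in Python. max_connections and tmp-table-size come from the my.cnf in the question; the other per-connection sizes are not set there, so the sketch assumes stock MySQL 5.6 defaults, which should be checked against SHOW GLOBAL VARIABLES on the real server. These buffers are allocated on demand, so this is a worst case for illustration, not a prediction.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

max_connections = 12288    # from the my.cnf in the question
tmp_table_size = 32 * MIB  # from the my.cnf in the question

# Assumption: not set in the my.cnf shown, so MySQL 5.6 defaults are used.
per_connection = {
    "thread_stack": 256 * KIB,
    "sort_buffer_size": 256 * KIB,
    "join_buffer_size": 256 * KIB,
    "read_buffer_size": 128 * KIB,
    "read_rnd_buffer_size": 256 * KIB,
    "binlog_cache_size": 32 * KIB,
}


def worst_case_bytes(connections, buffers, tmp_table=0):
    """Upper bound: every connection holds every buffer at the same time."""
    return connections * (sum(buffers.values()) + tmp_table)


buffers_only = worst_case_bytes(max_connections, per_connection)
with_tmp = worst_case_bytes(max_connections, per_connection, tmp_table_size)

print(f"per-connection buffers only: {buffers_only / GIB:.1f} GiB")
print(f"plus one in-memory temp table each: {with_tmp / GIB:.1f} GiB")
```

Even the buffers-only figure is in the range of several GiB under these assumptions, which is why a connection spike on a server whose buffer pool already takes most of the RAM deserves a look.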
I suspect that @Vilmos is right, that you have some other process on the server that suddenly consumed the remaining memory, and then the OOM killer decided that innocent little mysqld was the process to kill, even though the memory exhaustion wasn't its fault.
There's a lesson about life in there. :-)
You might want to enable a swap partition, and then set up alerting if the swap is used too heavily. Then at least mysqld won't be killed, but it may slow down a lot. And you may be able to find out which process is running that has consumed too much memory.
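The swap alert could be as simple as a periodic check of /proc/meminfo. The sketch below is one possible shape in Python; the function names and the 25% threshold are arbitrary placeholders, and how the alert is delivered (nrpe, email, and so on) is left out.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def swap_used_fraction(meminfo):
    """Return the share of swap in use, or 0.0 when no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - meminfo.get("SwapFree", 0)) / total


def should_alert(meminfo, threshold=0.25):
    """True when more than threshold of the swap space is in use."""
    return swap_used_fraction(meminfo) > threshold


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print(f"swap used: {swap_used_fraction(info):.0%}, alert: {should_alert(info)}")
```

On a machine without swap, such as the one in the question today, the check reports 0% and never alerts, so it only becomes useful once a swap partition has been added.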