mysqld out of memory with plenty (?) of memory

Asked: 2017-04-06 15:13:44

Tags: mysql out-of-memory

I have a busy server running Percona 5.6.34-79.1-1.xenial on Ubuntu 16.04. It works well, but every few weeks mysqld gets hit by the out-of-memory killer, and I cannot find the reason.

root@master02:~# grep Out /var/log/syslog
Apr  6 13:37:03 master02 kernel: [17420955.874564] Out of memory: Kill process 36138 (mysqld) score 659 or sacrifice child

So it was killed at 13:37:03.

However, just 2 seconds earlier it was using about 110 GB of RAM (the system has about 160 GB), with about 55 GB still available:

root@master02:~# cat /root/logs/free/free-2017-04-06-13\:36\:01.log
              total        used        free      shared  buff/cache   available
Mem:      165050752   109560372      593508      189240    54896872    54434632
Swap:             0           0           0

root@master02:~# cat /root/logs/free/free-2017-04-06-13\:37\:01.log
              total        used        free      shared  buff/cache   available
Mem:      165050752   109582416      602704      189624    54865632    54412072
Swap:             0           0           0

root@master02:~# cat /root/logs/free/free-2017-04-06-13\:38\:01.log
              total        used        free      shared  buff/cache   available
Mem:      165050752    17982728    92226488      189200    54841536   146007904
Swap:             0           0           0

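my.cnf sets "innodb-buffer-pool-size = 130G". I find it quite unlikely that mysqld allocated an additional 50 GB within 2 seconds and got killed for it (although I may of course be wrong).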
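For scale, `free` reports KiB by default. A minimal Python sketch (the helper names are mine; the numbers are copied from the 13:37:01 snapshot above) converts the columns to GiB:

```python
# Values in KiB, copied from the 13:37:01 `free` snapshot above.
SNAPSHOT_KIB = {
    "total": 165050752,
    "used": 109582416,
    "free": 602704,
    "buff/cache": 54865632,
    "available": 54412072,
}


def kib_to_gib(kib):
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 * 1024)


def summarize(snapshot):
    """Return every field converted to GiB, rounded to one decimal place."""
    return {name: round(kib_to_gib(value), 1) for name, value in snapshot.items()}


if __name__ == "__main__":
    for name, gib in summarize(SNAPSHOT_KIB).items():
        print(f"{name:>10}: {gib:6.1f} GiB")
```

In this snapshot nearly all of the memory counted as available is page cache (`buff/cache`); the `free` column itself is well under 1 GiB.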

Here is the dmesg output showing the complete OOM report. Is this some tunable memory allocation issue? I would appreciate any hints.

[17420955.874279] mysqld invoked oom-killer: gfp_mask=0x26000c0, order=2, oom_score_adj=0
[17420955.874282] mysqld cpuset=/ mems_allowed=0-1
[17420955.874287] CPU: 4 PID: 36138 Comm: mysqld Not tainted 4.4.0-59-generic #80-Ubuntu
[17420955.874288] Hardware name: Xen HVM domU, BIOS 4.2.amazon 11/11/2016
[17420955.874290]  0000000000000286 00000000dba8c85b ffff88142f4bbaf0 ffffffff813f7583
[17420955.874292]  ffff88142f4bbcc8 ffff8827ac8db800 ffff88142f4bbb60 ffffffff8120ad5e
[17420955.874295]  ffffffff81cd2dc7 0000000000000000 ffffffff81e67760 0000000000000206
[17420955.874297] Call Trace:
[17420955.874304]  [<ffffffff813f7583>] dump_stack+0x63/0x90
[17420955.874307]  [<ffffffff8120ad5e>] dump_header+0x5a/0x1c5
[17420955.874311]  [<ffffffff81192722>] oom_kill_process+0x202/0x3c0
[17420955.874312]  [<ffffffff81192b49>] out_of_memory+0x219/0x460
[17420955.874315]  [<ffffffff81198abd>] __alloc_pages_slowpath.constprop.88+0x8fd/0xa70
[17420955.874317]  [<ffffffff81198eb6>] __alloc_pages_nodemask+0x286/0x2a0
[17420955.874319]  [<ffffffff81198f6b>] alloc_kmem_pages_node+0x4b/0xc0
[17420955.874323]  [<ffffffff8107ea5e>] copy_process+0x1be/0x1b70
[17420955.874326]  [<ffffffff811c164d>] ? handle_mm_fault+0xcbd/0x1820
[17420955.874328]  [<ffffffff810805a0>] _do_fork+0x80/0x360
[17420955.874329]  [<ffffffff81080929>] SyS_clone+0x19/0x20
[17420955.874333]  [<ffffffff818384f2>] entry_SYSCALL_64_fastpath+0x16/0x71
[17420955.874343] Mem-Info:
[17420955.874354] active_anon:27160197 inactive_anon:28926 isolated_anon:0
                   active_file:5497699 inactive_file:7563747 isolated_file:0
                   unevictable:914 dirty:2486 writeback:0 unstable:0
                   slab_reclaimable:556865 slab_unreclaimable:45056
                   mapped:20876 shmem:47414 pagetables:71927 bounce:0
                   free:154548 free_pcp:64 free_cma:0
[17420955.874357] Node 0 DMA free:15904kB min:8kB low:8kB high:12kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15988kB managed:15904kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[17420955.874361] lowmem_reserve[]: 0 3706 80497 80497 80497
[17420955.874370] Node 0 DMA32 free:311248kB min:2072kB low:2588kB high:3108kB active_anon:3411180kB inactive_anon:1348kB active_file:4kB inactive_file:8kB unevictable:280kB isolated(anon):0kB isolated(file):0kB present:3915776kB managed:3835152kB mlocked:280kB dirty:0kB writeback:0kB mapped:524kB shmem:2308kB slab_reclaimable:65844kB slab_unreclaimable:14292kB kernel_stack:1760kB pagetables:13784kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[17420955.874373] lowmem_reserve[]: 0 0 76791 76791 76791
[17420955.874376] Node 0 Normal free:143060kB min:42940kB low:53672kB high:64408kB active_anon:62267420kB inactive_anon:31360kB active_file:7207656kB inactive_file:7550284kB unevictable:3296kB isolated(anon):0kB isolated(file):0kB present:79953920kB managed:78634400kB mlocked:3296kB dirty:3116kB writeback:0kB mapped:18940kB shmem:49312kB slab_reclaimable:838024kB slab_unreclaimable:78600kB kernel_stack:51840kB pagetables:159192kB unstable:0kB bounce:0kB free_pcp:136kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:76 all_unreclaimable? no
[17420955.874380] lowmem_reserve[]: 0 0 0 0 0
[17420955.874383] Node 1 Normal free:147980kB min:45088kB low:56360kB high:67632kB active_anon:42962188kB inactive_anon:82996kB active_file:14783136kB inactive_file:22704696kB unevictable:80kB isolated(anon):0kB isolated(file):0kB present:83886080kB managed:82565296kB mlocked:80kB dirty:6828kB writeback:0kB mapped:64040kB shmem:138036kB slab_reclaimable:1323592kB slab_unreclaimable:87332kB kernel_stack:48624kB pagetables:114732kB unstable:0kB bounce:0kB free_pcp:120kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[17420955.874387] lowmem_reserve[]: 0 0 0 0 0
[17420955.874390] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15904kB
[17420955.874398] Node 0 DMA32: 13480*4kB (UME) 6994*8kB (UME) 2710*16kB (UME) 714*32kB (UMEH) 354*64kB (UMEH) 225*128kB (UMEH) 107*256kB (UMEH) 52*512kB (UME) 29*1024kB (UMEH) 0*2048kB 0*4096kB = 311248kB
[17420955.874408] Node 0 Normal: 29953*4kB (UMEH) 2985*8kB (UMEH) 1*16kB (H) 0*32kB 2*64kB (H) 2*128kB (H) 2*256kB (H) 0*512kB 1*1024kB (H) 0*2048kB 0*4096kB = 145628kB
[17420955.874416] Node 1 Normal: 36438*4kB (UME) 412*8kB (UMEH) 2*16kB (H) 2*32kB (H) 0*64kB 1*128kB (H) 1*256kB (H) 2*512kB (H) 0*1024kB 0*2048kB 0*4096kB = 150552kB
[17420955.874425] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[17420955.874427] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17420955.874428] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[17420955.874429] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[17420955.874430] 13109543 total pagecache pages
[17420955.874431] 0 pages in swap cache
[17420955.874440] Swap cache stats: add 0, delete 0, find 0/0
[17420955.874441] Free swap  = 0kB
[17420955.874442] Total swap = 0kB
[17420955.874443] 41942941 pages RAM
[17420955.874444] 0 pages HighMem/MovableOnly
[17420955.874444] 680253 pages reserved
[17420955.874445] 0 pages cma reserved
[17420955.874446] 0 pages hwpoisoned
[17420955.874447] [ pid ]   uid  tgid total_vm      rss nr_ptes nr_pmds swapents oom_score_adj name
[17420955.874457] [  677]     0   677    38915    23334      80       3        0             0 systemd-journal
[17420955.874461] [  698]     0   698    25742       47      17       3        0             0 lvmetad
[17420955.874469] [  729]     0   729    11545     1219      23       3        0         -1000 systemd-udevd
[17420955.874472] [ 1112]   109  1112    25081      492      19       3        0             0 systemd-timesyn
[17420955.874474] [ 1333]     0  1333     4030      637      12       3        0             0 dhclient
[17420955.874475] [ 1541]     0  1541     6932      547      19       3        0             0 cron
[17420955.874477] [ 1552]     0  1552   365572     5403      79       7        0             0 snapd
[17420955.874479] [ 1559]   102  1559    10725      667      26       3        0          -900 dbus-daemon
[17420955.874480] [ 1570]     0  1570    77473      728      21       3        0             0 lxcfs
[17420955.874482] [ 1576]   101  1576    64099     1008      27       3        0             0 rsyslogd
[17420955.874484] [ 1578]   106  1578     1884      403       9       3        0             0 vnstatd
[17420955.874486] [ 1580]     0  1580     1100      322       8       3        0             0 acpid
[17420955.874488] [ 1582]     0  1582    68674     1100      37       3        0             0 accounts-daemon
[17420955.874489] [ 1584]     0  1584     6511      378      18       3        0             0 atd
[17420955.874491] [ 1593]     0  1593    16380      700      36       3        0         -1000 sshd
[17420955.874493] [ 1595]     0  1595     7157      635      18       3        0             0 systemd-logind
[17420955.874495] [ 1599]     0  1599     7470      139      19       3        0             0 cgmanager
[17420955.874497] [ 1618]     0  1618     3344      116      11       3        0             0 mdadm
[17420955.874498] [ 1622]     0  1622     1306      441       8       3        0             0 iscsid
[17420955.874500] [ 1623]     0  1623     1431      916       8       3        0           -17 iscsid
[17420955.874511] [ 1717]     0  1717     3619      388      12       3        0             0 agetty
[17420955.874513] [ 1718]     0  1718     3665      343      12       3        0             0 agetty
[17420955.874515] [ 1731]     0  1731     5025      653      15       3        0             0 irqbalance
[17420955.874517] [ 1744]     0  1744    69272      719      39       3        0             0 polkitd
[17420955.874519] [ 2484]  1001  2484    11312      218      27       3        0             0 systemd
[17420955.874520] [ 2488]  1001  2488    15805      475      34       3        0             0 (sd-pam)
[17420955.874523] [ 6289]     0  6289     7718     1364      19       3        0             0 tmux
[17420955.874531] [ 6290]     0  6290     5381      924      15       3        0             0 bash
[17420955.874533] [ 6306]     0  6306     2158      395      10       3        0             0 mysqld_safe
[17420955.874537] [36138]   107 36138 43942177 27121187   70971     157        0             0 mysqld
[17420955.874539] [78610]   108 78610     5992      577      16       3        0             0 nrpe
[17420955.874542] [19441]     0 19441    24876     1740      54       3        0             0 sshd
[17420955.874543] [19447]  1008 19447    11312     1147      27       3        0             0 systemd
[17420955.874555] [19449]  1008 19449    15817      487      34       3        0             0 (sd-pam)
[17420955.874557] [19575]  1008 19575    24876      834      51       3        0             0 sshd
[17420955.874558] [19576]     0 19576    14970      933      34       3        0             0 sudo
[17420955.874560] [19577]     0 19577     5388     1343      16       3        0             0 bash
[17420955.874561] [26883]     0 26883     8430     5113      22       5        0             0 mysqld_exporter
[17420955.874564] Out of memory: Kill process 36138 (mysqld) score 659 or sacrifice child
[17420955.890336] Killed process 36138 (mysqld) total-vm:175768708kB, anon-rss:108470860kB, file-rss:13888kB
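The first line of the report shows an `order=2` request, that is, a block of 2^2 contiguous pages (16 kB, assuming the usual 4 kB page size on x86-64). The per-zone free lists in the report can be summarized by block size. The following is a rough Python sketch, not an authoritative diagnosis; the function names are mine, and the regular expression only covers the `count*sizekB (flags)` format seen in this dump:

```python
import re

# One buddy-allocator line copied from the dmesg output above.
LINE = ("Node 0 Normal: 29953*4kB (UMEH) 2985*8kB (UMEH) 1*16kB (H) 0*32kB "
        "2*64kB (H) 2*128kB (H) 2*256kB (H) 0*512kB 1*1024kB (H) "
        "0*2048kB 0*4096kB = 145628kB")

# Matches "count*sizekB", optionally followed by flags such as "(UMEH)".
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB(?: \(([A-Z]+)\))?")


def parse_free_list(line):
    """Return a list of (count, block_size_kb, flags) tuples."""
    return [(int(c), int(s), f or "") for c, s, f in BLOCK_RE.findall(line)]


def free_kb(blocks, min_block_kb=0):
    """Total free kB held in blocks of at least min_block_kb."""
    return sum(c * s for c, s, _ in blocks if s >= min_block_kb)


if __name__ == "__main__":
    blocks = parse_free_list(LINE)
    print("total free kB:", free_kb(blocks))
    print("free kB in blocks >= 16 kB:", free_kb(blocks, min_block_kb=16))
```

For this line the sketch reports 145628 kB free in total, matching the kernel's own sum, of which only 1936 kB sits in blocks of 16 kB or larger, all of them flagged `(H)`. That is at least consistent with fragmentation playing a role, though the sketch alone does not prove it.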

Here is the my.cnf file:

[mysql]

# CLIENT #
port                           = 3306
socket                         = /var/run/mysqld/mysqld.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/run/mysqld/mysqld.sock
pid-file                       = /var/run/mysqld/mysqld.pid

# MyISAM #
key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP

# SAFETY #
max-allowed-packet             = 16M
max-connect-errors             = 1000000

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
log-bin                        = /var/lib/mysql-binlogs/mysql-bin
expire-logs-days               = 7
sync-binlog                    = 1

binlog-format                  = MIXED

# REPLICATION #
server-id                      = 1
auto_increment_offset          = 1

# total number of master servers
auto_increment_increment       = 2

log-slave-updates              = 1
relay-log                      = /var/lib/mysql-binlogs/relay-bin
slave-net-timeout              = 60

# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0
query-cache-size               = 0
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 4096
table-open-cache               = 4096

# INNODB #
innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 2
innodb-file-per-table          = 1
innodb-buffer-pool-size        = 125G
innodb_large_prefix            = 1
innodb_file_format             = Barracuda

# LOGGING #
log-error                      = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes  = 0
slow-query-log                 = 0
slow-query-log-file            = /var/log/mysql/mysql-slow.log

# OTHER SETUP #
character_set_server           = utf8mb4
collation-server               = utf8mb4_unicode_ci
init-connect                   = 'SET NAMES utf8mb4'

skip-name-resolve              = 1

max_connections                = 12288
wait_timeout                   = 120
connect_timeout                = 30
interactive_timeout            = 120

2 answers:

Answer 0 (score: 1)

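I assume you are using it together with PHP and Apache. You will most likely find the cause of this problem in the log files of one of those applications.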

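Note the time of the failure in the MySQL log, then look through the other log files around that time. You should find clues that eventually lead you to the answer.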

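I should also note that when the system runs out of memory, it is the OOM killer that chooses which process to kill, and the process it kills may not be the one that caused the problem.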
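To make the victim selection concrete, the `score 659` in the log can be approximated from the process table in the dump. The sketch below follows my understanding of the badness heuristic in kernels of that era (resident pages plus swap entries plus page-table pages, scaled to parts per thousand of RAM plus swap); treat the formula as an assumption rather than a specification. It ignores `oom_score_adj` (0 for mysqld here) and any adjustment for root-owned processes. All numbers are copied from the question:

```python
PAGE_KIB = 4  # assumed page size (x86-64 default)

# mysqld row from the process table in the OOM dump (units: pages).
MYSQLD = {"rss": 27121187, "nr_ptes": 70971, "nr_pmds": 157, "swapents": 0}

TOTAL_RAM_KIB = 165050752  # "total" column of `free`
TOTAL_SWAP_KIB = 0


def oom_score(proc, total_ram_kib, total_swap_kib, page_kib=PAGE_KIB):
    """Approximate badness score in parts per thousand of RAM + swap."""
    total_pages = (total_ram_kib + total_swap_kib) // page_kib
    points = proc["rss"] + proc["swapents"] + proc["nr_ptes"] + proc["nr_pmds"]
    return points * 1000 // total_pages


def rss_kib(proc, page_kib=PAGE_KIB):
    """Resident set size in KiB."""
    return proc["rss"] * page_kib


if __name__ == "__main__":
    print("estimated score:", oom_score(MYSQLD, TOTAL_RAM_KIB, TOTAL_SWAP_KIB))
    print("rss in KiB:", rss_kib(MYSQLD))
```

With these inputs the estimate comes out at 659, the value the kernel logged, and the RSS of 108484748 KiB equals the `anon-rss` plus `file-rss` on the `Killed process` line. The score mostly reflects how much memory a process holds, not whether it triggered the shortage.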

Answer 1 (score: 1)

max_connections                = 12288

This is an eyebrow-raising number. Why so high? Do you use that many connections? You can monitor SHOW GLOBAL STATUS LIKE 'Max_used_connections' to find out. Unfortunately this number is reset when mysqld restarts, so you'll have to monitor it externally.
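Since the counter resets on restart, one option is to sample it periodically and log it somewhere else. A minimal Python sketch, assuming the `mysql` command-line client is installed and can authenticate without a prompt (for example via `~/.my.cnf`); the function names and the log path are hypothetical:

```python
import subprocess
import time

QUERY = "SHOW GLOBAL STATUS LIKE 'Max_used_connections'"


def parse_status(output):
    """Parse tab-separated `mysql -N -B` output into {name: value}."""
    result = {}
    for line in output.splitlines():
        if "\t" in line:
            name, value = line.split("\t", 1)
            result[name] = value.strip()
    return result


def fetch_max_used_connections():
    """Run the query through the mysql client and return the counter as int."""
    out = subprocess.run(
        ["mysql", "-N", "-B", "-e", QUERY],
        check=True, capture_output=True, text=True,
    ).stdout
    return int(parse_status(out)["Max_used_connections"])


def log_sample(path, value, now=None):
    """Append a timestamped sample to a plain-text log file."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as fh:
        fh.write(f"{stamp} Max_used_connections={value}\n")


if __name__ == "__main__":
    # Hypothetical log location; intended to be run from cron.
    log_sample("/var/log/max_used_connections.log", fetch_max_used_connections())
```

Keeping the samples outside mysqld means the peak before a crash is still on record after the restart.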

There are some buffers that are allocated per connection. So if you ever do get a spike in the number of connections, it could consume a lot of memory suddenly. This is uncommon, but possible.
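As a back-of-envelope check of how much a connection spike could cost, here is a Python sketch. The buffer pool size and `max_connections` come from the my.cnf in the question; the per-connection buffer sizes are not set there, so the values below are what I recall as MySQL 5.6 defaults and should be treated as assumptions (check `SHOW VARIABLES` on the real server). Not every connection allocates every buffer, so this is an upper bound for these buffers only, not a prediction:

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# From the my.cnf in the question.
INNODB_BUFFER_POOL = 125 * GIB
MAX_CONNECTIONS = 12288

# ASSUMED per-connection sizes (recalled 5.6 defaults, not taken from my.cnf).
PER_CONNECTION = {
    "sort_buffer_size": 256 * KIB,
    "join_buffer_size": 256 * KIB,
    "read_buffer_size": 128 * KIB,
    "read_rnd_buffer_size": 256 * KIB,
    "thread_stack": 256 * KIB,
}


def per_connection_bytes(buffers):
    """Sum of all per-connection buffers for a single connection."""
    return sum(buffers.values())


def worst_case_bytes(pool, connections, buffers):
    """Buffer pool plus every connection allocating every buffer once."""
    return pool + connections * per_connection_bytes(buffers)


if __name__ == "__main__":
    conn_total = MAX_CONNECTIONS * per_connection_bytes(PER_CONNECTION)
    print(f"connection buffers at max_connections: {conn_total / GIB:.1f} GiB")
    total = worst_case_bytes(INNODB_BUFFER_POOL, MAX_CONNECTIONS, PER_CONNECTION)
    print(f"buffer pool + connection buffers: {total / GIB:.1f} GiB")
```

Under these assumed defaults, 12288 connections would add about 13.5 GiB on top of the 125 GiB buffer pool. In-memory temporary tables (up to `tmp-table-size = 32M` each, per the my.cnf) could push the figure considerably higher.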

I suspect that @Vilmos is right, that you have some other process on the server that suddenly consumed the remaining memory, and then the OOM killer decided that innocent little mysqld was the process to kill, even though the memory exhaustion wasn't its fault.

There's a lesson about life in there. :-)

You might want to enable a swap partition, and then set up alerting if the swap is used too heavily. Then at least mysqld won't be killed, but it may slow down a lot. And you may be able to find out which process is running that has consumed too much memory.
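The swap alerting idea could be sketched like this in Python. It reads `SwapTotal` and `SwapFree` from `/proc/meminfo`; the 25% threshold and the function names are arbitrary choices of mine, and a real setup would hand the result to whatever monitoring system is already in place:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_used_fraction(info):
    """Fraction of swap in use; 0.0 when no swap is configured."""
    total = info.get("SwapTotal", 0)
    if total == 0:
        return 0.0
    return (total - info.get("SwapFree", 0)) / total


def should_alert(info, threshold=0.25):
    """True when swap usage exceeds the (arbitrary) threshold."""
    return swap_used_fraction(info) > threshold


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    print("swap used: {:.0%}".format(swap_used_fraction(info)))
    if should_alert(info):
        print("ALERT: swap usage above threshold")
```

On the server in the question, which has no swap at all, this reports 0% and never alerts, so it only becomes useful once a swap partition exists.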