编辑:我使用的是MySQL版本8.0.14,并使用了官方的MySQL docker镜像。
我有一个在AKS之上运行的主从MySQL拓扑。运行一段时间后,主容器和中间容器将消失(退出代码137 == SIGKILL)。从站很好,所有节点(主节点,中间主节点和从节点)具有完全相同的配置和VM容量(Standard DS14 v2 - 16 vcpus, 112 GB memory)
。没有迹象表明该容器为OOMKilled
,并且整体内存使用率从未超过60%。
kubectl describe pod xxx
仅显示容器已重新启动,因为容器以错误代码137和错误类型Error
退出。
MySQL日志中未显示任何错误,而我在节点的内核日志中看到的唯一错误如下:
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.033211822Z" level=info msg="shim reaped" id=48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.033259823Z" level=warning msg="cleaning up after killed shim" id=48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a namespace=moby
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.231095965Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.231730670Z" level=warning msg="48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a cleanup: failed to unmount IPC: umount /var/lib/docker/containers/48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a/mounts/shm, flags: 0x2: no such file or directory"
May 01 14:40:23 aks-myeuscluster-14926762-0 kubelet[1481]: I0501 14:40:23.973507 1481 kubelet.go:1928] SyncLoop (PLEG): "mysql-master-0_mysql-masters(ce93a7a5-6543-11e9-b093-062e6bba92a2)", event: &pleg.PodLifecycleEvent{ID:"ce93a7a5-6543-11e9-b093-062e6bba92a2", Type:"ContainerDied", Data:"48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a"}
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.986599174Z" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
May 01 14:40:24 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:24.246558869Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/b2d111216341c80b7b5d5aba9f3d81bb81923da583b2c57d99803831575b0596/shim.sock" debug=false pid=23364
May 01 14:40:25 aks-myeuscluster-14926762-0 kubelet[1481]: I0501 14:40:25.007553 1481 kubelet.go:1928] SyncLoop (PLEG): "mysql-master-0_mysql-masters(ce93a7a5-6543-11e9-b093-062e6bba92a2)", event: &pleg.PodLifecycleEvent{ID:"ce93a7a5-6543-11e9-b093-062e6bba92a2", Type:"ContainerStarted", Data:"b2d111216341c80b7b5d5aba9f3d81bb81923da583b2c57d99803831575b0596"}
已终止容器的节点上的docker inspect
显示:
[
{
"Id": "48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a",
"Created": "2019-05-01T04:07:23.816102331Z",
"Path": "docker-entrypoint.sh",
"Args": [
"mysqld"
],
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 137,
"Error": "",
"StartedAt": "2019-05-01T04:07:24.192620731Z",
"FinishedAt": "2019-05-01T14:40:23.230693162Z"
},
"Image": "sha256:71b5c7e10f9b20f4bd37c348872899cac828b1d2edad269fc8b93c9d43682241",
"ResolvConfPath": "/var/lib/docker/containers/d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09/resolv.conf",
"HostnamePath": "/var/lib/docker/containers/d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09/hostname",
"HostsPath": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/etc-hosts",
"LogPath": "/var/lib/docker/containers/48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a/48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a-json.log",
"Name": "/k8s_mysql_mysql-master-0_mysql-masters_ce93a7a5-6543-11e9-b093-062e6bba92a2_3",
"RestartCount": 0,
"Driver": "overlay2",
"Platform": "linux",
"MountLabel": "",
"ProcessLabel": "",
"AppArmorProfile": "docker-default",
"ExecIDs": null,
"HostConfig": {
"Binds": [
"/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/data:/var/lib/mysql",
"/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/conf:/etc/mysql/conf.d",
"/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~secret/default-token-96h8m:/var/run/secrets/kubernetes.io/serviceaccount:ro",
"/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/etc-hosts:/etc/hosts",
"/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/containers/mysql/65c11b38:/dev/termination-log"
],
"ContainerIDFile": "",
"LogConfig": {
"Type": "json-file",
"Config": {
"max-file": "5",
"max-size": "50m"
}
},
"NetworkMode": "container:d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09",
"PortBindings": null,
"RestartPolicy": {
"Name": "",
"MaximumRetryCount": 0
},
"AutoRemove": false,
"VolumeDriver": "",
"VolumesFrom": null,
"CapAdd": null,
"CapDrop": null,
"Dns": null,
"DnsOptions": null,
"DnsSearch": null,
"ExtraHosts": null,
"GroupAdd": null,
"IpcMode": "container:d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09",
"Cgroup": "",
"Links": null,
"OomScoreAdj": 999,
"PidMode": "",
"Privileged": false,
"PublishAllPorts": false,
"ReadonlyRootfs": false,
"SecurityOpt": [
"seccomp=unconfined"
],
"UTSMode": "",
"UsernsMode": "",
"ShmSize": 67108864,
"Runtime": "runc",
"ConsoleSize": [
0,
0
],
"Isolation": "",
"CpuShares": 102,
"Memory": 120259084288,
"NanoCpus": 0,
"CgroupParent": "/kubepods/burstable/podce93a7a5-6543-11e9-b093-062e6bba92a2",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 100000,
"CpuQuota": 1600000,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": [],
"DeviceCgroupRules": null,
"DiskQuota": 0,
"KernelMemory": 0,
"KernelMemoryTCP": 0,
"MemoryReservation": 0,
"MemorySwap": -1,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0,
"MaskedPaths": [
"/proc/acpi",
"/proc/kcore",
"/proc/keys",
"/proc/latency_stats",
"/proc/timer_list",
"/proc/timer_stats",
"/proc/sched_debug",
"/proc/scsi",
"/sys/firmware"
],
"ReadonlyPaths": [
"/proc/asound",
"/proc/bus",
"/proc/fs",
"/proc/irq",
"/proc/sys",
"/proc/sysrq-trigger"
]
},
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f-init/diff:/var/lib/docker/overlay2/02dbd10b6ce555ab3e7adfa20c38a11e6efa59ca2a48630f9d9511982c6ad679/diff:/var/lib/docker/overlay2/f90a222989df3b42aecf7bed9005b1126a73b8b0fd4842bd5c7ecdd7f9f011e6/diff:/var/lib/docker/overlay2/08d3f18230cdb2f65212d1a30e7f6b06b5720778dbb1c3d4a672fc4d804f7704/diff:/var/lib/docker/overlay2/aed1b032114dd084eae7b4d4427a3805f96427be0dce0f8b729840f1f230fd91/diff:/var/lib/docker/overlay2/966072e7d9ffe343e1f00b87c8cb8399f05084052cced72644123b19269aa80e/diff:/var/lib/docker/overlay2/154369c09c2853f5d77d05ba917c7c7f5cdb071f5c328ac403b30c7b4ab98a62/diff:/var/lib/docker/overlay2/067d055c87bf0ccb8e151cd8a2ac7c53da7215f4e5bb99e72f56d321b9f236f5/diff:/var/lib/docker/overlay2/22ddb21cd2fdfffc9e3757f89d866b94e471dd5baf81ef2481742a55ca04f3d3/diff:/var/lib/docker/overlay2/4523ae9e068b9dd7d0d6fb9e280cadb1c17e7101f7f280bfd9d4b9b54ca0616d/diff:/var/lib/docker/overlay2/32e5c4244779f2008e9f020f97afd80809c3549f956432e7116e446a59bc079d/diff:/var/lib/docker/overlay2/bfe5c64218049c0c90d4eb2f9bf2356a83e8c44802df0e48d6f55204efaac6a6/diff:/var/lib/docker/overlay2/a0c23bd39e1c83e67f3af090af495de289297d5332d8f7766ee197e0d4ca6a0c/diff",
"MergedDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f/merged",
"UpperDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f/diff",
"WorkDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f/work"
},
"Name": "overlay2"
},
"Mounts": [
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/containers/mysql/65c11b38",
"Destination": "/dev/termination-log",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/data",
"Destination": "/var/lib/mysql",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/conf",
"Destination": "/etc/mysql/conf.d",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~secret/default-token-96h8m",
"Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
"Mode": "ro",
"RW": false,
"Propagation": "rprivate"
},
{
"Type": "bind",
"Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/etc-hosts",
"Destination": "/etc/hosts",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
"Config": {
"Hostname": "mysql-master-0",
"Domainname": "",
"User": "0",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"3306/tcp": {},
"33060/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"KUBERNETES_PORT_443_TCP_ADDR=mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io",
"KUBERNETES_PORT=tcp://mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io:443",
"KUBERNETES_PORT_443_TCP=tcp://mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io:443",
"KUBERNETES_SERVICE_HOST=mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io",
"MYSQL_ROOT_PASSWORD=YpBwWhZyUnQ8ERR8",
"KUBERNETES_PORT_443_TCP_PROTO=tcp",
"KUBERNETES_PORT_443_TCP_PORT=443",
"MYSQL_MASTER_NODE_0_PORT_3306_TCP_PROTO=tcp",
"MYSQL_MASTER_NODE_0_PORT_30008_TCP=tcp://10.0.204.173:30008",
"MYSQL_MASTER_NODE_0_PORT_30008_TCP_PROTO=tcp",
"MYSQL_MASTER_NODE_0_SERVICE_HOST=10.0.204.173",
"MYSQL_MASTER_NODE_0_SERVICE_PORT_MYSQL=3306",
"MYSQL_MASTER_NODE_0_PORT_3306_TCP=tcp://10.0.204.173:3306",
"MYSQL_MASTER_NODE_0_PORT_3306_TCP_PORT=3306",
"MYSQL_MASTER_NODE_0_PORT_3306_TCP_ADDR=10.0.204.173",
"MYSQL_MASTER_NODE_0_PORT_30008_TCP_PORT=30008",
"MYSQL_MASTER_NODE_0_SERVICE_PORT_NCAT=30008",
"MYSQL_MASTER_NODE_0_PORT_30008_TCP_ADDR=10.0.204.173",
"MYSQL_MASTER_NODE_0_SERVICE_PORT=3306",
"KUBERNETES_SERVICE_PORT_HTTPS=443",
"MYSQL_MASTER_NODE_0_PORT=tcp://10.0.204.173:3306",
"KUBERNETES_SERVICE_PORT=443",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"GOSU_VERSION=1.7",
"MYSQL_MAJOR=8.0",
"MYSQL_VERSION=8.0.14-1debian9"
],
"Cmd": [
"mysqld"
],
"Healthcheck": {
"Test": [
"NONE"
]
},
"ArgsEscaped": true,
"Image": "sha256:71b5c7e10f9b20f4bd37c348872899cac828b1d2edad269fc8b93c9d43682241",
"Volumes": {
"/var/lib/mysql": {}
},
"WorkingDir": "",
"Entrypoint": [
"docker-entrypoint.sh"
],
"OnBuild": null,
"Labels": {
"annotation.io.kubernetes.container.hash": "d5e52a59",
"annotation.io.kubernetes.container.ports": "[{\"name\":\"mysql\",\"containerPort\":3306,\"protocol\":\"TCP\"}]",
"annotation.io.kubernetes.container.restartCount": "3",
"annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
"annotation.io.kubernetes.container.terminationMessagePolicy": "File",
"annotation.io.kubernetes.pod.terminationGracePeriod": "30",
"io.kubernetes.container.logpath": "/var/log/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/mysql/3.log",
"io.kubernetes.container.name": "mysql",
"io.kubernetes.docker.type": "container",
"io.kubernetes.pod.name": "mysql-master-0",
"io.kubernetes.pod.namespace": "mysql-masters",
"io.kubernetes.pod.uid": "ce93a7a5-6543-11e9-b093-062e6bba92a2",
"io.kubernetes.sandbox.id": "d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09"
}
},
"NetworkSettings": {
"Bridge": "",
"SandboxID": "",
"HairpinMode": false,
"LinkLocalIPv6Address": "",
"LinkLocalIPv6PrefixLen": 0,
"Ports": {},
"SandboxKey": "",
"SecondaryIPAddresses": null,
"SecondaryIPv6Addresses": null,
"EndpointID": "",
"Gateway": "",
"GlobalIPv6Address": "",
"GlobalIPv6PrefixLen": 0,
"IPAddress": "",
"IPPrefixLen": 0,
"IPv6Gateway": "",
"MacAddress": "",
"Networks": {}
}
}
]
所有节点(主节点,中间主节点和从节点)的MySQL配置如下:
[mysqld]
report_host=master-00.masters.eastus.mysql.shared.bestday.net
server_id=1000
read_only=OFF
default_authentication_plugin=mysql_native_password
innodb_random_read_ahead=ON
log_bin=binlog
relay_log=relay-bin
log_slave_updates=true
skip_slave_start=true
skip_name_resolve=true
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=true
transaction_write_set_extraction=XXHASH64
binlog_transaction_dependency_tracking=WRITESET
binlog_group_commit_sync_delay=10000
binlog_group_commit_sync_no_delay_count=1000
slave_parallel_workers=16
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=ON
master_info_repository=TABLE
max_connections=100000
max_connect_errors=4294967295
innodb_buffer_pool_size=64G
innodb_buffer_pool_instances=16
innodb_log_file_size=1G
innodb_log_buffer_size=64M
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_flush_method=O_DIRECT
temptable_max_ram=26G
max_heap_table_size=512M
tmp_table_size=512M
join_buffer_size=1M
character_set_server=utf8mb4
collation_server=utf8mb4_general_ci
对容器的请求和限制设置如下:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 16
memory: 112Gi
由于Pod具有严格的反亲和性设置,因此请求设置得较低,因此每个节点都放置一个MySQL实例。我怀疑这可能是个问题(请参阅https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md),但是话又说回来,容器不是OOMKilled
。
同样,主机和从机具有完全相同的配置,并且从机不会死。
我的猜测是,当节点为从服务器提供服务时,我正在使用的某些设置会起作用,而这些服务与docker / vm内核设置结合在一起会导致mysqld
进程崩溃。
如果有人可以帮助我跟踪这种奇怪的行为,我将不胜感激。
谢谢!
答案 0 :(得分:0)
这种情况发生时,您必须检查“ kubectl获取事件”。您的Pod使用Burstable QoS类型,因此可以响应资源压力而将其驱逐。驱逐与OOMKill不同。通常,在“ kubectl get pod”输出中显示已驱逐的吊舱,但仅从某些版本的Kubernetes开始。
您应该做的是使Mysql pods保证通过,这意味着使请求和限制相等。诸如Mysql之类的重要服务完全不能以Burstable或Best Effort类型运行。
答案 1 :(得分:0)
希望您已经解决了此问题。偶然地,我研究了此错误,结果发现它是容器化的错误,并且自容器化1.2.6(docker 19.03.1)起已得到修复。
您的日志中的关键信息:
level=info msg="shim reaped"
level=warning msg="cleaning up after killed shim"
参考: