Kubernetes(AKS)上的MySQL主节点无故死亡

时间:2019-05-02 15:59:49

标签: mysql docker kubernetes azure-aks

编辑:我使用的是MySQL版本8.0.14,并使用了官方的MySQL docker镜像。

我有一个在AKS之上运行的主从MySQL拓扑。运行一段时间后,主容器和中间容器将消失(退出代码137 == SIGKILL)。从站很好,所有节点(主节点,中间主节点和从节点)具有完全相同的配置和VM容量(Standard DS14 v2 - 16 vcpus, 112 GB memory)。没有迹象表明该容器为OOMKilled,并且整体内存使用率从未超过60%。

kubectl describe pod xxx仅显示容器已重新启动,因为容器以错误代码137和错误类型Error退出。

MySQL日志中未显示任何错误,而我在节点的内核日志中看到的唯一错误如下:

May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.033211822Z" level=info msg="shim reaped" id=48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.033259823Z" level=warning msg="cleaning up after killed shim" id=48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a namespace=moby
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.231095965Z" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.231730670Z" level=warning msg="48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a cleanup: failed to unmount IPC: umount /var/lib/docker/containers/48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a/mounts/shm, flags: 0x2: no such file or directory"
May 01 14:40:23 aks-myeuscluster-14926762-0 kubelet[1481]: I0501 14:40:23.973507    1481 kubelet.go:1928] SyncLoop (PLEG): "mysql-master-0_mysql-masters(ce93a7a5-6543-11e9-b093-062e6bba92a2)", event: &pleg.PodLifecycleEvent{ID:"ce93a7a5-6543-11e9-b093-062e6bba92a2", Type:"ContainerDied", Data:"48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a"}
May 01 14:40:23 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:23.986599174Z" level=warning msg="Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap."
May 01 14:40:24 aks-myeuscluster-14926762-0 dockerd[1268]: time="2019-05-01T14:40:24.246558869Z" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/b2d111216341c80b7b5d5aba9f3d81bb81923da583b2c57d99803831575b0596/shim.sock" debug=false pid=23364
May 01 14:40:25 aks-myeuscluster-14926762-0 kubelet[1481]: I0501 14:40:25.007553    1481 kubelet.go:1928] SyncLoop (PLEG): "mysql-master-0_mysql-masters(ce93a7a5-6543-11e9-b093-062e6bba92a2)", event: &pleg.PodLifecycleEvent{ID:"ce93a7a5-6543-11e9-b093-062e6bba92a2", Type:"ContainerStarted", Data:"b2d111216341c80b7b5d5aba9f3d81bb81923da583b2c57d99803831575b0596"}

已终止容器的节点上的docker inspect显示:

[
    {
        "Id": "48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a",
        "Created": "2019-05-01T04:07:23.816102331Z",
        "Path": "docker-entrypoint.sh",
        "Args": [
            "mysqld"
        ],
        "State": {
            "Status": "exited",
            "Running": false,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 0,
            "ExitCode": 137,
            "Error": "",
            "StartedAt": "2019-05-01T04:07:24.192620731Z",
            "FinishedAt": "2019-05-01T14:40:23.230693162Z"
        },
        "Image": "sha256:71b5c7e10f9b20f4bd37c348872899cac828b1d2edad269fc8b93c9d43682241",
        "ResolvConfPath": "/var/lib/docker/containers/d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09/hostname",
        "HostsPath": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/etc-hosts",
        "LogPath": "/var/lib/docker/containers/48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a/48e8b87334bbb0353e9d04230790e154ad68fa01c7159201ffc50c2eee61067a-json.log",
        "Name": "/k8s_mysql_mysql-master-0_mysql-masters_ce93a7a5-6543-11e9-b093-062e6bba92a2_3",
        "RestartCount": 0,
        "Driver": "overlay2",
        "Platform": "linux",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "docker-default",
        "ExecIDs": null,
        "HostConfig": {
            "Binds": [
                "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/data:/var/lib/mysql",
                "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/conf:/etc/mysql/conf.d",
                "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~secret/default-token-96h8m:/var/run/secrets/kubernetes.io/serviceaccount:ro",
                "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/etc-hosts:/etc/hosts",
                "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/containers/mysql/65c11b38:/dev/termination-log"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "json-file",
                "Config": {
                    "max-file": "5",
                    "max-size": "50m"
                }
            },
            "NetworkMode": "container:d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "container:d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 999,
            "PidMode": "",
            "Privileged": false,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "seccomp=unconfined"
            ],
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 102,
            "Memory": 120259084288,
            "NanoCpus": 0,
            "CgroupParent": "/kubepods/burstable/podce93a7a5-6543-11e9-b093-062e6bba92a2",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 100000,
            "CpuQuota": 1600000,
            "CpuRealtimePeriod": 0,
            "CpuRealtimeRuntime": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": [],
            "DeviceCgroupRules": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "KernelMemoryTCP": 0,
            "MemoryReservation": 0,
            "MemorySwap": -1,
            "MemorySwappiness": null,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0,
            "MaskedPaths": [
                "/proc/acpi",
                "/proc/kcore",
                "/proc/keys",
                "/proc/latency_stats",
                "/proc/timer_list",
                "/proc/timer_stats",
                "/proc/sched_debug",
                "/proc/scsi",
                "/sys/firmware"
            ],
            "ReadonlyPaths": [
                "/proc/asound",
                "/proc/bus",
                "/proc/fs",
                "/proc/irq",
                "/proc/sys",
                "/proc/sysrq-trigger"
            ]
        },
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f-init/diff:/var/lib/docker/overlay2/02dbd10b6ce555ab3e7adfa20c38a11e6efa59ca2a48630f9d9511982c6ad679/diff:/var/lib/docker/overlay2/f90a222989df3b42aecf7bed9005b1126a73b8b0fd4842bd5c7ecdd7f9f011e6/diff:/var/lib/docker/overlay2/08d3f18230cdb2f65212d1a30e7f6b06b5720778dbb1c3d4a672fc4d804f7704/diff:/var/lib/docker/overlay2/aed1b032114dd084eae7b4d4427a3805f96427be0dce0f8b729840f1f230fd91/diff:/var/lib/docker/overlay2/966072e7d9ffe343e1f00b87c8cb8399f05084052cced72644123b19269aa80e/diff:/var/lib/docker/overlay2/154369c09c2853f5d77d05ba917c7c7f5cdb071f5c328ac403b30c7b4ab98a62/diff:/var/lib/docker/overlay2/067d055c87bf0ccb8e151cd8a2ac7c53da7215f4e5bb99e72f56d321b9f236f5/diff:/var/lib/docker/overlay2/22ddb21cd2fdfffc9e3757f89d866b94e471dd5baf81ef2481742a55ca04f3d3/diff:/var/lib/docker/overlay2/4523ae9e068b9dd7d0d6fb9e280cadb1c17e7101f7f280bfd9d4b9b54ca0616d/diff:/var/lib/docker/overlay2/32e5c4244779f2008e9f020f97afd80809c3549f956432e7116e446a59bc079d/diff:/var/lib/docker/overlay2/bfe5c64218049c0c90d4eb2f9bf2356a83e8c44802df0e48d6f55204efaac6a6/diff:/var/lib/docker/overlay2/a0c23bd39e1c83e67f3af090af495de289297d5332d8f7766ee197e0d4ca6a0c/diff",
                "MergedDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f/merged",
                "UpperDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f/diff",
                "WorkDir": "/var/lib/docker/overlay2/e881867313c0c44e1c344f051c1ed2bdbb2d7060987503ee654bdbb52298e22f/work"
            },
            "Name": "overlay2"
        },
        "Mounts": [
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/containers/mysql/65c11b38",
                "Destination": "/dev/termination-log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/data",
                "Destination": "/var/lib/mysql",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~empty-dir/conf",
                "Destination": "/etc/mysql/conf.d",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/volumes/kubernetes.io~secret/default-token-96h8m",
                "Destination": "/var/run/secrets/kubernetes.io/serviceaccount",
                "Mode": "ro",
                "RW": false,
                "Propagation": "rprivate"
            },
            {
                "Type": "bind",
                "Source": "/var/lib/kubelet/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/etc-hosts",
                "Destination": "/etc/hosts",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "mysql-master-0",
            "Domainname": "",
            "User": "0",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "ExposedPorts": {
                "3306/tcp": {},
                "33060/tcp": {}
            },
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "KUBERNETES_PORT_443_TCP_ADDR=mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io",
                "KUBERNETES_PORT=tcp://mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io:443",
                "KUBERNETES_PORT_443_TCP=tcp://mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io:443",
                "KUBERNETES_SERVICE_HOST=mysql-east-mysql-eastus-bdfb38-0e053371.hcp.eastus.azmk8s.io",
                "MYSQL_ROOT_PASSWORD=YpBwWhZyUnQ8ERR8",
                "KUBERNETES_PORT_443_TCP_PROTO=tcp",
                "KUBERNETES_PORT_443_TCP_PORT=443",
                "MYSQL_MASTER_NODE_0_PORT_3306_TCP_PROTO=tcp",
                "MYSQL_MASTER_NODE_0_PORT_30008_TCP=tcp://10.0.204.173:30008",
                "MYSQL_MASTER_NODE_0_PORT_30008_TCP_PROTO=tcp",
                "MYSQL_MASTER_NODE_0_SERVICE_HOST=10.0.204.173",
                "MYSQL_MASTER_NODE_0_SERVICE_PORT_MYSQL=3306",
                "MYSQL_MASTER_NODE_0_PORT_3306_TCP=tcp://10.0.204.173:3306",
                "MYSQL_MASTER_NODE_0_PORT_3306_TCP_PORT=3306",
                "MYSQL_MASTER_NODE_0_PORT_3306_TCP_ADDR=10.0.204.173",
                "MYSQL_MASTER_NODE_0_PORT_30008_TCP_PORT=30008",
                "MYSQL_MASTER_NODE_0_SERVICE_PORT_NCAT=30008",
                "MYSQL_MASTER_NODE_0_PORT_30008_TCP_ADDR=10.0.204.173",
                "MYSQL_MASTER_NODE_0_SERVICE_PORT=3306",
                "KUBERNETES_SERVICE_PORT_HTTPS=443",
                "MYSQL_MASTER_NODE_0_PORT=tcp://10.0.204.173:3306",
                "KUBERNETES_SERVICE_PORT=443",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "GOSU_VERSION=1.7",
                "MYSQL_MAJOR=8.0",
                "MYSQL_VERSION=8.0.14-1debian9"
            ],
            "Cmd": [
                "mysqld"
            ],
            "Healthcheck": {
                "Test": [
                    "NONE"
                ]
            },
            "ArgsEscaped": true,
            "Image": "sha256:71b5c7e10f9b20f4bd37c348872899cac828b1d2edad269fc8b93c9d43682241",
            "Volumes": {
                "/var/lib/mysql": {}
            },
            "WorkingDir": "",
            "Entrypoint": [
                "docker-entrypoint.sh"
            ],
            "OnBuild": null,
            "Labels": {
                "annotation.io.kubernetes.container.hash": "d5e52a59",
                "annotation.io.kubernetes.container.ports": "[{\"name\":\"mysql\",\"containerPort\":3306,\"protocol\":\"TCP\"}]",
                "annotation.io.kubernetes.container.restartCount": "3",
                "annotation.io.kubernetes.container.terminationMessagePath": "/dev/termination-log",
                "annotation.io.kubernetes.container.terminationMessagePolicy": "File",
                "annotation.io.kubernetes.pod.terminationGracePeriod": "30",
                "io.kubernetes.container.logpath": "/var/log/pods/ce93a7a5-6543-11e9-b093-062e6bba92a2/mysql/3.log",
                "io.kubernetes.container.name": "mysql",
                "io.kubernetes.docker.type": "container",
                "io.kubernetes.pod.name": "mysql-master-0",
                "io.kubernetes.pod.namespace": "mysql-masters",
                "io.kubernetes.pod.uid": "ce93a7a5-6543-11e9-b093-062e6bba92a2",
                "io.kubernetes.sandbox.id": "d4c359fa3c05b26eed1259288f2073ae98c7dee0d050fb141d5c6f40b2000c09"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {}
        }
    }
]

所有节点(主节点,中间主节点和从节点)的MySQL配置如下:

[mysqld]
report_host=master-00.masters.eastus.mysql.shared.bestday.net
server_id=1000
read_only=OFF
default_authentication_plugin=mysql_native_password
innodb_random_read_ahead=ON
log_bin=binlog
relay_log=relay-bin
log_slave_updates=true
skip_slave_start=true
skip_name_resolve=true
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=true
transaction_write_set_extraction=XXHASH64
binlog_transaction_dependency_tracking=WRITESET
binlog_group_commit_sync_delay=10000
binlog_group_commit_sync_no_delay_count=1000
slave_parallel_workers=16
slave_parallel_type=LOGICAL_CLOCK
slave_preserve_commit_order=ON
master_info_repository=TABLE
max_connections=100000
max_connect_errors=4294967295
innodb_buffer_pool_size=64G
innodb_buffer_pool_instances=16
innodb_log_file_size=1G
innodb_log_buffer_size=64M
innodb_read_io_threads=16
innodb_write_io_threads=16
innodb_flush_method=O_DIRECT
temptable_max_ram=26G
max_heap_table_size=512M
tmp_table_size=512M
join_buffer_size=1M
character_set_server=utf8mb4
collation_server=utf8mb4_general_ci

对容器的请求和限制设置如下:

requests:
  cpu: 100m
  memory: 128Mi
limits:
  cpu: 16
  memory: 112Gi

由于Pod具有严格的反亲和性设置,因此请求设置得较低,因此每个节点都放置一个MySQL实例。我怀疑这可能是个问题(请参阅https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/resource-qos.md),但是话又说回来,容器不是OOMKilled

同样,主机和从机具有完全相同的配置,并且从机不会死。

我的猜测是,当节点为从服务器提供服务时,我正在使用的某些设置会起作用,而这些服务与docker / vm内核设置结合在一起会导致mysqld进程崩溃。

如果有人可以帮助我跟踪这种奇怪的行为,我将不胜感激。

谢谢!

2 个答案:

答案 0 :(得分:0)

这种情况发生时,您必须检查“ kubectl获取事件”。您的Pod使用Burstable QoS类型,因此可以响应资源压力而将其驱逐。驱逐与OOMKill不同。通常,在“ kubectl get pod”输出中显示已驱逐的吊舱,但仅从某些版本的Kubernetes开始。

您应该做的是使Mysql pods保证通过,这意味着使请求和限制相等。诸如Mysql之类的重要服务完全不能以Burstable或Best Effort类型运行。

答案 1 :(得分:0)

希望您已经解决了此问题。偶然地,我研究了此错误,结果发现它是容器化的错误,并且自容器化1.2.6(docker 19.03.1)起已得到修复。

您的日志中的关键信息:

level=info msg="shim reaped"
level=warning msg="cleaning up after killed shim"

参考: