I'm running a Kubernetes cluster on vSphere and am trying to get etcd, MongoDB, and MySQL components up on it. When running kubeadm init, I hit a problem where the etcd container would not start, so initialization could not complete. I noticed that resolv.conf listed four different names under search, and commenting out the search line let the etcd container start and got Kubernetes up and running on the cluster.
resolv.conf looks like this on every node:
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.1.1
#search den.solidfire.net one.den.solidfire.net ten.den.solidfire.net solidfire.net
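What also stands out to me is that the only nameserver left is 127.0.1.1, a local stub resolver. As far as I understand, kubelet derives the pods' resolv.conf from the node's file by default, and a loopback nameserver there is a known cause of broken in-cluster DNS. One workaround I'm considering (a sketch only, assuming the nodes also have a systemd-resolved managed file; paths may differ on this resolvconf(8) setup) is pointing kubelet at the real upstream resolver file:

# /var/lib/kubelet/kubeadm-flags.env (assumed location for a kubeadm install)
KUBELET_KUBEADM_ARGS="--resolv-conf=/run/systemd/resolve/resolv.conf"

# then restart kubelet on each node
systemctl daemon-reload && systemctl restart kubelet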
So there is no active search line. When I try to get an application up and running that depends on etcd, MySQL, and RabbitMQ pods, all three of them run into problems, while the same setup works fine on cloud providers like Azure and AWS.
MySQL shows the following events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 52m (x3 over 52m) default-scheduler pod has unbound PersistentVolumeClaims (repeated 3 times)
Normal Scheduled 52m default-scheduler Successfully assigned mysql-0 to sde-slave-test3
Normal SuccessfulAttachVolume 52m attachdetach-controller AttachVolume.Attach succeeded for volume "default-datadir-mysql-0-25e1f"
Normal SuccessfulMountVolume 52m kubelet, sde-slave-test3 MountVolume.SetUp succeeded for volume "config-emptydir"
Normal SuccessfulMountVolume 52m kubelet, sde-slave-test3 MountVolume.SetUp succeeded for volume "config"
Normal SuccessfulMountVolume 52m kubelet, sde-slave-test3 MountVolume.SetUp succeeded for volume "default-token-x2fsd"
Normal SuccessfulMountVolume 52m kubelet, sde-slave-test3 MountVolume.SetUp succeeded for volume "default-datadir-mysql-0-25e1f"
Normal Started 52m kubelet, sde-slave-test3 Started container
Normal Pulled 52m kubelet, sde-slave-test3 Container image "registry.qstack.com/qstack/mariadb-cluster:10.3.1" already present on machine
Normal Created 52m kubelet, sde-slave-test3 Created container
Normal Pulled 52m kubelet, sde-slave-test3 Container image "registry.qstack.com/qstack/mysqld-exporter:1.1" already present on machine
Normal Created 52m kubelet, sde-slave-test3 Created container
Normal Started 52m kubelet, sde-slave-test3 Started container
Warning Unhealthy 51m (x2 over 51m) kubelet, sde-slave-test3 Liveness probe failed: Get http://10.42.0.9:9104/metrics: dial tcp 10.42.0.9:9104: getsockopt: connection refused
Warning Unhealthy 51m (x3 over 52m) kubelet, sde-slave-test3 Readiness probe failed: Get http://10.42.0.9:9104/metrics: dial tcp 10.42.0.9:9104: getsockopt: connection refused
Warning Unhealthy 51m kubelet, sde-slave-test3 Liveness probe failed: Get http://10.42.0.9:9104/metrics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 51m (x3 over 51m) kubelet, sde-slave-test3 Readiness probe failed: Get http://10.42.0.9:9104/metrics: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 7m (x267 over 51m) kubelet, sde-slave-test3 Readiness probe failed: ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)
Warning BackOff 2m (x148 over 45m) kubelet, sde-slave-test3 Back-off restarting failed container
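To check what DNS settings the pod actually gets, I can inspect the resolv.conf that kubelet generated inside it (a diagnostic sketch; the pod name is taken from the events above):

kubectl exec mysql-0 -- cat /etc/resolv.conf
# with working cluster DNS I'd expect the kube-dns service IP as nameserver
# and search domains like: default.svc.cluster.local svc.cluster.local cluster.local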
RabbitMQ logs the following before going into CrashLoopBackOff:
2018-08-08 13:59:02.268 [info] <0.198.0> Will try to lock with peer discovery backend rabbit_peer_discovery_k8s
2018-08-08 13:59:02.268 [info] <0.198.0> Peer discovery backend rabbit_peer_discovery_k8s does not support registration, skipping randomized startup delay.
2018-08-08 13:59:07.275 [info] <0.198.0> Failed to get nodes from k8s - {failed_connect,[{to_address,{"kubernetes.default.svc.cluster.local",443}},
{inet,[inet],nxdomain}]}
2018-08-08 13:59:07.276 [error] <0.197.0> CRASH REPORT Process <0.197.0> with 0 neighbours exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 163 in application_master:init/4 line 134
2018-08-08 13:59:07.276 [info] <0.33.0> Application rabbit exited with reason: no case clause matching {error,"{failed_connect,[{to_address,{\"kubernetes.default.svc.cluster.local\",443}},\n {inet,[inet],nxdomain}]}"} in rabbit_mnesia:init_from_config/0 line 163
{"Kernel pid terminated",application_controller,"{application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,\"{failed_connect,[{to_address,{\\"kubernetes.default.svc.cluster.local\\",443}},\n {inet,[inet],nxdomain}]}\"}},[{rabbit_mnesia,init_from_config,0,[{file,\"src/rabbit_mnesia.erl\"},{line,163}]},{rabbit_mnesia,init_with_lock,3,[{file,\"src/rabbit_mnesia.erl\"},{line,143}]},{rabbit_mnesia,init,0,[{file,\"src/rabbit_mnesia.erl\"},{line,111}]},{rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,run_step,2,[{file,\"src/rabbit_boot_steps.erl\"},{line,49}]},{rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit_boot_steps,run_boot_steps,1,[{file,\"src/rabbit_boot_steps.erl\"},{line,26}]},{rabbit,start,2,[{file,\"src/rabbit.erl\"},{line,792}]}]}}}}}"}
Kernel pid terminated (application_controller) ({application_start_failure,rabbit,{bad_return,{{rabbit,start,[normal,[]]},{'EXIT',{{case_clause,{error,"{failed_connect,[{to_address,{\"kubernetes.defau
Crash dump is being written to: /var/log/rabbitmq/erl_crash.dump...done
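The nxdomain for kubernetes.default.svc.cluster.local suggests cluster DNS itself is failing, rather than anything RabbitMQ-specific. A quick way to test that independently (a sketch; busybox:1.28 because nslookup is known to be broken in some later busybox tags):

kubectl run dns-test --image=busybox:1.28 --rm -it --restart=Never -- nslookup kubernetes.default.svc.cluster.local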
And etcd hangs endlessly on:
Waiting for etcd-0.etcd to come up
ping: bad address 'etcd-0.etcd'
Waiting for etcd-0.etcd to come up
ping: bad address 'etcd-0.etcd'
Waiting for etcd-0.etcd to come up
ping: bad address 'etcd-0.etcd'
Waiting for etcd-0.etcd to come up
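As I understand it, the short name etcd-0.etcd can only resolve through the search path kubelet writes into the pod (expanding to etcd-0.etcd.<namespace>.svc.cluster.local via the headless etcd service), so with a mangled resolv.conf it never resolves. Resolving the fully qualified name from inside the pod should confirm that (a sketch, assuming the StatefulSet is named etcd and runs in the default namespace):

kubectl exec etcd-0 -- ping -c 1 etcd-0.etcd.default.svc.cluster.local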
Now, I assume this has something to do with resolv.conf having been tampered with. Has anyone run into this?
I can post more logs, as well as the k8s specs for each component, if necessary.