Kubespray: Netchecker connectivity check fails

Time: 2020-07-01 13:56:27

Tags: kubernetes kubespray

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances with Kubespray. The instances run a CentOS 7.6.1811 qcow2 image imported into Glance.

The installation succeeded, and I can see my nodes and pods with kubectl.

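I used the deploy_netchecker option to deploy NetChecker and test networking in the cluster, and set network_plugin="flannel". I also tried kube_proxy_mode="iptables", but it does not seem to affect the result. These are pretty much the only changes I made in the k8s-cluster.yml file.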

All the pods are running, and the services are there as well:

[centos@cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE     NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes                  ClusterIP   10.233.0.1      <none>        443/TCP                  46h
default       netchecker-service          NodePort    10.233.13.213   <none>        8081:31081/TCP           46h
kube-system   coredns                     ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   46h
kube-system   dashboard-metrics-scraper   ClusterIP   10.233.59.12    <none>        8000/TCP                 46h
kube-system   kubernetes-dashboard        ClusterIP   10.233.63.20    <none>        443/TCP                  46h

But the netchecker API gives the following answer:

[root@localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
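
The response is easier to read once the agents are grouped by kind. The following is a minimal Python sketch, not part of netchecker: the function name summarize_connectivity is hypothetical, and it assumes that host-network agents can be recognised by the netchecker-agent-hostnet- name prefix seen in the output above.

```python
import json

HOSTNET_PREFIX = "netchecker-agent-hostnet-"  # assumption: taken from the pod names above


def summarize_connectivity(payload):
    """Group the agents listed under Absent/Outdated into hostnet and pod-network agents."""
    data = json.loads(payload)
    summary = {}
    for state in ("Absent", "Outdated"):
        names = data.get(state) or []
        hostnet = [n for n in names if n.startswith(HOSTNET_PREFIX)]
        pod_network = [n for n in names if not n.startswith(HOSTNET_PREFIX)]
        summary[state] = {"hostnet": hostnet, "pod_network": pod_network}
    return summary


# Response copied from the curl output above.
response = (
    '{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; '
    'look up the payload",'
    '"Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn",'
    '"netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],'
    '"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf",'
    '"netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}'
)

summary = summarize_connectivity(response)
for state, groups in summary.items():
    for kind, names in groups.items():
        print(state, kind, len(names))
```

With the response above, every agent in the Absent list is a host-network agent, while the Outdated list mixes both kinds.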

For some unknown reason I cannot reach the API from a cluster node via localhost, so I used a floating IP in OpenStack.

Here are some logs from the agents:

[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246       1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272       1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398       1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428       1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474       1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140       1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}

The logs from the server do not show any errors.
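
The agent log files shown earlier contain one JSON object per line, with the message in the "log" field, and the messages start with a severity letter (I for info, E for error). Rather than paging through them in vi, the error lines can be filtered out. This is a rough Python sketch under those assumptions; error_lines is a hypothetical helper, and the sample is two lines copied from the hostnet agent log.

```python
import json


def error_lines(log_text):
    """Return the messages whose severity letter is E from JSON-per-line log text."""
    errors = []
    for raw in log_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log record
        if not isinstance(entry, dict):
            continue
        message = str(entry.get("log", "")).rstrip("\n")
        if message.startswith("E"):
            errors.append(message)
    return errors


# Two lines copied from the netchecker-agent-hostnet-klldn log.
sample = r'''
{"log":"E0701 13:05:22.804428       1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474       1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
'''

errors = error_lines(sample)
for message in errors:
    print(message)
```

On this sample the only error is the i/o timeout while dialing the netchecker-service ClusterIP (10.233.13.213:8081); the name itself was resolved to that address.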

I tried to check DNS resolution as follows:

[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server:    169.254.25.10
Address 1: 169.254.25.10

nslookup: can't resolve 'kubernetes.default'

[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5

169.254.25.10 is the IP of nodelocaldns, but it seems unable to query the deployed coredns service. When I run nslookup netchecker-service.default.svc.cluster.local 10.233.0.3, that is, against the coredns IP directly, I get the correct answer.
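
For reference, the resolv.conf above determines which fully qualified names are tried for a short name such as kubernetes.default. The Python sketch below lists them in order. It assumes the usual glibc-style rule (a name with fewer dots than ndots is tried with each search domain first, then as-is); the nslookup inside the agent image may behave differently, and candidate_names is a hypothetical helper.

```python
def candidate_names(name, search, ndots):
    """List the names a glibc-style resolver would try, in order (assumed behaviour)."""
    if name.endswith("."):
        return [name]  # already fully qualified, no search list applied
    expanded = [name + "." + domain for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]      # too few dots: search domains first


# Values copied from the pod's /etc/resolv.conf above.
search = "default.svc.cluster.local svc.cluster.local cluster.local openstacklocal".split()
candidates = candidate_names("kubernetes.default", search, ndots=5)
for candidate in candidates:
    print(candidate)
```

The second candidate, kubernetes.default.svc.cluster.local, is the cluster name of the kubernetes service. The search list therefore looks adequate, which is consistent with the problem lying on the path between nodelocaldns and coredns rather than in the names being queried.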

What is wrong with my configuration?

Thanks in advance.

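Update: There is an issue open for the Flannel plugin, and it includes a fix to apply on all nodes of the cluster. Once that is done, the pods report successfully to the netchecker server.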

1 Answer:

Answer 0: (score: 0)

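Update: There is an issue open for the Flannel plugin, and it includes a fix to apply on all nodes of the cluster. Once that is done, the pods report successfully to the netchecker server.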